Skip to main content

A Little Optimization

·6 mins

The other day I had to plan some capillary sequencing experiments for our lab. Each sequencing experiment takes a DNA sample and set of primers as input. The primers and DNA are stored on 96-well plates, which are loaded onto a Tecan liquid-handling robot for picking. We have to generate 707 sequencing reactions from 436 samples, with the DNA coming from 13 plates and the primers from 47 different plates.

In total, we have 47+13+8=68 plates in play (the 8 additional plates are the destinations for the primer/DNA mix for sequencing). The challenge is to split up the work into as few experiments as possible, with the constraint that the Tecan robot can handle at most 27 plates at a time. With this limitation, we see that we need at least 3 experiments. But is there a combination of input plates that will enable us to complete the work in 3 batches? This problem is small enough that we can use a brute-force approach that evaluates every combination of input DNA plates.

I started by summarising the data as follows:

DNA Plate Number of Reactions Primer Plates
D01 30 P01,P04,P05,P06,P07,P11,P12,P13,P14,P15,P24,P33
D02 43 P02,P40,P42
D03 80 P03,P04,P05,P06,P07,P08,P11,P13,P14,P15,P26
D04 21 P04,P05,P06,P07,P09,P10,P13,P20
D05 79 P16,P21,P22,P27
D06 83 P17,P18,P25,P29
D07 7 P19,P25
D08 53 P23,P28,P30
D09 107 P04,P05,P06,P07,P13,P15,P26,P31,P32,P33
D10 96 P34,P35,P36,P37
D11 21 P38,P39,P41,P43
D12 9 P02,P40,P42
D13 78 P44,P45,P46,P47

The idea is to select a combination of 5 DNA plates for the first experiment, 4 for the second, and 4 for the third so that each experiment creates no more than 3 PCR plates (3x96 - 2 = 286 sequencing reactions) and requires no more than 19 input primer plates (to ensure there are no more than 27 plates on the robot).

We'll use the clojure.data.csv library to parse the above data from a comma-delimited file, and the combinations function from clojure.math.combinatorics to generate combinations from our list of input plates, so add the following dependencies to your project.clj:

[org.clojure/math.combinatorics "0.0.4"]
[org.clojure/data.csv "0.1.2"]

With that in hand, we can read the data into a map keyed on DNA plate name. Note the use of reduce to accumulate the data into a map. We also provide a couple of helper functions to accumulate data from the map for a list of plates:

(require '[clojure.java.io :as io])
(require '[clojure.data.csv :as csv])

(defn parse-csv
  "Parse the CSV from `rdr` into a map keyed on DNA plate name, whose
  values are a map keyend on :num-variants and :primer-plates."
  [rdr]
  (reduce (fn [accum datum]
            (let [[dna-plate num-variants & primer-plates] datum
                  num-variants (Integer/parseInt num-variants)]
              (assoc accum dna-plate {:num-variants num-variants :primer-plates primer-plates})))
          {}
          (csv/read-csv rdr)))

(def data
  (with-open [rdr (io/reader "http://www.1729.org.uk/assets/data/capseq.csv")]
    (parse-csv rdr)))

(defn num-variants-for
  "Return the number of variants for the specified DNA plate(s)."
  [dna-plates]
  (reduce + 0 (map #(get-in data [% :num-variants]) dna-plates)))

(defn primer-plates-for
  "Return the primer plates for the specified DNA plate(s)."
  [dna-plates]
  (reduce into #{} (map #(get-in data [% :primer-plates]) dna-plates)))

(num-variants-for ["D01" "D02"])
;;=> 73

(primer-plates-for ["D01" "D02"])
;;=> #{"P40" "P42" "P11" "P33" "P01" "P12" "P02" "P13" "P24" "P14" "P04" "P15" "P05" "P06" "P07"}

Now to business. We have to split 13 DNA plates between 3 experiments, which means we need to partition the plates into sets of 5+4+4. We build a set of DNA plate names, and use the clojure.math.combinatorics/combinations function to generate all distinct selections of 5 plates for the first experiment. We use this function again to generate distinct selections of 4 plates from the remainder for the second experiment, leaving us with 4 plates for the final experiment.

(require '[clojure.set :as set])
(require '[clojure.math.combinatorics :refer [combinations]])

(def dna-plates (set (keys data)))

(defn generate-selections
  "Return all partitions of the set `s` into a partition of `n1`
  elements, a partition of `n2` elements, and the remainder."
  [s n1 n2]
  (for [exp1-dna-plates (map set (combinations s n1))
        exp2-dna-plates (map set (combinations (set/difference s exp1-dna-plates) n2))
        :let [exp3-dna-plates (set/difference s exp1-dna-plates exp2-dna-plates)]]
    [exp1-dna-plates exp2-dna-plates exp3-dna-plates]))

(take 5 (generate-selections dna-plates 5 4))
;; => ([#{"D10" "D11" "D01" "D12" "D02"} #{"D13" "D03" "D04" "D05"} #{"D06" "D07" "D08" "D09"}]
;;     [#{"D10" "D11" "D01" "D12" "D02"} #{"D13" "D03" "D04" "D06"} #{"D05" "D07" "D08" "D09"}]
;;     [#{"D10" "D11" "D01" "D12" "D02"} #{"D13" "D03" "D04" "D07"} #{"D05" "D06" "D08" "D09"}]
;;     [#{"D10" "D11" "D01" "D12" "D02"} #{"D13" "D03" "D04" "D08"} #{"D05" "D06" "D07" "D09"}]
;;     [#{"D10" "D11" "D01" "D12" "D02"} #{"D13" "D03" "D04" "D09"} #{"D05" "D06" "D07" "D08"}])

Not all of these combinations meet the constraint of at most 27 plates in total. We compute the number of slots required for each combination, and filter the valid partitions. Each PCR plate has a capacity of 96, and we need 2 wells per experiment for controls; another helper function computes the number of PCR plates required to accommodate the sequencing reactions.

(defn num-pcr-plates
  "Helper function to compute the number of PCR plates required to
  accommodate `num-variants`. Each plate has a capacity of 96, and we
  need two empty wells per experiment for controls."
  [num-variants]
  (int (Math/ceil (/ (+ num-variants 2) 96))))

(defn tecan-slots
  "Return the number of Tecan slots required to process the specified `dna-plates`."
  [dna-plates]
  (+ (count dna-plates)
     (count (primer-plates-for dna-plates))
     (num-pcr-plates (num-variants-for dna-plates))))

(def max-tecan-slots 27)

(defn is-valid-combination?
  "The partition of plates into set is valid if it requires no more
  than the available Tecan slots."
  [dna-plate-sets]
  (every? #(<= (tecan-slots %) max-tecan-slots) dna-plate-sets))

(count (filter is-valid-combination? (generate-selections dna-plates 5 4)))
;; => 930

Finally for the optimization step. The best layout is the one that wastes the least space on the destination PCR plates. We define a penalty function that computes the number of empty wells on the generated PCR plates, and select a partition of DNA plates that minimises this penalty (there may be more than one, but any of the optimal partitions will do).

(defn num-empty-wells
  "Calculate the number of empty PCR wells for a given set of DNA plates."
  [dna-plates]
  (let [num-variants (num-variants-for dna-plates)
        plates-required (num-pcr-plates num-variants)
        available-wells (* plates-required wells-per-plate)]
    (- available-wells num-variants)))

(defn total-empty-wells
  "Calculate the total number of empty PCR wells for a given partitioning of DNA plates."
  [dna-plate-sets]
  (reduce + (map num-empty-wells dna-plate-sets)))

;; The optimal partition is one (not necessarily unique) that wastes
;; the fewest PCR wells.
(def optimal-partition
  (apply min-key total-empty-wells (filter is-valid-combination? (generate-selections dna-plates 5 4))))
;; => [#{"D12" "D04" "D06" "D07" "D08"} #{"D01" "D02" "D03" "D09"} #{"D10" "D11" "D13" "D05"}]

;; How many Tecan slots will be used by each experiment?
(map tecan-slots optimal-partition)
;; => (26 27 23)

;; How many PCR plates will be generated by each experiment?
(map (comp num-pcr-plates num-variants-for) optimal-partition)
;; => (2 3 3)

;; How well did we do at filling up the PCR plates?
(map num-empty-wells optimal-partition)
;; => (19 28 14)

We didn't know in advance that the partition 13=5+4+4 would give us an optimal solution, but we see that we haven't done too badly. We couldn't have fitted all of the required plates into fewer than 3 runs of the Tecan robot, and we couldn't have gotten by with fewer than 8 output PCR plates (and 61 empty wells). We can now go ahead with the above partition of DNA plates and generate three Tecan programs for the lab.

The complete code for this example is available on Github: https://github.com/ray1729/CapSeqPlanning.