Our project has an advantage from a safety perspective in that it is largely cell-free. BabbleBrick assembly involves only the magnetic beads, the enzymes and, of course, the Bricks themselves. Our constructs can then be inserted into plasmids and stored in test tubes; indeed, they are more stable outside cells than inside them, because cells introduce spontaneous mutations and replication errors (Allentoft et al., 2012; Lee et al., 2012).
Even if one were to insert a BabbleBlock into a cell for amplification, we took precautions to ensure it would not affect the cell's function. To that end, we inserted stop codons into all three reading frames of our BabbleBricks, in both directions, so that if our artificially constructed DNA were somehow transcribed, translation would terminate quickly rather than produce nonsensical peptide chains that might interfere with normal cellular processes. Nonsense polypeptides are poorly tolerated in cells, and innate cellular mechanisms eliminate mRNAs that contain premature stop codons (Brogna and Wen, 2009).
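The "stop codons in all six reading frames" property is easy to verify computationally. The sketch below, under the assumption of the standard genetic code (stop codons TAA, TAG, TGA), checks a candidate sequence in the three forward and three reverse frames; the test sequence is a hypothetical example, not an actual BabbleBrick.

```python
# Check a candidate sequence for stop codons in all six reading frames
# (three forward, three on the reverse complement).
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def frames_with_stop(seq: str) -> list:
    """Six booleans for frames +1, +2, +3, -1, -2, -3: does the frame
    contain at least one stop codon?"""
    results = []
    for strand in (seq, reverse_complement(seq)):
        for offset in range(3):
            codons = {strand[i:i + 3] for i in range(offset, len(strand) - 2, 3)}
            results.append(bool(codons & STOP_CODONS))
    return results

brick = "TAAGCTAGATTGACTATTAGGCTAACTCATGA"  # hypothetical test sequence
print(frames_with_stop(brick))
```

A BabbleBrick designed as described above would return `True` for all six frames; any `False` flags a frame that could read through the insert.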
We employed a double-layered method to safeguard the integrity of our DNA-stored data. Within each of our BabbleBricks we encoded a checksum, which can detect a "mistake" (a mutation or sequencing error) in a constructed sentence. We also included an optimal rectangular code (ORC), which can detect specific errors in the BabbleBricks and rectify them with high fidelity (100% of single errors, 90% of double errors, and 80% of triple errors), restoring their meaning.
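To illustrate the principle behind a rectangular code, here is a minimal sketch: payload bits are arranged in a grid, one even-parity bit is appended per row and per column, and a single flipped bit is then located by the intersection of the failing row and column parities. The grid dimensions and bit layout are illustrative assumptions, not the actual BabbleBrick encoding.

```python
# Rectangular (row/column) parity code: encode, corrupt one bit, correct.
def rect_encode(bits, rows, cols):
    """Append one even-parity bit per row and per column."""
    assert len(bits) == rows * cols
    grid = [bits[r * cols:(r + 1) * cols] for r in range(rows)]
    row_par = [sum(row) % 2 for row in grid]
    col_par = [sum(grid[r][c] for r in range(rows)) % 2 for c in range(cols)]
    return bits + row_par + col_par

def rect_correct(coded, rows, cols):
    """Recompute parities; a single flipped data bit sits at the
    intersection of the failing row and failing column."""
    bits = list(coded[:rows * cols])
    row_par = coded[rows * cols:rows * cols + rows]
    col_par = coded[rows * cols + rows:]
    grid = [bits[r * cols:(r + 1) * cols] for r in range(rows)]
    bad_rows = [r for r in range(rows) if sum(grid[r]) % 2 != row_par[r]]
    bad_cols = [c for c in range(cols)
                if sum(grid[r][c] for r in range(rows)) % 2 != col_par[c]]
    if len(bad_rows) == 1 and len(bad_cols) == 1:  # single data-bit error
        bits[bad_rows[0] * cols + bad_cols[0]] ^= 1
    return bits

data = [1, 0, 1, 1, 0, 0]            # payload bits in a 2 x 3 grid
coded = rect_encode(data, 2, 3)
coded[4] ^= 1                        # simulate one mutation/read error
assert rect_correct(coded, 2, 3) == data
```

With more than one error the failing parities no longer identify a unique cell, which is why correction fidelity degrades with error count, as in the figures quoted above.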
We additionally gave consideration to data security. We incorporated the option of encryption using a stream cipher, with a different key used for each BabbleBlock (a sentence or segment of data). The keys themselves are produced by a random generating function from a chosen key (the "seed"), which we encrypt using RSA.
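The scheme can be sketched as follows: each block is XOR-ed with a keystream derived from its per-block seed, and the seed itself travels RSA-encrypted. Everything here is a toy assumption made for clarity, not our actual parameters: the keystream generator (`random.Random` is not cryptographically secure), the textbook RSA primes, and the sample payload.

```python
# Stream-cipher sketch: XOR each BabbleBlock with a seeded keystream;
# protect the per-block seed with (toy, textbook) RSA.
import random

def keystream_xor(data: bytes, seed: int) -> bytes:
    """XOR data with a pseudo-random keystream; applying it twice with
    the same seed recovers the original (XOR is its own inverse)."""
    rng = random.Random(seed)
    return bytes(b ^ rng.randrange(256) for b in data)

# Textbook RSA with tiny primes (p=61, q=53), purely for illustration.
N, E, D = 3233, 17, 2753           # modulus, public and private exponents

seed = 1234                        # hypothetical per-block seed (< N)
encrypted_seed = pow(seed, E, N)   # stored alongside the archive
block = b"TO BE OR NOT TO BE"      # hypothetical BabbleBlock payload

cipher = keystream_xor(block, seed)
recovered_seed = pow(encrypted_seed, D, N)
assert keystream_xor(cipher, recovered_seed) == block
```

The design intent is that only the short seed needs the expensive asymmetric step; the bulk data is protected by the cheap symmetric keystream, and a fresh seed per BabbleBlock prevents keystream reuse across blocks.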
In the lab we followed all standard safety procedures, including training on emergency exits and the safe disposal of biological and laboratory waste, and we carried out lab work only in the presence of our lab supervisor. Inoculations and other aseptic procedures were performed under a fume hood, and all work was done wearing appropriate personal protective equipment.
Allentoft, M., Collins, M., Harker, D., Haile, J., Oskam, C., Hale, M., Campos, P., Samaniego, J., Gilbert, M., Willerslev, E., Zhang, G., Scofield, R., Holdaway, R. and Bunce, M. (2012). The half-life of DNA in bone: measuring decay kinetics in 158 dated fossils. Proceedings of the Royal Society B: Biological Sciences, 279(1748), pp.4724-4733.
Brogna, S. and Wen, J. (2009). Nonsense-mediated mRNA decay (NMD) mechanisms. Nature Structural & Molecular Biology, 16(2), pp.107-113.
Lee, H., Popodi, E., Tang, H. and Foster, P. (2012). Rate and molecular spectrum of spontaneous mutations in the bacterium Escherichia coli as determined by whole-genome sequencing. Proceedings of the National Academy of Sciences, 109(41), pp.E2774-E2783.
We spent significant time establishing what type of data would be most useful to store using our method. Because a DNA sequence can be matched to a word or data point in an arbitrary fashion, theoretically any type of information can be encoded; in practice, however, not every type is a good fit. After consulting several potential stakeholders, including the National Library of Scotland, as well as professors from our university, we concluded that the best primary use is archival data storage. This is due both to the longevity and stability of the DNA molecule and to the time restrictions that DNA sequencing and data retrieval impose.
Another niche but still applicable idea we arrived at was that of observatories, which are sited in remote, poorly connected areas to avoid light pollution from cities. They collect massive quantities of data that they cannot send over the internet, and so they end up discarding 90% of it before processing the remaining 10%. If the collected data could be stored cheaply and compactly in DNA, it could be transported physically, enabling ten times as much data to be processed.