Design
We designed a novel and safe data storage and transmission system by combining digital and biological safety precautions. The system consists of the data that is to be transmitted or preserved, referred to below as the message, and a key to the message. The key functions as a password to retrieve the message. While the message is encrypted and thereby protected by the digital key, the key itself also needs protection. We therefore developed a system of several biological safety layers. Our CryptoGErM system is designed to be an unhackable bioencryption system. Even if cyber criminals intercepted the shipment, they would not be able to get their hands on the message because they lack the key. Accessing the key requires more than general knowledge of molecular biology: the recipient has to know the exact biological procedure that must be applied to recover it.
Applications
CryptoGErM has three main possibilities of application. Short messages can be encrypted in the genome of Bacillus subtilis as described in this chapter. The genome of B. subtilis consists of approximately 4 million base pairs, and two bits are converted into one base pair. Four million base pairs therefore correspond to 8 million bits, so in theory up to about 1 MB could be stored in the genome of one bacterium.
To send larger datasets, the encrypted data could be sent via conventional methods such as email, with only the key sent in spores.
Furthermore, the dataset can be divided over the genomes of several bacteria. This would also provide a way of long-term storage for large datasets.
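The capacity estimate above can be reproduced with a few lines of Python. This is a minimal sketch: the function names are hypothetical, and it assumes, as the estimate in the text does, that every base pair of the genome could carry data, so the result is an upper bound rather than a practical figure.

```python
import math

GENOME_BP = 4_000_000  # approximate B. subtilis genome size used in the text
BITS_PER_BP = 2        # two bits are stored per base pair


def capacity_bytes(genome_bp: int = GENOME_BP) -> int:
    """Upper bound on the bytes one genome could hold (assumes every bp stores data)."""
    return genome_bp * BITS_PER_BP // 8


def genomes_needed(dataset_bytes: int, genome_bp: int = GENOME_BP) -> int:
    """Number of genomes required when a dataset is divided over several bacteria."""
    return math.ceil(dataset_bytes / capacity_bytes(genome_bp))


print(capacity_bytes())           # 1000000
print(genomes_needed(2_500_000))  # 3
```

Under these assumptions a 2.5 MB dataset would have to be divided over at least three genomes.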
Encryption and Conversion
The message that is to be sent and stored is encrypted using the Rijndael algorithm (the cipher standardized as AES). Encryption encodes the message so that only an authorized person can read it: the plain text is converted to cipher text, and in this state the message cannot be read. The algorithm requires a key for the encryption process. This key can be a word, just like a password. In theory it is possible to decrypt the message without the key, but this requires so much computing power, knowledge and resources that in practice it is almost impossible. With the key, the message can be decrypted easily. The key itself is, of course, not encrypted.
At this point of our design we have a key and a message. The message is encrypted and reads as nonsense. Initially, key and message consist of plain text, numbers and symbols. Our encryption and decryption machine converts these characters into the corresponding ASCII codes. The ASCII codes are then converted into binary, and the binary code is converted into a sequence of bases: 00 stands for adenine, 01 for cytosine, 10 for thymine, and 11 for guanine. As the conversion table shows, the four bit pairs of each byte are written starting with the least significant pair, so H (0100 1000) becomes ATAC. The machine also adds header and footer sequences to the message and key sequences so that the recipient can recover key and message from the full genome. Figure 1 shows a conversion table from text to ASCII code, to binary and to DNA.
Working with BioBricks requires some reserved sequences, namely the recognition sites of the restriction enzymes EcoRI, PstI, SpeI, XbaI and ideally also NotI. In addition, our work with the B. subtilis shuttle vector pDR111 required the recognition sites of SalI, HindIII and BglII to be reserved. Our encryption and decryption machine can take these reserved sequences into account while converting message and key, and it can be adjusted individually to other required reservations.
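How the machine handles a reserved sequence once it is found is not described here, so the following Python sketch only covers the detection step. The function name `find_reserved_sites` is hypothetical; the recognition sites are the standard ones for the enzymes listed above.

```python
# Recognition sites of the reserved restriction enzymes (5'->3').
# All of them are palindromic, so scanning one strand is sufficient.
RESERVED_SITES = {
    "EcoRI": "GAATTC",
    "XbaI": "TCTAGA",
    "SpeI": "ACTAGT",
    "PstI": "CTGCAG",
    "NotI": "GCGGCCGC",
    "SalI": "GTCGAC",
    "HindIII": "AAGCTT",
    "BglII": "AGATCT",
}


def find_reserved_sites(dna: str) -> list[tuple[str, int]]:
    """Return (enzyme, 0-based position) for every reserved site found in dna."""
    dna = dna.upper()
    hits = []
    for name, site in RESERVED_SITES.items():
        start = dna.find(site)
        while start != -1:
            hits.append((name, start))
            start = dna.find(site, start + 1)  # also catches overlapping hits
    return sorted(hits, key=lambda hit: hit[1])


print(find_reserved_sites("AAGAATTCAAGTCGACTT"))
# [('EcoRI', 2), ('SalI', 10)]
```

A converted key or message that returns an empty list contains none of the reserved sites and could be cloned with these enzymes without being cut.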
Letter | H | e | l | l | o | (space) | w | o | r | l | d |
---|---|---|---|---|---|---|---|---|---|---|---|
ASCII | 072 | 101 | 108 | 108 | 111 | 032 | 119 | 111 | 114 | 108 | 100 |
Binary | 0100 1000 | 0110 0101 | 0110 1100 | 0110 1100 | 0110 1111 | 0010 0000 | 0111 0111 | 0110 1111 | 0111 0010 | 0110 1100 | 0110 0100 |
DNA | ATAC | CCTC | AGTC | AGTC | GGTC | AATA | GCGC | GGTC | TAGC | AGTC | ACTC |
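The conversion in the table can be sketched in Python as follows. This is an illustrative reimplementation, not the team's encryption and decryption machine: the function name `text_to_dna` is hypothetical, and the bit-pair order (least significant pair first) is inferred from the table entries. In the actual workflow the input would be the cipher text rather than plain text.

```python
# Index in this string equals the bit-pair value: 00 -> A, 01 -> C, 10 -> T, 11 -> G.
BASES = "ACTG"


def text_to_dna(text: str) -> str:
    """Convert ASCII text to DNA, four bases per character."""
    dna = []
    for byte in text.encode("ascii"):
        # Emit the four bit pairs starting with the least significant pair,
        # which is the order the conversion table uses.
        for shift in (0, 2, 4, 6):
            dna.append(BASES[(byte >> shift) & 0b11])
    return "".join(dna)


print(text_to_dna("H"))            # ATAC
print(text_to_dna("Hello world"))
# ATACCCTCAGTCAGTCGGTCAATAGCGCGGTCTAGCAGTCACTC
```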
Integration
Our original message and our key have been converted into DNA sequences. Synthesizing DNA is possible at constantly decreasing costs [1].
Data in DNA format has many advantages, and the right choice of organism can deliver even more. We chose Bacillus subtilis: it is not pathogenic for humans, it is naturally competent and it is a well-researched model organism. On top of that, it can form highly resistant endospores. A viable spore-forming Bacillus species could be recovered from a 250-million-year-old salt crystal [2]. What else could we wish for in a highly safe storage and transmission system? The sequences of key and message are integrated into the genomes of two different B. subtilis strains using the BBa_K823023 B. subtilis integration vector, which integrates into the amyE locus. A scheme of the integration process and the sequences integrated in the plasmid are shown in Figure 3. In this way key and message can be shipped or stored separately, which makes interception of our system even harder for unauthorized parties. Genomic integration of both message and key is safer than plasmid integration because the relevant sequences are hidden in a full genome. Whole genome sequencing is more elaborate than sequencing a plasmid, especially if unauthorized parties don’t know what they are looking for.
Transmission
Spores are highly stable at room temperature and do not require any nutrients. They can stay in a test tube for weeks, which grants enough time to send them to any spot on Earth. And who would suspect highly secret data in a tiny amount of colorless liquid in a reaction tube? A scheme of sporulation and shipping can be seen in Figure 4.
Treatment
After the different spores, containing key and message, have arrived at the recipient, the key has to be recovered. Only with the key is it possible to decode the message in the genome of the message-spores. As shown in Figure 5, the spores have to receive the right treatment during germination; otherwise the key sequence cannot be recovered.
1. Decoy key hiding
To make it especially hard for unauthorized parties to access our key, the key-spores are sent in a mixture with decoy spores, which greatly outnumber the key-spores. The key-spores have to be recovered using a specific selection mechanism. We have developed a ciprofloxacin resistance cassette that can be integrated into the B. subtilis genome, so only knowledge of this specific selection antibiotic allows the recovery. Even better, this resistance cassette also provides resistance against spirofloxacin, a conjugate of the regular antibiotic ciprofloxacin and a spiropyran photoswitch. When the photoswitchable antibiotic is used, spirofloxacin in its inactive state could already be added to the spore mixture; the recipient then simply has to activate it by shining light of the right wavelength on it.
To build a qnrS1 resistance cassette BioBrick, we designed a gBlock that contains the Bacillus subtilis promoter PatpI, which is active from a very early stage of germination and includes a ribosome binding site. The gBlock also contains the original qnrS1 gene sequence from E. coli, the double terminator BBa_B0015 from iGEM, and the BioBrick prefix and suffix. The qnrS1 resistance cassette was integrated into the amyE locus of the B. subtilis genome using the BBa_K823023 B. subtilis integration vector. The qnrS1 plasmid and the selection of the key-containing spores from the decoy can be seen in Figure 6.
To test how efficiently the correct key-spores are selected from the decoy, a superfolder GFP was integrated into the genome of the key-spores using the pDR111 B. subtilis shuttle vector. This can be seen in Figure 7. The selection efficiency could then be observed via microscopy and flow cytometry.
2. NucA key deletion
Even more safety is provided by two key deletion approaches. In this case, if the recipient doesn’t apply the correct treatment, the key sequence will be destroyed.
For the design of a nucA key deletion we made use of the following BioBricks: BBa_R0040_tetR, BBa_K729004_nuclease, BBa_B0030_RBS and BBa_B0015_terminator. During the cloning process we expected to obtain the following BioBricks: RBS-nuclease, tetR-RBS-nuclease and tetR-RBS-nuclease-terminator. The key deletion is controlled by the constitutively on promoter PtetR. RNA polymerase binds to the DNA sequence and leads to the expression of the nuclease. If tetracycline is added, it functions as a repressor, binds to the promoter and thus leads to the inhibition of the transcription of the nuclease. If an unauthorized party tries to grow the key-containing spores in regular growth medium without the addition of tetracycline, the key deletion will be active and the DNA will be digested by the nuclease. This process can be seen in figure 8.
3. CRISPR key deletion
Another key deletion approach makes use of the CRISPR/Cas9 system, which can be used as a genome editing tool. The plasmid pJOE8999 is a shuttle vector with a pUC origin of replication for E. coli, a kanamycin resistance gene and the temperature-sensitive replication origin of plasmid pE194ts. This plasmid carries the Cas9 gene under the transcriptional control of the mannose-inducible promoter PmanP, and a single guide RNA under a strong constitutive promoter. For our project, the idea was to replace the mannose-inducible promoter with the tetracycline-repressible promoter PtetR. The PtetR promoter is constitutively on and is repressed by TetR. The repression can be inhibited by the addition of tetracycline. If key-spores containing the CRISPR/Cas9 system are revived by unauthorized parties who are unaware of the correct treatment procedure, the targeted area, in this case the key sequence, will be deleted, as shown in Figure 9. This way the message stays safe and secure.
Decoding
Now that the recipient has received the message and has successfully retrieved the correct key-spores, both can be sequenced. The key and message sequences then have to be entered into the encryption and decryption machine. The machine detects the relevant part of the full sequence by a header (GACCAAGCCTGC) and a footer (GCACCCACCGAC) that were added to the sequence while converting it into a DNA string. The machine then applies the key to the message and converts it back from DNA into the data set that was chosen by the sender.
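The recovery step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the team's implementation: the function names are hypothetical, the base-to-bit mapping and pair order follow the conversion table (A = 00, C = 01, T = 10, G = 11, least significant pair first), and the final decryption with the key is left out. The sketch also assumes that header and footer do not occur by chance inside the payload.

```python
HEADER = "GACCAAGCCTGC"
FOOTER = "GCACCCACCGAC"
BASE_VALUE = {"A": 0b00, "C": 0b01, "T": 0b10, "G": 0b11}


def extract_payload(sequence: str) -> str:
    """Return the bases between the first header and the next footer."""
    start = sequence.find(HEADER)
    if start == -1:
        raise ValueError("header not found")
    start += len(HEADER)
    end = sequence.find(FOOTER, start)
    if end == -1:
        raise ValueError("footer not found")
    return sequence[start:end]


def dna_to_text(dna: str) -> str:
    """Convert DNA back to ASCII text, four bases per character."""
    if len(dna) % 4:
        raise ValueError("payload length must be a multiple of 4 bases")
    chars = []
    for i in range(0, len(dna), 4):
        byte = 0
        # The first base of each group holds the least significant bit pair.
        for position, base in enumerate(dna[i:i + 4]):
            byte |= BASE_VALUE[base] << (2 * position)
        chars.append(chr(byte))
    return "".join(chars)


# Toy "sequencing read": flanking genome, header, payload, footer, flanking genome.
read = "TTTT" + HEADER + "ATACCCTC" + FOOTER + "AAAA"
print(dna_to_text(extract_payload(read)))  # He
```

For a real message the recovered text would still be cipher text, which then has to be decrypted with the key recovered from the key-spores.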
What if?
Using artificial intelligence and computational modelling, we have designed different scenarios that take into account a possible intruder into our system as well as the influence of the DNA mutation rate on our system.
We have also looked at the future perspectives of our project. With the great flexibility of the BioBrick system, other methods for securing the data inside DNA can be added or developed. Finally, we have thought of ways to access the data in the DNA faster.
References
- [1] NIH National Human Genome Research Institute. [Online. Accessed October 19, 2016].
- [2] Vreeland, R.H., Rosenzweig, W.D. & Powers, D.W., 2000. Isolation of a 250 million-year-old halotolerant bacterium from a primary salt crystal. Nature, 407(6806), pp.897–900. [Online. Accessed September 26, 2016].