Financial and legal records and military and government documents are just a few examples of important information that must be preserved for a long time but could cause great damage in the wrong hands. The system we constructed will be useful for the kind of sensitive information that must be stored or transferred in a very secure manner but does not have to be accessed quickly (within seconds).
Our data storage device has four main applications:
- Secure storage of sensitive data (can’t be hacked digitally while stored)
- Secure transfer of sensitive data (can’t be hacked digitally during transfer)
- Archival data storage (high storage density, long lifespan)
- Time capsule to preserve information about our civilization (dense, durable and DNA-based)
What is sensitive data?
Data can be classified as sensitive if any of the following conditions are true:
- Unauthorized disclosure may have serious adverse effects on the reputation, resources, or services of a business, institution or individual.
- The data is protected under federal or state regulations.
- There are considerations relating to ethics, privacy or intellectual property rights.
Some examples of sensitive data that could be stored and transferred using our system include:
- Patent and prototype information
- Copyrighted or proprietary information, such as software source code developed at a company or university, or a movie script written by a movie production company
- Genealogical records
Medical records, or information that relates to an individual’s:
- Past, present, or future physical or mental condition
- Provision of health care (hospital visit, appointments with a doctor, therapy)
- Payment for the provision of health care
IT security information including configurations, reports and log data:
- IT security program plans
- Incident information logs
- Access and authentication logs
- Firewall settings
- Login data
- Top secret government and military documents
- Information relating to public safety and national security
- Some types of information about hazardous (radioactive, toxic, explosive, contagious) substances
- Blueprints and building plans of government buildings, banks, military facilities
- Financial records
- Information related to investments and investment planning by companies or individuals
Banking account details relating to credit and debit cards:
- Cardholder name, account number, expiration date, PIN and verification number
- Information related to insurance claims
Information about legal proceedings
- Court records
Confidential communications between an attorney and a client relating to:
- A lawsuit or misconduct proceeding
- A contract dispute
Scientific research that is regulated for reasons of national security, foreign policy or anti-terrorism. Examples of sensitive scientific research topics include:
- Radioactive, explosive, chemical and biological agents
- Certain cryptography software
- Military electronics
- Satellite information
Although our system is highly versatile, it has two notable drawbacks when it comes to transferring confidential data. First, synthesis of DNA carrying the encrypted data is expensive and time-consuming, and is currently the biggest bottleneck in our process (although this is likely to change with advances in DNA synthesis technology). Second, users may not want to share their secret data with the third party that synthesizes the DNA. Both issues can be overcome by having the third party synthesize only the key, which is far shorter than the encrypted data. The user then sends the encrypted data via conventional means and the key inside spores. This method of secure data transfer is faster, cheaper and simpler than encoding all of the data in DNA, but the message itself does not benefit from DNA’s incredibly long shelf life.
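To illustrate why synthesizing only the key is so much cheaper, the sketch below encodes a random 256-bit key as a DNA sequence. It assumes a simple, hypothetical mapping of two bits per nucleotide (00→A, 01→C, 10→G, 11→T); this is not necessarily the encoding used in our system, and it ignores practical constraints such as homopolymer runs, GC content and the flanking regions needed for integration. The function names `bytes_to_dna` and `dna_to_bytes` are illustrative only.

```python
import secrets

# Hypothetical 2-bit-per-base mapping (an assumption, not necessarily our encoding).
BITS_TO_BASE = {0b00: "A", 0b01: "C", 0b10: "G", 0b11: "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four bases, most significant bit pair first."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BITS_TO_BASE[(byte >> shift) & 0b11])
    return "".join(bases)


def dna_to_bytes(seq: str) -> bytes:
    """Invert bytes_to_dna; the sequence length must be a multiple of 4."""
    if len(seq) % 4:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | BASE_TO_BITS[base]
        out.append(byte)
    return bytes(out)


# A 256-bit (32-byte) random key needs only 128 bases at 2 bits per base.
key = secrets.token_bytes(32)
key_dna = bytes_to_dna(key)
assert dna_to_bytes(key_dna) == key  # round trip recovers the key
print(len(key_dna))  # 128
```

For example, `bytes_to_dna(b"\x1b")` yields `"ACGT"`. Encoding a 1 MB document the same way would require four million bases, whereas the key fits in 128, which is why sending the ciphertext conventionally and only the key in spores keeps synthesis short and cheap.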
Data archiving refers to moving data that is no longer actively used onto a separate storage medium for long-term storage. Such data may be important for future reference (old email messages, scientific data, medical records) or must be preserved to comply with regulations (records of legal proceedings, phone records). Data archiving is vital for many organizations, such as IT companies and legal firms, as improper archiving can result in fines and legal sanctions, not to mention expensive and time-consuming data discovery searches. Furthermore, scientific data generated by institutions like CERN must be preserved for years so that future discoveries can be verified against it.
Although data archiving is often confused with making backup copies of data, there is a notable difference between the two. Data backups are copies of data that allow the recovery of information that has been destroyed or corrupted. Data archives, on the other hand, preserve older data that does not need to be accessed regularly, freeing up the primary storage medium.
There are several forms of data archives, each with its own benefits and disadvantages. Cloud-based data archiving is becoming more popular as online data transfer speeds increase and storage media costs drop. Although cloud storage is initially inexpensive, it requires ongoing maintenance and, in the long term, consumes far more energy than offline data storage. Offline archiving involves writing the data onto removable media such as magnetic tape. Tape-based archives do not consume power during storage and are therefore a cheaper alternative to cloud storage.
Archives contain records that have been generated as a result of administrative, commercial, legal or social activities. There are several types of archives, based on the type of data being preserved, including:
- Government archives
- Corporate archives
- College and university archives
- Historical archives
- Religious archives
DNA is an incredibly dense and durable storage medium, and encasing it in bacterial spores makes it even more durable. For these reasons, in addition to secure storage and transfer of data, our device is well suited for data archiving.
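A back-of-the-envelope calculation shows where DNA’s density comes from. The sketch below assumes an idealized 2 bits per nucleotide and an average mass of roughly 330 g/mol per single-stranded nucleotide; both figures are our assumptions for illustration, and the calculation ignores redundancy, error correction and the genomic context inside spores, so practical densities are far lower.

```python
AVOGADRO = 6.02214076e23           # molecules per mole
NUCLEOTIDE_MASS_G_PER_MOL = 330.0  # assumed average mass of one single-stranded nucleotide
BITS_PER_NUCLEOTIDE = 2            # idealized: four bases -> 2 bits each


def theoretical_bytes_per_gram(mass_per_nt=NUCLEOTIDE_MASS_G_PER_MOL,
                               bits_per_nt=BITS_PER_NUCLEOTIDE):
    """Upper bound on stored bytes per gram of single-stranded DNA."""
    nucleotides_per_gram = AVOGADRO / mass_per_nt
    return nucleotides_per_gram * bits_per_nt / 8


density = theoretical_bytes_per_gram()
print(f"{density:.2e} bytes per gram")  # 4.56e+20 bytes per gram (about 456 exabytes)
```

Double-stranded DNA carries the same information with roughly twice the mass, which halves this bound, but it remains many orders of magnitude beyond magnetic tape.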
Collins English Dictionary defines a time capsule as “a container holding articles, documents, etc, representative of the current age, … [meant] for discovery in the future”. Essentially, it is a historic collection of information and items placed in a durable container and buried underground or interred in the foundation of a new building.
Time capsules are a way to communicate with future civilizations, particularly with future historians and anthropologists [1]. For this reason, time capsules are typically assembled with the intention that they will someday be opened. The Crypt of Civilization, constructed at Oglethorpe University in 1936, is considered to be the first modern time capsule and is scheduled to be opened in the year 8113 [2]. More than 6,000 years is quite a while, but the Crypt has been designed to withstand the test of time. The same cannot be said for many other time capsules.
Several preservation issues must be addressed when constructing a time capsule. Many time capsules are destroyed by groundwater, or simply lost because their location is forgotten. The choice of storage medium is also crucial if the information is meant to be retrieved in the future. Technology advances rapidly, so storage formats quickly become obsolete, while electronic and magnetic storage media deteriorate after just a few decades. For instance, disk drives capable of reading 5¼-inch floppy disks were commonplace before the turn of the 21st century, but you would be hard pressed to find one today. In this respect, DNA is an excellent medium for storing information meant for the distant future: as long as there is intelligent DNA-based life, there will be a reason to study DNA, so technologies for reading and manipulating DNA can be expected to exist in the future.
A time capsule holding information encoded in DNA can serve as more than just a message to the future. Given its density, durability and low energy requirements, DNA could be used to preserve human knowledge for future civilizations or extraterrestrial visitors in case an Extinction Level Event wipes out our society. The KEO space time capsule, set to launch sometime in 2017, is meant to carry a compendium of current human knowledge and is designed to re-enter Earth’s atmosphere in 50,000 years. Although 50 millennia seems like a long time to us, it is the blink of an eye from cosmological and evolutionary perspectives. Information preserved in DNA can potentially last for millions of years [3], and may therefore someday be discovered by a completely new humanoid species. The Voyager Golden Records, launched aboard the Voyager space probes in 1977, are another example of time capsules in space. These phonograph records contain 115 images and almost 1.5 hours of sounds chosen to portray life on Earth. They will never be seen by human eyes again, as they are over 2.02 × 10^10 km away and heading away from Earth at about 62,100 km per hour. Instead, they are meant for an extraterrestrial audience. Although expected to last for millions of years, these records represent a proverbial drop in the ocean of human knowledge. A DNA-based archive included in such a time capsule could provide a far more detailed snapshot of our civilization.
“This is a present from a small, distant world, a token of our sounds, our science, our images, our music, our thoughts and our feelings. We are attempting to survive our time so we may live into yours.” - U.S. President Jimmy Carter
“Launching of this 'bottle' into the cosmic 'ocean' says something very hopeful about life on this planet.” - Carl Sagan
Time and cost estimation
CryptoGErM is not a system that will replace a WhatsApp message or an email. It is meant for highly sensitive data that does not have to be accessed within a few seconds. We assume the message will be synthesized by a company. IDT, which also supported us during our project, states that synthesis of a gBlock, including shipping, takes 5 to 8 business days depending on the size of the fragment. gBlock synthesis costs start at about $0.17 per base pair [4]. Our example message consists of 572 bp and would thus cost about $100. The synthesized gene will include flanking regions for integration into the Bacillus subtilis genome. The gBlock will be amplified to a high concentration via PCR, which takes a few hours, and the PCR product can then be transformed directly into B. subtilis. Transformation takes one day. Afterwards, sporulation is initiated, which takes about 10 hours [5]. From ordering the gBlock to obtaining spores containing the message or key, at most 10 days are needed. Many providers offer express services that could shorten this by a few days.
The GMO samples can be shipped within a day. The recipient then allows the spores to germinate, and after a few hours a sufficient DNA concentration is obtained. The treatment to recover the key takes place during germination and is a matter of minutes. Modern high-throughput sequencing methods reach a throughput of about 1 Gb/h [6]. The Bacillus subtilis genome consists of approximately 4 million base pairs, so sequencing will take less than an hour. Entering the sequence into the website and decoding it takes only seconds. Decrypting the message after receiving it will thus take about one day. From design and encryption of the message by the sender to decryption by the recipient, at most 12 days will pass. We assume that all steps except the synthesis can be carried out in the laboratories of the sender and recipient.
As techniques improve, this process will become faster.
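The time and cost estimate for a message can be reproduced with simple arithmetic. The price per base pair, message length, genome size, sequencing throughput and step durations below are the figures quoted in this section; the 4-hour values for "a few hours" of PCR and germination, treating 8 business days as 8 full days, and the 30× sequencing coverage are our own assumptions for illustration.

```python
import math

# Figures quoted in the text; PCR, germination and COVERAGE values are assumptions.
PRICE_PER_BP_USD = 0.17          # lower bound of gBlock pricing
MESSAGE_BP = 572                 # length of the example message
GENOME_BP = 4_000_000            # approximate B. subtilis genome size
SEQ_THROUGHPUT_BP_PER_H = 1e9    # ~1 Gb/h high-throughput sequencing
COVERAGE = 30                    # assumed sequencing depth (not given in the text)
SHIPPING_DAYS = 1                # shipping the spores to the recipient

sender_hours = {
    "synthesis_and_shipping": 8 * 24,  # worst case of 5-8 business days, weekends ignored
    "pcr": 4,                          # "a few hours" (assumed 4 h)
    "transformation": 24,              # one day
    "sporulation": 10,                 # about 10 hours
}
receiver_hours = {
    "germination": 4,                  # "a few hours" (assumed 4 h)
    "sequencing": GENOME_BP * COVERAGE / SEQ_THROUGHPUT_BP_PER_H,
}

synthesis_cost = MESSAGE_BP * PRICE_PER_BP_USD
# Round each side up to whole days, as the estimate in the text does.
sender_days = math.ceil(sum(sender_hours.values()) / 24)
receiver_days = math.ceil(sum(receiver_hours.values()) / 24)
total_days = sender_days + SHIPPING_DAYS + receiver_days

print(f"synthesis cost: ${synthesis_cost:.2f}")                     # $97.24
print(f"sequencing: {receiver_hours['sequencing'] * 60:.1f} min")   # 7.2 min
print(f"sender {sender_days} d, receiver {receiver_days} d, total {total_days} d")
```

Under these assumptions the sender needs 10 days, the recipient 1 day and the whole exchange 12 days, matching the estimate above. Even at 100× coverage, sequencing takes only about 24 minutes.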
[1] M. Guzman, A. Hein, and C. Welch, “Eternal Memory: Long-Duration Storage Concepts for Space,” 66th International Astronautical Congress, Jerusalem, Israel, 2015.
[2] P. S. Hudson, “Crypt of Civilization,” Georgia State University Perimeter College, 04/01/2003.
[3] “Isolation of a 250 million-year-old halotolerant bacterium from a primary salt crystal,” Nature, vol. 407, pp. 897–900, 19 Oct. 2000, doi:10.1038/35038060.
[4] “gBlocks Gene Fragments.” [Online]. Available: http://www.idtdna.com/pages/products/genes/gblocks-gene-fragments
[5] D. Schultz, P. G. Wolynes, E. Ben Jacob, and J. N. Onuchic, “Deciding fate in adverse times: sporulation and competence in Bacillus subtilis,” Proc. Natl. Acad. Sci. U. S. A., vol. 106, no. 50, pp. 21027–21034, Dec. 2009.
[6] C. Bertelli and G. Greub, “Rapid bacterial genome sequencing: methods and applications in clinical microbiology,” Clin. Microbiol. Infect., vol. 19, no. 9, pp. 803–813, 2013.