Proof
Having developed our DNA information storage system, it is important to validate that our software performs its intended function. We carried out thorough dry-lab testing and wet-lab validation, in which we tested the efficiency and safety of our system and successfully restored our file from synthesized DNA sequences.
Dry-lab testing
Bio101 software testing
To examine the usability and stability of Bio101, we designed a software testing model. We tested the software against three kinds of errors (deletion, insertion, and substitution) and two distributions of those errors (discrete or successive). We controlled the conditions artificially so that the different situations could be compared directly.
Fault tolerance
The distribution of errors
We chose n files, including large files (>1M) and small files (<1M), in a 1:1 ratio.
For an equal number of errors, we classify a run of consecutive errors longer than 5% of the sequence length as the successive type; everything else is the discrete type. When we test one type of distribution, we include all kinds of errors. For example, when we test the discrete type, we apply discrete deletions, discrete insertions, and discrete substitutions in equal proportion, then calculate the successful encoding rate. The successive distribution is tested in the same way.
The results are listed below.
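As an illustration only, the two error distributions could be simulated roughly as follows. This is a minimal sketch, not the Bio101 test code: the function names `inject_discrete` and `inject_successive`, the 200-base test sequence, and the choice of 12 errors (just over 5% of 200) are all assumptions made for the example.

```python
import random

BASES = "ACGT"

def _different_base(base, rng):
    """Pick a random base that differs from the given one."""
    return rng.choice([b for b in BASES if b != base])

def inject_discrete(seq, n_errors, kind, rng):
    """Scatter n_errors of one kind over randomly chosen positions.

    Positions are random, so long runs are unlikely but not excluded.
    """
    positions = sorted(rng.sample(range(len(seq)), n_errors), reverse=True)
    for pos in positions:  # highest index first, so lower indices stay valid
        if kind == "deletion":
            seq = seq[:pos] + seq[pos + 1:]
        elif kind == "insertion":
            seq = seq[:pos] + rng.choice(BASES) + seq[pos:]
        elif kind == "substitution":
            seq = seq[:pos] + _different_base(seq[pos], rng) + seq[pos + 1:]
        else:
            raise ValueError(f"unknown error kind: {kind}")
    return seq

def inject_successive(seq, n_errors, kind, rng):
    """Apply n_errors of one kind as a single contiguous block."""
    start = rng.randrange(len(seq) - n_errors + 1)
    end = start + n_errors
    if kind == "deletion":
        return seq[:start] + seq[end:]
    if kind == "insertion":
        inserted = "".join(rng.choice(BASES) for _ in range(n_errors))
        return seq[:start] + inserted + seq[start:]
    if kind == "substitution":
        replaced = "".join(_different_base(b, rng) for b in seq[start:end])
        return seq[:start] + replaced + seq[end:]
    raise ValueError(f"unknown error kind: {kind}")

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so a test run is repeatable
    original = "".join(rng.choice(BASES) for _ in range(200))
    for kind in ("deletion", "insertion", "substitution"):
        discrete = inject_discrete(original, 12, kind, rng)
        successive = inject_successive(original, 12, kind, rng)
        print(kind, len(discrete), len(successive))
```

Each damaged sequence would then be passed to the decoder, and the outcome recorded as successful or failed.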
Tab.1.Discrete distribution.
Tab.2.Successive distribution.
* √: successful ×: failed
We conclude that the software tolerates discretely distributed errors better than successively distributed ones.
Fig.1.The result comparison.
Using a similar approach, we processed the files in the same way to test each kind of error one by one. We chose n files, including large files (>1M) and small files (<1M), in a 1:1 ratio. The results are as follows.
Tab.3.Deletion
Tab.4.Insertion
Tab.5.Substitution
* √: successful ×: failed
Fig.2.The result of three tests.
For biological safety, the software should produce sequences in which A, T, C, and G are distributed sufficiently randomly. We use the number of consecutive identical bases as the standard for testing randomness.
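A check on runs of identical bases can be sketched as below. This is a hedged example rather than the method Bio101 uses: the helper names and the `max_run=3` threshold for a "normal" run length are assumptions chosen for illustration.

```python
from itertools import groupby

def run_lengths(seq):
    """Lengths of each maximal run of identical bases, in order."""
    return [len(list(group)) for _, group in groupby(seq)]

def longest_run(seq):
    """Length of the longest run of identical bases (0 for an empty sequence)."""
    return max(run_lengths(seq), default=0)

def fraction_within_limit(seq, max_run=3):
    """Fraction of bases lying in runs no longer than max_run.

    max_run=3 is a hypothetical threshold, not a value taken from Bio101.
    """
    if not seq:
        return 1.0
    ok = sum(n for n in run_lengths(seq) if n <= max_run)
    return ok / len(seq)

if __name__ == "__main__":
    sample = "ACGTTTTTAC"
    print(longest_run(sample))            # 5 (the run TTTTT)
    print(fraction_within_limit(sample))  # 0.5 (5 of 10 bases are in short runs)
```

Computing both values for sequences of different lengths gives the kind of comparison against sequence length described here.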
The percentage of normal length of successive bases & the length of the sequence
The longest successive bases & the sequence length
Based on all of these tests of Bio101, we conclude that our software has great usability and stability. Unexpected situations may still arise when users encode files; if that happens, you can contact us through our iGEM wiki to solve the problem. We want to improve Bio101 together with our users, and we look forward to your feedback!
*Our wiki: https://2016.igem.org/Team:UESTC-software
Wet lab validation
Besides software testing, wet lab validation is also needed.
Fig.3.Wet lab validation flow chart.
We converted the chosen file, “sSBOLv.svg”, into a DNA sequence file. After thorough data analysis and safety confirmation, we contacted a biotech company to help us synthesize the DNA sequences.
DNA sequences carrying our file information should be stored in host cells. We chose E. coli TOP10 to store the plasmids. The synthesized DNA sequences were inserted into pUC47.
Fig.4.Plasmids transformation.
After a week of storage, we took out the sample for sequencing. To improve sequencing accuracy, we combined PCR amplification with high-throughput sequencing. We first used PCR amplification to generate more copies of the sequences, then used E. coli to copy the sequences for high-throughput sequencing.
Finally, we uploaded the DNA sequence file to our software and decoded it, recovering the original file perfectly.
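One simple way to confirm that a decoded file matches the original byte for byte is to compare checksums. The sketch below is an assumption about how such a check could be done, not a description of what Bio101 does internally; the function names are hypothetical.

```python
import hashlib

def sha256_of(data: bytes) -> str:
    """Hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()

def is_exact_recovery(original: bytes, decoded: bytes) -> bool:
    """True when the decoded bytes have the same digest as the original."""
    return sha256_of(original) == sha256_of(decoded)
```

In practice the two byte strings would be read from the original file and the decoded output, for example with `pathlib.Path(...).read_bytes()`.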