We tested Bio101 both internally and externally, with some help from our collaborators.
To examine the usability and stability of Bio101, we designed a software testing model. We tested different types of errors (deletion, insertion and substitution) and also considered how the errors are distributed (discrete or successive).
Fault tolerance
We chose n files, split 1:1 between large files (>1 MB) and small files (<1 MB). We classify an error as successive when it spans more than 5% of the sequence length; all other errors are classified as discrete. When testing each distribution type, we included all three kinds of error, i.e. deletion, insertion and substitution. For example, when testing the discrete type, we introduced discrete deletions, discrete insertions and discrete substitutions in equal proportion and then calculated the encoding success rate. The successive distribution was tested in the same way.
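The error model above can be sketched in Python. This is a minimal illustration under stated assumptions, not Bio101's actual test harness: the function names, the choice of one contiguous block just over the 5% threshold for a successive error, the default of three single-base discrete errors, and the `decode` callable standing in for Bio101's decoder are all hypothetical.

```python
import random

BASES = "ACGT"
# From the text: an error spanning more than 5% of the sequence length is "successive".
SUCCESSIVE_THRESHOLD = 0.05
ERROR_KINDS = ("deletion", "insertion", "substitution")


def classify_error(error_length: int, seq_length: int) -> str:
    """Label a single error event as 'successive' or 'discrete' using the 5% rule."""
    return "successive" if error_length > SUCCESSIVE_THRESHOLD * seq_length else "discrete"


def apply_error(seq: str, pos: int, length: int, kind: str, rng: random.Random) -> str:
    """Apply one deletion, insertion or substitution of `length` bases at index `pos`."""
    if kind == "deletion":
        return seq[:pos] + seq[pos + length:]
    if kind == "insertion":
        inserted = "".join(rng.choice(BASES) for _ in range(length))
        return seq[:pos] + inserted + seq[pos:]
    if kind == "substitution":
        # Swap each base for a *different* base so every targeted position changes.
        block = "".join(rng.choice([b for b in BASES if b != c]) for c in seq[pos:pos + length])
        return seq[:pos] + block + seq[pos + length:]
    raise ValueError(f"unknown error kind: {kind!r}")


def inject_errors(seq: str, kind: str, distribution: str, n_discrete: int = 3, rng=None) -> str:
    """Return a corrupted copy of `seq` with errors of one kind and one distribution."""
    if rng is None:
        rng = random.Random()
    if distribution == "successive":
        # Hypothetical block size: one contiguous block just over the 5% threshold.
        length = int(SUCCESSIVE_THRESHOLD * len(seq)) + 1
        pos = rng.randrange(len(seq) - length + 1)
        return apply_error(seq, pos, length, kind, rng)
    if distribution == "discrete":
        # Single-base errors at distinct positions, applied right to left so that
        # insertions/deletions do not shift positions that are still pending.
        for pos in sorted(rng.sample(range(len(seq)), n_discrete), reverse=True):
            seq = apply_error(seq, pos, 1, kind, rng)
        return seq
    raise ValueError(f"unknown distribution: {distribution!r}")


def success_rate(decode, seq, expected, kind, distribution, trials=100, seed=0):
    """Fraction of trials in which `decode` (a stand-in for Bio101's decoder) still returns `expected`."""
    rng = random.Random(seed)
    ok = sum(decode(inject_errors(seq, kind, distribution, rng=rng)) == expected
             for _ in range(trials))
    return ok / trials
```

Looping `success_rate` over every kind in `ERROR_KINDS` and both distributions yields one rate per combination, which is the kind of figure summarized in the results below. Seeding the generator keeps each run reproducible.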
Here we list the results:
* √: successful; ×: failed
Fig. 1. The percentage of different distributions of error.
Fig. 2. The percentage of different types of error.
We conclude that Bio101 tolerates discretely distributed errors better than successive ones, and that the encoding success rate for insertion errors is higher than for the other two types.
Randomness
The encoded sequences should contain sufficiently randomly distributed A, T, C and G. We use the length of runs of identical consecutive bases as the criterion for testing randomness. The frequencies of A, G, C and T, and of higher-order nucleotide combinations, are similar, and homopolymers are rare. For biological safety, we also used a large amount of data to test the upper limit on homopolymer length, i.e. the longest run of identical consecutive bases. Part of our test results are shown below.
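The run-length criterion and the frequency check described above can be sketched in Python. This is an illustrative sketch, not Bio101's own test code: the function names are hypothetical, and the homopolymer limit `max_run=3` is an assumed value, since the text does not state the limit Bio101 enforces.

```python
from collections import Counter
from itertools import groupby


def run_lengths(seq):
    """Split a sequence into runs of identical consecutive bases: 'AAGT' -> [('A', 2), ('G', 1), ('T', 1)]."""
    return [(base, sum(1 for _ in group)) for base, group in groupby(seq)]


def run_length_distribution(seqs):
    """Count how many runs of each length occur across all sequences."""
    dist = Counter()
    for seq in seqs:
        dist.update(length for _, length in run_lengths(seq))
    return dist


def longest_homopolymer(seq):
    """Length of the longest run of one repeated base (0 for an empty sequence)."""
    return max((length for _, length in run_lengths(seq)), default=0)


def passes_homopolymer_limit(seq, max_run=3):
    """Check a sequence against a homopolymer limit; max_run=3 is an assumed value."""
    return longest_homopolymer(seq) <= max_run


def kmer_frequencies(seqs, k=1):
    """Relative frequency of every overlapping k-mer (k=1 gives single-base frequencies)."""
    counts = Counter()
    for seq in seqs:
        counts.update(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}
```

Applied to the two toy sequences `"ACGTTTAC"` and `"GGGGATCA"`, `run_length_distribution` counts nine single-base runs, one run of three and one run of four. Histograms of such counts over many encoded sequences correspond to plots like Figs. 3, 4 and 5, while `kmer_frequencies` with `k` of 1, 2, 3 and so on checks that bases and higher-order combinations occur at similar rates.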
Fig. 3. The distribution of three repeated bases.
Fig. 4. The distribution of four repeated bases.
Fig. 5. The distribution of the longest repeated bases.
To confirm the feasibility of our workflow, we carried out the whole DNA information storage process in the wet lab.
We used our software to encode a file, and the resulting DNA sequences were synthesized by a commercial provider (General Biosystems Company, Anhui, China). After one week of storage, we took out the sample for sequencing. To improve sequencing accuracy, we first amplified the sample by PCR to generate more copies, then used E. coli to copy the sequences, and finally performed high-throughput sequencing. We then used Bio101 to decode the sequencing results and recovered the original file.
Fig. 6. Wet lab validation flow chart.
Fig. 8. The sample DNA we synthesized.
Based on these tests, we conclude that Bio101 shows good usability and stability. If you encounter unexpected problems when encoding a file, please contact us through our iGEM wiki and we will help solve them. We hope to develop Bio101 together with our users and look forward to your feedback!