Difference between revisions of "Team:UESTC-software/Proof"


Revision as of 19:41, 15 October 2016

Third-level page

Proof

Having developed our DNA information storage system, it is important to validate that our software performs its intended function. We carried out thorough dry-lab testing and wet-lab validation, in which we tested the efficiency and safety of our system and successfully restored our file from synthesized DNA sequences.

Dry-lab testing

Bio101 software testing

To examine the usability and stability of Bio101, we designed a software testing model. We tested the software with respect to the type of error (deletion, insertion and substitution) and the distribution of the errors (discrete or successive). We controlled the conditions artificially so that the different situations could be compared directly.

Fault tolerance
The distribution of errors

We chose n files, including large files (>1M) and small files (<1M), in a 1:1 ratio.

For an equal number of errors, we classify a run of consecutive errors longer than 5% of the sequence length as the successive type; all other cases are the discrete type. When we test one type of distribution, we include all kinds of errors. For example, when we test the discrete type, we apply discrete deletions, discrete insertions and discrete substitutions in equal proportion, then calculate the successful encoding rate. The successive distribution is tested in the same way.
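The classification rule and the three error types described above can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the actual Bio101 test harness: the function names (`inject_discrete`, `inject_successive`, `is_successive`) and the choice of uniformly random positions are hypothetical.

```python
import random

BASES = "ATCG"


def _other_base(base, rng):
    # Pick a base different from the given one, so a substitution always changes the sequence.
    return rng.choice([b for b in BASES if b != base])


def inject_discrete(seq, n_errors, kind, rng):
    """Apply n_errors single-base errors at distinct random positions (assumed uniform)."""
    # Work from right to left so that earlier indices stay valid after each edit.
    positions = sorted(rng.sample(range(len(seq)), n_errors), reverse=True)
    out = seq
    for pos in positions:
        if kind == "deletion":
            out = out[:pos] + out[pos + 1:]
        elif kind == "insertion":
            out = out[:pos] + rng.choice(BASES) + out[pos:]
        elif kind == "substitution":
            out = out[:pos] + _other_base(out[pos], rng) + out[pos + 1:]
        else:
            raise ValueError(f"unknown error kind: {kind}")
    return out


def inject_successive(seq, n_errors, kind, rng):
    """Apply n_errors errors as one contiguous run starting at a random position."""
    start = rng.randrange(len(seq) - n_errors + 1)
    end = start + n_errors
    if kind == "deletion":
        return seq[:start] + seq[end:]
    if kind == "insertion":
        run = "".join(rng.choice(BASES) for _ in range(n_errors))
        return seq[:start] + run + seq[start:]
    if kind == "substitution":
        run = "".join(_other_base(c, rng) for c in seq[start:end])
        return seq[:start] + run + seq[end:]
    raise ValueError(f"unknown error kind: {kind}")


def is_successive(run_length, seq_length, threshold=0.05):
    """Rule from the text: a run longer than 5% of the sequence length is 'successive'."""
    return run_length > threshold * seq_length
```

A test run would corrupt each encoded sequence with one of these functions, attempt to decode it, and record the outcome as successful or failed.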

The results are listed below.


Tab. 1. Discrete distribution.


Tab. 2. Successive distribution.

* √: successful ×: failed

The conclusion is that fault tolerance is better for the discrete distribution than for the successive distribution.


Fig. 1. Comparison of the results.

Error types

Using a similar approach, we processed the files in the same way to test each error type one by one. We again chose n files, including large files (>1M) and small files (<1M), in a 1:1 ratio. The results are as follows.


Tab. 3. Deletion


Tab. 4. Insertion


Tab. 5. Substitution

* √: successful ×: failed


Fig. 2. The results of the three tests.

Randomness

For biological safety, we should produce sequences in which A, T, C and G are distributed sufficiently randomly. We use the number of consecutive identical bases as the standard for testing randomness.

The percentage of normal-length runs of successive bases versus the sequence length

The longest run of successive bases versus the sequence length
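The two measures named above, the longest run of identical bases and the share of runs of normal length, could be computed roughly as below. This is a sketch under our own assumptions: the function names are hypothetical, and the cutoff for a "normal" run length (`max_len`) is not given in the text, so it is left as a parameter.

```python
from itertools import groupby


def run_lengths(seq):
    """Lengths of each run of consecutive identical bases, in order."""
    return [len(list(group)) for _, group in groupby(seq)]


def longest_run(seq):
    """Length of the longest run of identical bases (0 for an empty sequence)."""
    return max(run_lengths(seq), default=0)


def fraction_of_runs_within(seq, max_len):
    """Fraction of runs whose length is at most max_len (an assumed cutoff)."""
    runs = run_lengths(seq)
    if not runs:
        return 1.0
    return sum(1 for r in runs if r <= max_len) / len(runs)
```

For example, `longest_run("AATTTCG")` looks at the runs `AA`, `TTT`, `C` and `G` and reports the length of `TTT`.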

Based on all of these tests of Bio101, we conclude that our software has great usability and stability. Unexpected situations may still occur when users encode files; if so, you can contact us through our iGEM wiki to solve the problem. We want to improve Bio101 together with users and we look forward to your feedback!

*Our wiki: https://2016.igem.org/Team:UESTC-software

Wet lab validation

Besides software testing, wet lab validation is also needed.


Fig. 3. Wet lab validation flow chart.

We converted the chosen file, “sSBOLv.svg”, into a DNA sequence file. After thorough data analysis and safety confirmation, we contacted a biotech company to help us synthesize the DNA sequences.

DNA sequences carrying our file information should be stored in host cells. We chose E. coli TOP10 to store the plasmids. The synthesized DNA sequences were transformed into pUC47.


Fig. 4. Plasmid transformation.

After a week of storage, we took out the sample for sequencing. To improve the accuracy of sequencing, we used PCR amplification and high-throughput sequencing. We first used PCR amplification to generate more copies of the sequences, then used E. coli to copy the sequences for high-throughput sequencing.

Finally, we uploaded the DNA sequence file to our software and decoded it, recovering the original file perfectly.
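One simple way to confirm that a decoded file matches the original byte for byte is to compare cryptographic digests. The sketch below is an illustration only, not necessarily the check we performed; the function names and the file paths passed to them are hypothetical.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def files_identical(path_a, path_b):
    """True if both files have exactly the same content."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        return sha256_of(fa.read()) == sha256_of(fb.read())
```

Calling `files_identical` on the original file and the file decoded from the sequencing output would return `True` only for a perfect recovery.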
