Difference between revisions of "Team:UESTC-software/Design"

 
(28 intermediate revisions by 2 users not shown)
Line 25: Line 25:
 
                         <li><a href="https://2016.igem.org/Team:UESTC-software/Design">Design</a></li>
 
                         <li><a href="https://2016.igem.org/Team:UESTC-software/Design">Design</a></li>
 
                         <li><a href="https://2016.igem.org/Team:UESTC-software/Features">Features</a></li>
 
                         <li><a href="https://2016.igem.org/Team:UESTC-software/Features">Features</a></li>
                         <li><a href="https://2016.igem.org/Team:UESTC-software/Model">Model</a></li>
+
                         <li><a href="https://2016.igem.org/Team:UESTC-software/Model">Modeling</a></li>
 
                          
 
                          
 
                         <li class="three-nav"><a href="https://2016.igem.org/Team:UESTC-software/Proof">Proof</a></li>
 
                         <li class="three-nav"><a href="https://2016.igem.org/Team:UESTC-software/Proof">Proof</a></li>
 
                         <li class="three-nav"><a href="https://2016.igem.org/Team:UESTC-software/Demonstrate">Results</a></li>
 
                         <li class="three-nav"><a href="https://2016.igem.org/Team:UESTC-software/Demonstrate">Results</a></li>
 
                         <li><a href="https://2016.igem.org/Team:UESTC-software/Future">Future</a></li>
 
                         <li><a href="https://2016.igem.org/Team:UESTC-software/Future">Future</a></li>
 +
                        <li><a href="https://2016.igem.org/Team:UESTC-software/Parts">Parts</a></li>
 
                         <li class="three-nav"><a href="https://2016.igem.org/Team:UESTC-software/Extra_work">Extra Work—Bio2048</a></li>
 
                         <li class="three-nav"><a href="https://2016.igem.org/Team:UESTC-software/Extra_work">Extra Work—Bio2048</a></li>
 
                     </ul>
 
                     </ul>
 
                 </li>
 
                 </li>
                 <li><a href="https://2016.igem.org/Team:UESTC-software/Judging">JUDGING</a>
+
                 <li><a href="https://2016.igem.org/Team:UESTC-software/Judging?id=1">JUDGING</a>
 
                     <ul class="sub-nav">
 
                     <ul class="sub-nav">
 
                         <li><a href="https://2016.igem.org/Team:UESTC-software/Medal_requirements">Medal Requirements</a></li>
 
                         <li><a href="https://2016.igem.org/Team:UESTC-software/Medal_requirements">Medal Requirements</a></li>
                         <li><a href="https://2016.igem.org/Team:UESTC-software/Safety">Safety</a></li>
+
                         <li><a href="https://2016.igem.org/Team:UESTC-software/Safety?">Safety</a></li>
 
                     </ul>
 
                     </ul>
 
                 </li>
 
                 </li>
                 <li><a href="https://2016.igem.org/Team:UESTC-software/Team">TEAM</a>
+
                 <li><a href="https://2016.igem.org/Team:UESTC-software/Team?id=2">TEAM</a>
 
                     <ul class="sub-nav">
 
                     <ul class="sub-nav">
                         <li><a href="https://2016.igem.org/Team:UESTC-software/Members">Team</a></li>
+
                         <li><a href="https://2016.igem.org/Team:UESTC-software/Members?id=2&index=0">Team</a></li>
 
                         <li><a href="https://2016.igem.org/Team:UESTC-software/Collaborations">Collaborations</a></li>
 
                         <li><a href="https://2016.igem.org/Team:UESTC-software/Collaborations">Collaborations</a></li>
 
                         <li><a href="https://2016.igem.org/Team:UESTC-software/Notebooks">Notebooks</a></li>
 
                         <li><a href="https://2016.igem.org/Team:UESTC-software/Notebooks">Notebooks</a></li>
Line 48: Line 49:
 
                 <li><a href="https://2016.igem.org/Team:UESTC-software/HP">HUMAN PRACTICES</a>
 
                 <li><a href="https://2016.igem.org/Team:UESTC-software/HP">HUMAN PRACTICES</a>
 
                     <ul class="sub-nav">
 
                     <ul class="sub-nav">
                        <li><a href="https://2016.igem.org/Team:UESTC-software/HP/Gold">Gold</a></li>
+
<li><a href="https://2016.igem.org/Team:UESTC-software/HP/Silver">Silver</a></li>                      
                        <li><a href="https://2016.igem.org/Team:UESTC-software/HP/Silver">Silver</a></li>
+
<li><a href="https://2016.igem.org/Team:UESTC-software/HP/Gold">Gold</a></li>
 +
                     
 
                         <li><a href="https://2016.igem.org/Team:UESTC-software/Integrated_Practices">Integrated Practices</a></li>
 
                         <li><a href="https://2016.igem.org/Team:UESTC-software/Integrated_Practices">Integrated Practices</a></li>
 
                         <li><a href="https://2016.igem.org/Team:UESTC-software/Engagement">Engagement</a></li>
 
                         <li><a href="https://2016.igem.org/Team:UESTC-software/Engagement">Engagement</a></li>
Line 66: Line 68:
 
</div>
 
</div>
 
<div class="detail-content">
 
<div class="detail-content">
             <p>To realize a fluent and efficient DNA-based information storage system, great efforts have been devoted to designing its web system.</p>
+
             <p>The design of Bio101 involves two aspects: the algorithm for encoding information in DNA sequences, and the software implementation of that algorithm.</p>
             <h2 id="The optimization of the front-end">The optimization of the front-end</h2>
+
            <h2 id="System Design">System Design</h2>
             <strong>Website based</strong>
+
            <strong>Workflow</strong>
             <p>Due to the rapid development of network, web is now used extensively throughout the whole world. Meanwhile, website is more likely to be used without installing any app. Most importantly, website is compatible with different equipment and systems such as windows, mac os, linux and so on. So we developed our software based on website. In order to ensure the stability and high efficiency, we accomplished all computational calculating on our server and provided users an hyperlink to download the final result. </p>
+
            <p>The overall workflow of <B>Bio101: DNA Information Storage System</B> is shown below (Fig. 1). Given a computer file, we first <b>compress</b> it and then <b>encrypt</b> it with the ISAAC algorithm <sup>[1]</sup>. The resulting binary file is <b>converted to a nucleotide (NT) stream</b> through a direct mapping of two bits to one base. The long NT stream is then split into short sequences, and file <b>indexing and error-checking NTs</b> are added to the ends of each sequence.</p>
 +
            <p class="img-p" style="font-size: 13px; " ><img src="https://static.igem.org/mediawiki/2016/b/b3/Uestc-software-design-1.png" style="margin-left:-70px;width:110%;"/><br/><B>Fig. 1.</B> Workflow for the encoding and decoding processes.</p>
 +
            <p>When the encoding process is finished, the resulting DNA sequence file can be sent to a DNA synthesis company for the physical write-in.</p>
 +
            <p>Decoding consists of the physical read-out, by sequencing the DNA sample, followed by the encoding steps applied in reverse.</p>
 +
            <strong>Compression is the First Step</strong>
 +
            <p>Before the file is transformed into DNA sequences, a compression step is applied to achieve a higher information storage density <sup>[2]</sup>. Rather than designing a compression algorithm of our own, we chose the well-known bzip2 algorithm to avoid reinventing the wheel <sup>[3]</sup>. A shorter file needs fewer DNA sequences, which lowers the cost of synthesis and sequencing. </p>
 +
            <strong>Encryption is the Key</strong>
 +
            <p>The purpose of the encryption step is two-fold. </p>
 +
            <p>First, for a good information storage system, <b>data safety</b> is essential. Encryption ensures that someone who obtains the physical DNA samples and knows the encoding protocol still cannot recover the information without the correct password. </p>
 +
            <p>Second, the encoded DNA sequences should be as <b>artificial and random</b> as possible. The sequences should be free of long homopolymers, such as GGGGGG, and of repeated base patterns, such as AGTCAGTCAGTC, both of which are difficult to synthesize and sequence. The sequences also <b>should not possess any biological function</b>, so that the DNA samples are safe for people and the environment. </p>
 +
            <p>ISAAC <sup>[1]</sup>, a cryptographic pseudo-random number generator, is chosen for this step. Encrypting the original information with it yields pseudo-random bit sequences, so the probability of producing homopolymers, repeated base patterns or fragments with biological functions is extremely low. </p>
 +
            <strong>Bits to Nucleotides </strong>
 +
            <p>The randomized bit stream is converted into a nucleotide (NT) stream using a straightforward mapping of two bits to one NT: 00 -> A, 01 -> C, 10 -> G, 11 -> T <sup>[4]</sup>. </p>
 +
            <strong>Indexing is Essential for Data Integrity </strong>
 +
            <p>DNA in nature is a long one-dimensional molecule. However, current technology only allows us to synthesize short oligonucleotides; long DNA molecules with billions of nucleotides remain nature’s privilege. To store a large piece of information, we therefore have to split a long DNA sequence into oligonucleotides of a few hundred bases. Simply splitting the sequence, however, would make the data unrecoverable, since the order of the fragments would be unknown.</p>
 +
            <p>To overcome this problem, the fragments must be indexed. We add an address code and an error-check code to each fragment. The address code gives the <b>position</b> of the sequence in the file, while the check code acts as a <b>‘parity NT’</b> <sup>[5]</sup> to detect errors in the encoded information, including the length, the address code, etc. Further scrambling is applied to the sequences to avoid homopolymers. </p>
 +
            <p>To guard against errors introduced during synthesis, storage or sequencing, four-fold redundancy is used.</p>  
 +
             <h2 id="DNA File Editing">DNA File Editing</h2>
 +
            <strong>Workflow</strong>
 +
            <p><b>Random access</b>, which means reading and writing a piece of information at an arbitrary position in a file, is an important requirement for a DNA-based information processing and computing system. Although the design scheme above goes to great lengths to ensure information integrity, it is not suitable for file editing: because the file is compressed and encrypted as a whole before it is fragmented, many DNA fragments need to be replaced even if just one bit is changed in the middle of a file.  </p>
 +
            <p>To address this need, we designed an alternative encoding scheme, shown below (Fig. 2). In this scheme, the original file is fragmented first. The fragment bit sequences are then encrypted and randomized using the <B>ISAAC</B> algorithm, converted into NT sequences, and indexed. </p>
 +
            <p class="img-p" style="font-size: 13px;"><img src="https://static.igem.org/mediawiki/2016/b/b1/Uestc-software-design-2.png"/><br/><B>Fig. 2.</B> Workflow for DNA File Editing.</p>
 +
            <strong>DNA File Editing Operation</strong>
 +
            <p>To edit a particular part of the file, we first locate the DNA fragment that holds the information to be modified. Replacing the information takes two operations: <b>(1)</b> the original fragment must be degraded; <b>(2)</b> the new information must be encoded with the same procedure and with indexing information compatible with the original sequence. </p>
 +
            <p>The second operation is straightforward, since each piece of information is encoded independently in this new scheme; only the index needs special care.</p>
 +
            <p>To degrade the original fragment, we propose to use the <b>CRISPR/Cas9</b> system, which can target a chosen sequence through a recognition step guided by a single-guide RNA (sgRNA). For this purpose, GG dinucleotides are included in the encoded sequences to ensure that a PAM site is present. Bio101 additionally provides a <b>BioBrick part</b>, an sgRNA expression cassette that works with the Cas9 system to guide the degradation of the targeted DNA sequences.</p>
 +
            <p class="img-p" style="font-size: 13px;"><img src="https://static.igem.org/mediawiki/2016/0/0d/Uestc-software-design-3.png"/><br/><B>Fig. 3.</B> DNA file editing process.</p>
 +
            <h2 id="Front-end">Front-end</h2>
 +
             <strong>Web-based</strong>
 +
             <p>Thanks to the rapid development of the Internet, the web is now used extensively throughout the world. A web application can be used without any installation and, more importantly, it can be accessed from many different devices, from conventional computers to smartphones, running all kinds of operating systems. We therefore built our software as a web application to reach a wider audience. All the computational work is done on our server, and the results are provided as a hyperlink for users to download. </p>
 
             <strong>User-friendly Interface</strong>
 
             <strong>User-friendly Interface</strong>
             <p>The interface of our webpage is concise. We have two main buttons on our webpage—encode and decode, which can complete users’ demand of uploading, transforming and downloading files. Users can easily familiarize the operation of our software, and use our software to do more things they want. To develop a cross-platform software, HTML, CSS, bootstrap, and jQuery are integrated into the framework of the present software. The webpage is a humanized and beautiful design, as well as quick in response.</p>
+
             <p>The interface of our website is concise. There are five pages in total: a home page describing the software, an encoding page where a user uploads a file to encode, a decoding page where a user uploads a DNA sequence file to extract the information, an about page describing the background and algorithm of the software, and an edit page for modifying a segment. Users can become familiar with the web app very quickly. </p>
             <p class="img-p" style="font-size: 13px;"><img src=""/><br/><B>Fig.1.</B>The interface of Bio101.</p>
+
            <p>To build cross-platform software, HTML, CSS, Bootstrap <sup>[6]</sup>, Owl Carousel <sup>[7]</sup> and jQuery are integrated into the web framework. The pages follow modern visual design standards, are compatible with smartphones and other mobile devices, and respond quickly.</p>  
 +
             <p class="img-p" style="font-size: 13px;"><img src="https://static.igem.org/mediawiki/2016/d/d0/Uestc-software-features-3.png"/><br/><B>Fig. 4.</B> The interface of Bio101.</p>
 
             <strong>Clear operation flow</strong>
 
             <strong>Clear operation flow</strong>
             <p>Our website has a clear operation flow as the following figure. When a user starts to encode a file, he or she will submit a file and a code as token to encrypt the file. The file will be compressed and encrypted after stored in our server. After that, process goes to encode it and user will go to the ‘Download’ page after the encoding process. User can choose txt, fasta or SBOL-xml format to download the final DNA sequences. As for decoding, user will submit a file with DNA sequences which are encoded by our software and a code which is set when encoding the file. Decoding will start after the file being stored in our server. Then, the file will be decrypted by token and decompressed. After that, website will skip to ‘Download’ page, so user can download the decoded file.</p>
+
             <p>Our website has a clear operation flow, shown in the following figure (Fig. 5). To encode a file, a user submits the file together with a code that serves as the token for encrypting it. Once the file is stored on our server, it is compressed and encrypted, and then encoded. When encoding finishes, the user is taken to the ‘Download’ page, where the final DNA sequences can be downloaded in txt, FASTA or SBOL-xml <sup>[8]</sup> format. To decode, a user submits a file of DNA sequences encoded by our software, together with the code that was set when the file was encoded. Decoding starts once the file is stored on our server: the file is decrypted with the token and then decompressed. The website then moves to the ‘Download’ page, where the user can download the decoded file.</p>
             <p class="img-p" style="font-size: 13px;"><img src=""/><br/><B>Fig.2.</B>Design process of Bio101.</p>
+
             <p class="img-p" style="font-size: 13px;"><img src="https://static.igem.org/mediawiki/2016/7/79/Uestc-software-design-5.png"/><br/><B>Fig. 5.</B> Design process of Bio101.</p>
             <h2 id="The design of back-end">The design of back-end</h2>
+
             <h2 id="Back-end">Back-end</h2>
            <strong>Rigorous process designed </strong>
+
            <p>Before the file transformed to DNA sequences, a compression step is needed, which can help decrease the length of synthesized DNA sequences to reduce the consuming of money and time. Thanks for Martin Scharm’s blog <a href="https://binfalse.de/2011/04/04/comparison-of-compression/" target="_blank">“Comparison of compression”</a>systematically analyzed different compression algorithms. We choose bzip2 to compress file. In consideration of a good information storage system, encrypting the message is essential. So after compression process, we use a fast cryptographic random number generator(ISAAC64) to encrypt the compressed file to minimize the safety cases so as to keep the information secret. Then, we need to transform the binary numbers to DNA sequences. In order to store various large pieces of information, we fragment the long DNA sequence into pieces and add each new sequence address code and check code, which help to rebuild the sequence without errors.</p>
+
            <p class="img-p" style="font-size: 13px;"><img src=""/><br/><B>Fig.3.</B>The process of encoding and decoding.</p>
+
 
             <strong>Different file formats supported</strong>
 
             <strong>Different file formats supported</strong>
             <p>Our software supports the transforming of all formats of files, including jpg, pdf, mp3, etc. So users can store all kinds of computer files in DNA. On the other hand, we provide different formats of recording DNA sequences for users to download, including txt, xml, SBOL, etc. Users can easily use these different formats of files to do more things.</p>
+
             <p>Our software can encode files of any format, including jpg, pdf, mp3, etc., so users can store virtually all kinds of computer files in DNA. In addition, the DNA sequences can be downloaded in several formats: Text, FASTA and SBOL. </p>
             <strong>C language and Python Combined</strong>
+
             <strong>C and Python Combined</strong>
 
             <p>C offers high execution efficiency and cross-platform portability, while Python offers conciseness, extensibility and abundant libraries. The two are therefore combined in Bio101: the encryption and bit2nt parts are written in C and the rest in Python, to ensure both the efficiency and the extensibility of the program. </p>
 
             <p>C offers high execution efficiency and cross-platform portability, while Python offers conciseness, extensibility and abundant libraries. The two are therefore combined in Bio101: the encryption and bit2nt parts are written in C and the rest in Python, to ensure both the efficiency and the extensibility of the program. </p>
             <p class="img-p" style="font-size: 13px;"><img src=""/><br/><B>Fig.4.</B>Bio101 combines Python and C programming language.</p>
+
             <p class="img-p" style="font-size: 13px;"><img src="https://static.igem.org/mediawiki/2016/f/f8/Uestc-software-design-6.png"/><br/><B>Fig. 6.</B> Bio101 combines the Python and C programming languages.</p>
             <h2 id="Special design for DNA editing">Special design for DNA editing</h2>
+
             <h2 id="Web Framework">Web Framework</h2>
            <strong>CRISPR-Cas9 system—information edited</strong>
+
            <p>The ability to randomly edit the information stored in DNA sequences is significant. Although it is hard to realize perfectly at present, we put forward an idea which can edit information to a certain extent. </p>
+
            <p>Our simple idea is to replace the old sequence by the new sequence. But it is hard to find one specific sequence in the whole system. With the development of biotechnology, the appearance of CRISPR-Cas9 system makes it possible.</p>
+
            <strong>Abstract of CRISPR-Cas9 system</strong>
+
            <p>Cas9 (CRISPR associated protein 9) is an RNA-guided DNA endonuclease enzyme associated with the CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) type II adaptive immunity system in Streptococcus pyogenes, among other bacteria. S. pyogenes utilizes Cas9 to interrogate and cleave foreign DNA,[1] such as invading bacteriophage DNA or plasmid DNA.[2] Cas9 performs this interrogation by unwinding foreign DNA and checking whether it is complementary to the 20 basepair spacer region of the guide RNA. If the DNA substrate is complementary to the guide RNA, Cas9 cleaves the invading DNA.</p>
+
            <strong>DNA information edited</strong>
+
            <P>Users input the edited file, and our software will find the modified part and the corresponding DNA sequence and define the PAM site. Then generate a new sequence to replace the old sequence and design a sequence of sgRNA based on the upstream sequence of PAM site. User can use this sgRNA and through CRISPR-Cas9 system, the old sequence will be targeted gene-knockout. Then users can add the new sequence to the storage system. Which means old sequence is replaced by the new sequence and old information is replaced by new information. </P>
+
            <h2 id="Framework of our website">Framework of our website</h2>
+
 
             <strong>Web Programming with Django</strong>
 
             <strong>Web Programming with Django</strong>
             <p>The front-end and back-end are separated, which are connected with Django web framework. Django is a high-level Python Web framework that facilitates rapid development and clean, pragmatic design. It’s also a free and open source. When users upload a file to the server-side interface, the back-end works, and then a DNA sequences file will be returned for the users to download. Developers can easily improve the codes in the back-end without worrying about any conflict with present front-end codes. </p>
+
             <p>The front-end and back-end are separate and are connected through the Django web framework. Django is a high-level Python web framework that facilitates rapid development and clean, pragmatic design <sup>[9]</sup>. It is also free and open source. When a user uploads a file through the server-side interface, the back-end processes it and returns a DNA sequence file for the user to download. Developers can improve the back-end code without worrying about conflicts with the existing front-end code. </p>
            <p class="img-p" style="font-size: 13px;"><img src=""/><br/><B>Fig.5.</B></p>
+
 
             <h2 id="References">References</h2>
 
             <h2 id="References">References</h2>
            <br>  
+
                <br>  
 
                 <ul>
 
                 <ul>
                 <li style="font-size:13px;">[1] Heler R, Samai P, Modell JW, Weiner C, Goldberg GW, Bikard D, Marraffini LA (Mar 2015). "Cas9 specifies functional viral targets during CRISPR-Cas adaptation". Nature. 519 (7542): 199–202. Bibcode:2015Natur.519..199H. doi:10.1038/nature14245</li>  
+
                 <li style="font-size:13px;">[1] Robert J. Jenkins Jr. (1996) ISAAC: a fast cryptographic random number generator. FSE 1996: 41-49. Available from: <a href="http://burtleburtle.net/bob/rand/isaacafa.html" style="color: #439ea3;">http://burtleburtle.net/bob/rand/isaacafa.html</a></li>
                 <li style="font-size:13px;">[2] Jinek M, Chylinski K, Fonfara I, Hauer M, Doudna JA, Charpentier E (Aug 2012). "A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity". Science. 337 (6096): 816–21. Bibcode:2012Sci...337..816J. doi:10.1126/science.1225829</li>  
+
                <li style="font-size:13px;">[2] CUHK IGEM (2010) Bioencryption [online]. Available from: <a href="https://2010.igem.org/Team:Hong_Kong-CUHK" target="_blank" style="color: #439ea3;">https://2010.igem.org/Team:Hong_Kong-CUHK</a> </li>
 +
                <li style="font-size:13px;">[3] bzip2 [online]. Available from: <a href="http://www.bzip.org/" target="_blank" style="color: #439ea3;">http://www.bzip.org/</a></li>  
 +
                 <li style="font-size:13px;">[4] Yim AK-Y, Yu AC-S, Li J-W, Wong AI-C, Loo JFC, Chan KM, Kong SK, Yip KY and Chan T-F (2014) The essential component in DNA-based information storage system: robust error-tolerating module. Front. Bioeng. Biotechnol. 2:49. doi: 10.3389/fbioe.2014.00049 </li>
 +
                <li style="font-size:13px;">[5] Tabatabaei Yazdi, S. M. H. et al. (2015) A Rewritable, Random-Access DNA-Based Storage System. Sci. Rep. 5, 14138; doi: 10.1038/srep14138 </li>
 +
                <li style="font-size:13px;">[6] Bootstrap is the most popular HTML, CSS, and JS framework for developing responsive, mobile first projects on the web. [online] Available from:
 +
<a href="http://getbootstrap.com/" target="_blank" style="color: #439ea3;">http://getbootstrap.com/</a>  </li>
 +
                <li style="font-size:13px;">[7] OWL Carousel Touch enabled jQuery plugin that lets you create beautiful responsive carousel slider. [online] Available from:
 +
<a href="http://owlgraphic.com/owlcarousel/" target="_blank" style="color: #439ea3;">http://owlgraphic.com/owlcarousel/</a> </li>
 +
                <li style="font-size:13px;">[8] Synthetic Biology Open Language (SBOL) [online] Available from:
 +
<a href="http://sbolstandard.org/" target="_blank" style="color: #439ea3;">http://sbolstandard.org/</a></li>
 +
                <li style="font-size:13px;">[9] Django: The web framework for perfectionists with deadlines. [online] Available from:
 +
<a href="https://www.djangoproject.com/" target="_blank" style="color: #439ea3;">https://www.djangoproject.com/</a> </li>  
 
                 </ul>
 
                 </ul>
 
            
 
            
Line 109: Line 141:
 
     <div class="footer-top">
 
     <div class="footer-top">
 
         <p>FOLLOW US:
 
         <p>FOLLOW US:
             <a href="https://github.com/IGEM-UESTC-software" target="_blank"><img src="https://static.igem.org/mediawiki/igem.org/0/06/Uestc_software-github.png" /></a>
+
             <a href="https://github.com/igemsoftware2016/UESTC-Software-2016" target="_blank"><img src="https://static.igem.org/mediawiki/igem.org/0/06/Uestc_software-github.png" /></a>
 
             <a href="http://www.uestc.edu.cn/" target="_blank"><img src="https://static.igem.org/mediawiki/igem.org/a/a4/Uestc_software-school.png" /></a>
 
             <a href="http://www.uestc.edu.cn/" target="_blank"><img src="https://static.igem.org/mediawiki/igem.org/a/a4/Uestc_software-school.png" /></a>
 
             <a href="http://weibo.com/u/5621240588?refer_flag=1001030101_&is_hot=1" target="_blank"><img src="https://static.igem.org/mediawiki/igem.org/b/b1/Uestc_software-weibo.png" /></a>
 
             <a href="http://weibo.com/u/5621240588?refer_flag=1001030101_&is_hot=1" target="_blank"><img src="https://static.igem.org/mediawiki/igem.org/b/b1/Uestc_software-weibo.png" /></a>
Line 126: Line 158:
 
     <ul>
 
     <ul>
 
     <li>
 
     <li>
         <a href="#The optimization of the front-end">
+
         <a href="#System Design">
 
         <span></span>
 
         <span></span>
         The optimization of the front-end
+
         System Design
 
         </a>
 
         </a>
 
     </li>
 
     </li>
 
     <li>
 
     <li>
         <a href="#The design of back-end">
+
         <a href="#DNA File Editing">
             The design of back-end
+
        <span></span>
 +
            DNA File Editing
 +
        </a>
 +
    </li>
 +
    <li>
 +
        <a href="#Front-end">
 +
        <span></span>
 +
            Front-end
 +
        </a>
 +
    </li>
 +
    <li>
 +
        <a href="#Back-end">
 +
        <span></span>
 +
             Back-end
 
         </a>
 
         </a>
 
     </li>
 
     </li>
 
     <li>
 
     <li>
         <a href="#Special design for DNA editing">
+
         <a href="#Web Framework">
             Special design for DNA editing
+
             <span></span>
 +
            Web Framework
 
         </a>
 
         </a>
 
     </li>
 
     </li>
 
     <li>
 
     <li>
         <a href="#Framework of our website">
+
         <a href="#References">
             Framework of our website
+
             <span></span>
 +
            References
 
         </a>
 
         </a>
 
     </li>
 
     </li>

Latest revision as of 13:16, 4 November 2016


Design

The design of Bio101 involves two aspects: the algorithm for encoding information in DNA sequences, and the software implementation of that algorithm.

System Design

Workflow

The overall workflow of Bio101: DNA Information Storage System is shown below (Fig. 1). Given a computer file, we first compress it and then encrypt it with the ISAAC algorithm [1]. The resulting binary file is converted to a nucleotide (NT) stream through a direct mapping of two bits to one base. The long NT stream is then split into short sequences, and file-indexing and error-checking NTs are added to the ends of each sequence.


Fig. 1. Workflow for the encoding and decoding processes.

When the encoding process is finished, the resulting DNA sequence file can be sent to a DNA synthesis company for the physical write-in.

Decoding consists of the physical read-out, by sequencing the DNA sample, followed by the encoding steps applied in reverse.

Compression is the First Step

Before the file is transformed into DNA sequences, a compression step is applied to achieve a higher information storage density [2]. Rather than designing a compression algorithm of our own, we chose the well-known bzip2 algorithm to avoid reinventing the wheel [3]. A shorter file needs fewer DNA sequences, which lowers the cost of synthesis and sequencing.
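
The effect of the compression step can be sketched with Python's standard `bz2` module. This is only an illustration, not the Bio101 implementation: the function names are hypothetical, and the nucleotide estimate assumes the two-bits-per-base mapping described later on this page, ignoring indexing and redundancy overhead.

```python
import bz2


def compress_for_storage(data: bytes) -> bytes:
    """Compress raw file bytes with bzip2 before any DNA encoding."""
    return bz2.compress(data, compresslevel=9)


def restore_from_storage(blob: bytes) -> bytes:
    """Undo the compression step during decoding."""
    return bz2.decompress(blob)


def nucleotides_needed(payload: bytes) -> int:
    """Payload bases at 2 bits per base (4 bases per byte), overhead ignored."""
    return len(payload) * 4


if __name__ == "__main__":
    sample = b"Bio101 stores computer files in DNA. " * 200
    packed = compress_for_storage(sample)
    print("bases without compression:", nucleotides_needed(sample))
    print("bases with compression:   ", nucleotides_needed(packed))
```

Highly repetitive input such as the sample above shrinks a great deal; files that are already compressed (jpg, mp3) gain little.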

Encryption is the Key

The purpose of the encryption step is two-fold.

First, for a good information storage system, data safety is essential. Encryption ensures that someone who obtains the physical DNA samples and knows the encoding protocol still cannot recover the information without the correct password.

Second, the encoded DNA sequences should be as artificial and random as possible. The sequences should be free of long homopolymers, such as GGGGGG, and of repeated base patterns, such as AGTCAGTCAGTC, both of which are difficult to synthesize and sequence. The sequences also should not possess any biological function, so that the DNA samples are safe for people and the environment.

ISAAC [1], a cryptographic pseudo-random number generator, is chosen for this step. Encrypting the original information with it yields pseudo-random bit sequences, so the probability of producing homopolymers, repeated base patterns or fragments with biological functions is extremely low.
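
The two sequence features named above, long homopolymers and repeated base patterns, can be checked mechanically. The following is a minimal sketch in Python; the function names and the default thresholds are assumptions made for illustration and are not taken from the Bio101 code.

```python
import re


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of one repeated base (0 for an empty string)."""
    longest = 0
    for match in re.finditer(r"(.)\1*", seq):
        longest = max(longest, len(match.group(0)))
    return longest


def has_tandem_repeat(seq: str, unit_len: int = 4, min_copies: int = 3) -> bool:
    """True if some unit of `unit_len` bases occurs `min_copies` times in a row."""
    pattern = re.compile(r"(.{%d})\1{%d,}" % (unit_len, min_copies - 1))
    return pattern.search(seq) is not None


if __name__ == "__main__":
    print(longest_homopolymer("ACGGGGGGTA"))   # run of six G
    print(has_tandem_repeat("AGTCAGTCAGTC"))   # AGTC three times in a row
```

A check like this could be run on each encoded fragment to confirm that the randomization did its job before the sequences are sent for synthesis.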

Bits to Nucleotides

The randomized bit stream is converted into a nucleotide (NT) stream using a straightforward map between two bits and one NT: 00 -> A, 01 -> C, 10 -> G, 11 -> T [4].
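
The map can be written out directly. The sketch below is in Python for readability (the text states that the bit2nt part of Bio101 is written in C); the function names and the choice to read each byte from its most significant bit pair first are assumptions.

```python
BASES = "ACGT"  # position in the string = 2-bit value: 00->A, 01->C, 10->G, 11->T


def bytes_to_nt(data: bytes) -> str:
    """Convert bytes to nucleotides, four bases per byte, high bit pair first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def nt_to_bytes(seq: str) -> bytes:
    """Invert bytes_to_nt; the sequence length must be a multiple of four."""
    if len(seq) % 4:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for start in range(0, len(seq), 4):
        byte = 0
        for base in seq[start:start + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


# 0x1B is 00 01 10 11 in binary, so it maps to one of each base.
print(bytes_to_nt(b"\x1b"))  # ACGT
```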

Indexing is Essential for Data Integrity

DNA in nature is a long one-dimensional molecule. However, current technology only allows us to synthesize short oligonucleotides, while long DNA molecules with billions of nucleotides remain nature’s privilege. To store a large piece of information, we have to split a long DNA sequence into oligonucleotides of a few hundred bases each. Simply splitting the sequence would destroy the data, however, since the order of the fragments would be unknown.

To overcome this problem, the fragments must be indexed. We add a sequence address code and an error-check code to each original fragment. The address code gives the position of the sequence in the file, while the check code acts as a ‘parity NT’ [5] to check for errors in the encoded information, including length, address code, etc. Further scrambling is applied to the sequences to avoid homopolymers.

To handle errors that may be introduced during synthesis, storage or sequencing, four-fold redundancy is used.
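
The idea of indexing can be sketched as follows. The exact layout Bio101 uses is not given here, so everything concrete in this example is an assumption: a fixed-width base-4 address placed in front of the payload, a single parity NT equal to the sum of the base values modulo 4 placed at the end, the payload length of 8 and all function names. The scrambling step is omitted.

```python
BASES = "ACGT"


def int_to_nt(value: int, width: int) -> str:
    """Write a non-negative integer as a fixed-width base-4 number in bases."""
    digits = []
    for _ in range(width):
        digits.append(BASES[value % 4])
        value //= 4
    if value:
        raise ValueError("address does not fit in the given width")
    return "".join(reversed(digits))


def nt_to_int(seq: str) -> int:
    """Read a base-4 number written in bases."""
    value = 0
    for base in seq:
        value = value * 4 + BASES.index(base)
    return value


def parity_nt(seq: str) -> str:
    """Hypothetical check code: sum of the base values modulo 4."""
    return BASES[sum(BASES.index(base) for base in seq) % 4]


def index_fragments(stream: str, payload_len: int = 8, addr_width: int = 4) -> list:
    """Split the stream and wrap each piece as address + payload + parity."""
    fragments = []
    for number, start in enumerate(range(0, len(stream), payload_len)):
        body = int_to_nt(number, addr_width) + stream[start:start + payload_len]
        fragments.append(body + parity_nt(body))
    return fragments


def reassemble(fragments: list, addr_width: int = 4) -> str:
    """Drop fragments that fail the parity check and restore the order."""
    payloads = {}
    for fragment in fragments:
        body, check = fragment[:-1], fragment[-1]
        if parity_nt(body) != check:
            continue
        payloads[nt_to_int(body[:addr_width])] = body[addr_width:]
    return "".join(payloads[key] for key in sorted(payloads))


stream = "ACGTTGCAGGCATTACGA"
fragments = index_fragments(stream)
print(reassemble(list(reversed(fragments))) == stream)  # order is recovered
```

A single parity NT detects any single-base substitution in a fragment but cannot correct it, which is one reason for combining it with redundancy.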
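
How the four copies are combined during decoding is not described here. One common approach, shown below purely as an assumption, is a per-position majority vote across the reads of the same fragment. The function name `consensus` is hypothetical, and the sketch handles substitution errors only, since insertions and deletions would require aligning the reads first.

```python
from collections import Counter
from typing import Sequence


def consensus(copies: Sequence[str]) -> str:
    """Return the most common base at each position across equal-length reads."""
    if len({len(copy) for copy in copies}) != 1:
        raise ValueError("need at least one read, all of the same length")
    # Ties are resolved in favour of the base that appears first among the reads.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*copies))


# Four reads of the same fragment, two of them with one substitution each.
reads = ["ACGT", "ACGT", "AGGT", "ACTT"]
print(consensus(reads))  # ACGT
```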

DNA File Editing

Workflow

Random access, which means reading and writing a piece of information at an arbitrary position in a file, is an important requirement for a DNA-based information processing and computing system. Although much effort has been invested in ensuring information integrity in the design scheme above, that scheme is not suitable for file editing: many DNA fragments need to be replaced even if just one bit is changed in the middle of a file.

To address this need, we design an alternative encoding scheme as shown below (Fig. 2). In the scheme, the original file is fragmentized first. Then the fragment bit sequences are encrypted and randomized using the ISAAC algorithm followed by the conversion into NT sequences and the indexing step.
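
The benefit of fragmenting before encrypting can be demonstrated with a small sketch. Python's standard library has no ISAAC implementation, so the keystream below is a stand-in built from SHA-256 in counter mode; it is not the cipher Bio101 uses and not a vetted construction. The function names, the fragment size of 16 bytes and the password are likewise assumptions.

```python
import hashlib

BASES = "ACGT"


def keystream(password: str, index: int, length: int) -> bytes:
    """Stand-in keystream (SHA-256 in counter mode), keyed per fragment index."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        seed = f"{password}:{index}:{counter}".encode()
        out.extend(hashlib.sha256(seed).digest())
        counter += 1
    return bytes(out[:length])


def to_nt(data: bytes) -> str:
    """Map every two bits to one base: 00->A, 01->C, 10->G, 11->T."""
    return "".join(BASES[(byte >> shift) & 3] for byte in data for shift in (6, 4, 2, 0))


def encode_fragments(data: bytes, password: str, frag_bytes: int = 16) -> list:
    """Fragment first, then encrypt and convert each fragment independently."""
    fragments = []
    for index, start in enumerate(range(0, len(data), frag_bytes)):
        chunk = data[start:start + frag_bytes]
        stream = keystream(password, index, len(chunk))
        fragments.append(to_nt(bytes(a ^ b for a, b in zip(chunk, stream))))
    return fragments


before = encode_fragments(b"x" * 64, "secret")
edited = bytearray(b"x" * 64)
edited[20] = ord("y")  # change one byte in the middle of the file
after = encode_fragments(bytes(edited), "secret")

# Indices of the fragments that would have to be re-synthesized.
changed = [i for i, (old, new) in enumerate(zip(before, after)) if old != new]
print(changed)  # [1]
```

Because each fragment is processed on its own, the edit touches only the fragment that contains byte 20. Under the first scheme, compressing and encrypting the whole file before fragmenting would typically alter many fragments.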


Fig. 2. Workflow for DNA File Editing.

DNA File Editing Operation

To edit a particular part of the file, we first need to locate the DNA fragment that holds the information to be modified. Replacing the information then takes two operations. (1) The original fragment needs to be degraded. (2) The new information needs to be encoded with the same procedure, with indexing information compatible with that of the original sequence.

The second step is not an issue since each piece of information is encoded independently in this new scheme and only the index needs special care.

To degrade the original fragment, we propose to use the CRISPR/Cas9 system, which has a general editing ability based on a recognition step that uses a single-guide RNA (sgRNA). For this purpose, GG dinucleotides are included in the encoded sequences to ensure that a PAM site is present. Bio101 additionally provides a BioBrick part, an sgRNA expression cassette that works with the Cas9 system to guide the degradation of the targeted DNA sequences.
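
Locating candidate target sites in an encoded fragment can be sketched as below. The sketch assumes the commonly used Streptococcus pyogenes Cas9, whose PAM is NGG, with a 20-nt protospacer immediately upstream of it. It scans only the given strand, and the function names are hypothetical rather than part of Bio101.

```python
import re


def find_pam_sites(seq: str) -> list:
    """Return 0-based start positions of every NGG motif on the given strand."""
    # A lookahead is used so that overlapping motifs (as in 'AGGG') are all found.
    return [match.start() for match in re.finditer(r"(?=[ACGT]GG)", seq)]


def protospacers(seq: str, length: int = 20) -> list:
    """Return the `length` bases directly upstream of each PAM, where available."""
    return [seq[pos - length:pos] for pos in find_pam_sites(seq) if pos >= length]


print(find_pam_sites("ATCGGTAGGG"))  # [2, 6, 7]
```

Each protospacer returned this way is a candidate sequence for the sgRNA that would direct Cas9 to the fragment to be removed.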


Fig. 3. DNA file editing process.

Front-end

Web-based

The web is now used extensively throughout the world, and a web application can be used without any installation. More importantly, a web application can be accessed from many different devices, from conventional computers to smartphones, running all kinds of operating systems. We therefore built our software as a web application to reach a wider audience. All the computational work is done on our server, and the results are provided as a hyperlink for users to download.

User-friendly Interface

The interface of our webpage is concise. There are five pages in total: a home page that describes the software, an encoding page where a user uploads a file to encode, a decoding page where a user uploads a DNA sequence file to extract the information, an about page that describes the background and algorithm of the software, and an edit page for modifying a segment. Users can get familiar with the web app very quickly.

To make the software cross-platform, HTML, CSS, Bootstrap [6], Owl Carousel [7] and jQuery are integrated into the web front-end. The webpage follows modern visual design standards, is compatible with smartphones and other mobile devices, and responds quickly.


Fig. 4. The interface of Bio101.

Clear Operation Flow

Our website has a clear operation flow, shown in the following figure (Fig. 5). To encode a file, a user submits the file together with a code that serves as the token for encrypting it. Once the file is stored on our server, it is compressed, encrypted and then encoded. When encoding finishes, the user is taken to the ‘Download’ page and can download the final DNA sequences in txt, fasta or SBOL-xml [8] format. To decode, the user submits a file of DNA sequences that were encoded by our software, together with the code that was set when the file was encoded. Decoding starts once the file is stored on our server: the file is decrypted with the token and then decompressed. The website then moves to the ‘Download’ page, where the user can download the decoded file.


Fig. 5. Design process of Bio101.

Back-end

Different File Formats Supported

Our software can transform files of any format, including jpg, pdf and mp3, so users can store virtually all kinds of computer files in DNA. In addition, we provide the DNA sequences in several download formats: Text, FASTA and SBOL.
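
Of the three output formats, FASTA is the simplest to illustrate: each record is a header line starting with `>` followed by the sequence, usually wrapped to a fixed line width. The sketch below is not the Bio101 exporter; the function name, the `bio101_frag` header prefix and the width of 60 are assumptions.

```python
def to_fasta(fragments: list, prefix: str = "bio101_frag", width: int = 60) -> str:
    """Format a list of DNA fragments as FASTA text, one record per fragment."""
    lines = []
    for number, seq in enumerate(fragments):
        lines.append(f">{prefix}_{number}")
        # Wrap the sequence so that no line is longer than `width` bases.
        lines.extend(seq[start:start + width] for start in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


print(to_fasta(["ACGTACGTAC", "GGCCTTAA"], width=5), end="")
```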

C and Python Combined

C offers high execution efficiency and cross-platform portability, whereas Python offers conciseness, extensibility and abundant libraries. The two are therefore combined in Bio101: the encryption and bit2nt parts are written in C, while the rest is written in Python, to balance the efficiency and the extensibility of the program.


Fig. 6. Bio101 combines the Python and C programming languages.

Web Framework

Web Programming with Django

The front-end and back-end are separated and are connected through the Django web framework. Django is a high-level Python web framework that encourages rapid development and clean, pragmatic design [9]. It is also free and open source. When a user uploads a file through the server-side interface, the back-end processes it and returns a DNA sequence file for the user to download. Developers can improve the back-end code without worrying about conflicts with the existing front-end code.

References


  • [1] Robert J. Jenkins Jr. (1996) ISAAC: a fast cryptographic random number generator. FSE 1996: 41-49. Available from: http://burtleburtle.net/bob/rand/isaacafa.html
  • [2] CUHK IGEM (2010) Bioencryption [online]. Available from: https://2010.igem.org/Team:Hong_Kong-CUHK
  • [3] bzip2 [online]. Available from: http://www.bzip.org/
  • [4] Yim AK-Y, Yu AC-S, Li J-W, Wong AI-C, Loo JFC, Chan KM, Kong SK, Yip KY and Chan T-F (2014) The essential component in DNA-based information storage system: robust error-tolerating module. Front. Bioeng. Biotechnol. 2:49. doi: 10.3389/fbioe.2014.00049
  • [5] Tabatabaei Yazdi, S. M. H. et al. (2015) A Rewritable, Random-Access DNA-Based Storage System. Sci. Rep. 5, 14138; doi: 10.1038/srep14138
  • [6] Bootstrap: the most popular HTML, CSS, and JS framework for developing responsive, mobile first projects on the web [online]. Available from: http://getbootstrap.com/
  • [7] OWL Carousel: touch enabled jQuery plugin that lets you create beautiful responsive carousel slider [online]. Available from: http://owlgraphic.com/owlcarousel/
  • [8] Synthetic Biology Open Language (SBOL) [online]. Available from: http://sbolstandard.org/
  • [9] Django: The web framework for perfectionists with deadlines [online]. Available from: https://www.djangoproject.com/