Difference between revisions of "Team:DTU-Denmark/Software"

 
(44 intermediate revisions by 6 users not shown)
Line 8: Line 8:
 
     <meta charset="utf-8">
 
     <meta charset="utf-8">
 
     <meta name="viewport" content="width=device-width, initial-scale=1">
 
     <meta name="viewport" content="width=device-width, initial-scale=1">
 +
 
</head>
 
</head>
  
Line 22: Line 23:
 
                 <div class="caption">
 
                 <div class="caption">
 
                     <div class="col-md-5 col-sm-5 col-xs-12 title"> <!-- the approximate max number of characters ~ 400 --> <!-- EDIT -->
 
                     <div class="col-md-5 col-sm-5 col-xs-12 title"> <!-- the approximate max number of characters ~ 400 --> <!-- EDIT -->
                         <h1>CODON OPTIMIZATION SOFTWARE<p class="lead">THE TIME FOR IMPLEMENTATION OF SOPHISTICATED COMPUTATIONAL APPROACHES INTO BIOLOGY HAS FINALLY COME. THE DTU-DENMARK TEAM WAS WELL AWARE OF THIS FACT AND THEREFORE TOOK A LEAP FORWARD BY DEVELOPING TAICO, A SPECIALIZED BUT ALSO USER FRIENDLY CODON OPTIMIZATION TOOL BASED ON TAI CALCULATION.</p></h1>
+
                         <h1>CODON OPTIMIZATION SOFTWARE<p class="lead">The time for implementation of sophisticated computational approaches into biology has finally come. The DTU Biobuilders team was well aware of this fact and therefore took a leap forward by developing TaiCO, a unique, specialized yet user-friendly codon optimization tool based on species-specific tAI calculation.</p></h1>
 
                     </div>
 
                     </div>
 
                     <div class="col-md-2 col-sm-2 hidden-xs space"></div>
 
                     <div class="col-md-2 col-sm-2 hidden-xs space"></div>
 
                     <div class="col-md-5 col-sm-5 hidden-xs intro"> <!-- will be hidden on phones, duplicate the text to blockquote down below first section header, to show it there, when it dissapear-->
 
                     <div class="col-md-5 col-sm-5 hidden-xs intro"> <!-- will be hidden on phones, duplicate the text to blockquote down below first section header, to show it there, when it dissapear-->
 
                         <blockquote class="blockquote-reverse"> <!-- EDIT -->
 
                         <blockquote class="blockquote-reverse"> <!-- EDIT -->
                             <p>Biology has at least 50 more interesting years.</p>
+
                             <p>"There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies."</p>
                             <small><cite title="Source Title">James D. Watson</cite></small>
+
                             <small><i>C.A.R. Hoare</i><cite title="Source Title"></cite></small>
 
                         </blockquote>       
 
                         </blockquote>       
 
                     </div>
 
                     </div>
Line 47: Line 48:
 
              
 
              
 
             <blockquote class="visible-xs"> <!-- quote from masterhead duplicate -->
 
             <blockquote class="visible-xs"> <!-- quote from masterhead duplicate -->
                 <p>Quote Lorem ipsum dolor sit amet, consectetur adipiscing elit. Integer posuere erat a ante.</p>
+
                 <p>"There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies."</p>
                 <small>Someone famous in <cite title="Source Title">Source Title</cite></small>
+
                 <small><i>C.A.R. Hoare</i><cite title="Source Title"></cite></small>
 
             </blockquote>
 
             </blockquote>
            <h3 class="h3">Background</h3>
 
 
             <p>
 
             <p>
                 The increased use of non-conventional expression hosts, as was proposed in this years DTU-Denmark project, increases the need for codon optimization of coding sequences used for heterologous protein production. Codon optimization is typically performed by replacing each codon with the most frequently used synonymous codon observed in the host genome. The assumption that the most frequent synonymous codon is also the most efficiently translated is not necessarily true when we consider that a typical or average transcript is not one that is likely to have a higher than average translation efficiency. Highly translated transcripts often contain “reserved” codons that are not the most common but represent codons that best match the tRNA pools in the cell. These tRNA pools can be estimated by the number of tRNA genes that have an anticodon that can decode a codon. This approach is known as the tRNA Adaptation Index (TAI) (dos Reis et al. 2004) when used to assess the translation efficiency of a coding sequence.
+
                 The increased use of non-conventional organisms for conventional purposes increases the need for codon optimization of coding sequences that are used for heterologous protein production. Codon optimization is typically performed by replacing each codon with the most frequently used synonymous codon from the host genome. The assumption that the most frequent synonymous codon is also the most efficiently translated one does not necessarily hold, because a typical or average transcript is unlikely to have a higher than average translation efficiency. Highly translated transcripts often contain “reserved” codons that are not the most common. Instead, these transcripts contain codons that best match the tRNA pools in the cell. These tRNA pools can be estimated by the number of tRNA genes that have an anticodon corresponding to a given codon. This approach is known as the tRNA Adaptation Index (tAI)<sup><a href="#references">1</a></sup> when used to assess the translation efficiency of a coding sequence.
 
             </p>
 
             </p>
 
             <h3 class="h3">From Blackboard Calculations to Software</h3>
 
             <h3 class="h3">From Blackboard Calculations to Software</h3>
 
             <p>
 
             <p>
                 TaiCO (TAI Codon Optimaztion tool) constitutes a unique computational tool for answering specific biological questions, a statement easily justified by the fact that it is the only stand-alone application in the world (to the best of the team's knowledge) with an implemented Graphic User Interface (GUI) based completely on species specific TAI calculation. The DTU-DENMARK team hopes that this software will contribute with its simplicity to faster and easy to produce Biotechnological results and become a high-end optimizing method with its unique theory implementation.  
+
                 TaiCO (tAI Codon Optimization tool) is a computational tool for answering a common biological question: what coding DNA sequence will result in maximum protein expression? The DTU Biobuilders' proposal for solving this task is a stand-alone application that is based completely on species-specific tAI calculation and is bundled with a simple Graphical User Interface (GUI) compatible with many platforms. We hope that this software will contribute to faster and easier production of biotechnological results and become a high-end optimization method with its unique theory implementation.
 
             </p>
 
             </p>
 +
 +
<div class="col-md-1">
 +
</div>
 +
<div class="col-md-10">
 +
<figure class="figure">
 +
                            <img id="img1" class="enlarge img-responsive figure-img" src="https://static.igem.org/mediawiki/2016/4/4e/T--DTU-Denmark--taico92.png" alt="DESCRIPTION">
 +
                        <figcaption class="figure-caption">The <b>TaiCO</b> Interface</figcaption>
 +
                    </figure>
 +
                    <div id="img1Modal" class="modal">
 +
                        <span class="close img1">×</span>
 +
                        <img class="modal-content" src="https://static.igem.org/mediawiki/2016/4/4e/T--DTU-Denmark--taico92.png"> <!--high resulution picture-->
 +
                        <div class="caption">The <b>TaiCO</b> Interface</div>
 +
                    </div>
 +
</div>
 +
<div class="col-md-1">
 +
</div>
 +
 
         </div> <!-- /overview-->
 
         </div> <!-- /overview-->
 
          
 
          
Line 63: Line 80:
 
         <h2 class="h2">Theory</h2>
 
         <h2 class="h2">Theory</h2>
 
<p>
 
<p>
As you may already know, the central issue in codon optimization is to determine which codons are most efficiently translated for each amino acid. The quantity needed for this task is called 'translatability' and is denoted \(W_i\) for the \(i\)'th codon.</p>
+
As mentioned, the central issue in codon optimization is to determine which codons are most efficiently translated for each amino acid. The quantity needed for this task is called 'translatability' and is denoted \(W_i\) for the \(i\)'th codon.</p>
 
<p>
 
<p>
To accomplish this, we have chosen to use a tRNA Adaptation Index-based method (tAI) (dosReis et. al. 2004) REFERENCE. The fundamental assumption behind this method is that highly expressed proteins have their genes encoded with a set of codons that is overall more susceptible to tRNA-binding and translation compared to less expressed proteins. Hence, this optimization estimates the codon preferences such that the correlation between protein level and tAI is maximized.</p>
+
To accomplish this, we have chosen to use a tRNA Adaptation Index-based method (tAI). The fundamental assumption behind this method is that highly expressed proteins have their genes encoded with a set of codons that is overall more susceptible to tRNA-binding and translation compared to proteins that are not highly expressed. Hence, this optimization method estimates the codon preferences in such a way that the correlation between protein level and tAI is maximized.</p>
 
<p>
 
<p>
The formulas for calculating this are stated in Table 1 in dosReis 2004 (SHOULD WE STATE THEM HERE?). Using this, all 64 \(W_i\)'s can be calculated in one matrix multiplication, by letting \(G\) be the 4\(\times\)16 matrix consisting of the tGCN's (in TaiCO referred to as 'gcn') and letting \(S\) be the 4\( \times\)4 matrix containing the (1 \(-s_{ij}\)) values. Hence,
+
The formulas for calculating the individual \(W_i\) values are given by dos Reis et al.<sup><a href="#references">1</a></sup>. All 64 \(W_i\) values can be calculated in one matrix multiplication by letting \(G\) be the 4\(\times\)16 matrix of tGCN values (referred to as 'gcn' in TaiCO) and \(S\) the 4\(\times\)4 matrix containing the \((1 - s_{ij})\) values. Hence,
 
</p>
 
</p>
 
<p>
 
<p>
Line 73: Line 90:
 
</p>
 
</p>
 
<p>
 
<p>
The computed \(W_i\)'s are the normalized by setting \(w_i = \frac{W_i}{W_{\text{max}}}\), and those normalized translatabilities, \(w_i\) do then form the basis for codon selection. Higher \(w_i\)-values are simply selected over lower values. This concludes the method for codon selection.
+
The computed \(W_i\) values are then normalized by setting \(w_i = W_i/W_{\text{max}}\), and these normalized translatabilities \(w_i\) form the basis for codon selection: for each amino acid, the synonymous codon with the highest \(w_i\) is selected.
 
</p>
 
</p>
             <h3 class="h3">The \(G\) matrix</h3>
+
             <h3 class="h3">The \(G\) Matrix</h3>
 
                <p>
 
                <p>
\(G\) consists of 64 tGCN values, which are the gene copy number of tRNA's recognizing specific codons. Normally, available gcn-files lists the tGCN's in terms of the reversed anticodon corresponding to the recognized codon, hence, the tricodons in the raw gcn-files are reversed and have their bases replaced by the complemetary ones. For instance, in <i>S. cerevisiae</i> the gcn of tRNA's recognizing TTC (encoding glutamic acid) is 10, so in the raw file, this information is presented as the reversed anticodon, GAA, being equal to 10 instead. When converted into their encoding form, the tGCN's are put into the \(G\) matrix such that each column has the first two position fixed and each row has a fixed third position:
+
\(G\) consists of 64 tGCN values, i.e. the gene copy numbers of the tRNAs recognizing specific codons. Available gcn files normally list the tGCNs by anticodon, so each triplet in the raw gcn file is reversed and its bases are replaced by the complementary ones to obtain the recognized codon. For instance, in <i>S. cerevisiae</i> the gcn of tRNAs recognizing TTC (encoding phenylalanine) is 10, so in the raw file this information is listed under the anticodon GAA. Once converted into codon form, the tGCNs are placed into the \(G\) matrix such that each column has the first two positions fixed and each row has a fixed third position:
 
                </p>
 
                </p>
 
<table>
 
<table>
Line 93: Line 110:
 
</tr>
 
</tr>
 
</table>
 
</table>
<h3 class="h3">The \(S\) matrix</h3>
+
<h3 class="h3">The \(S\) Matrix</h3>
 
                <p>
 
                <p>
While \(G\) is precisely known, \(S\) needs to be optimized. In dosReis 2004, the optimized \(s_{ij}\)-values for <i>S. cerevisiae</i> is published, yielding the \(S\)-matrix,
+
While \(G\) is precisely known, \(S\) needs to be optimized. dos Reis et al. (2004) published the optimized \(s_{ij}\) values for <i>S. cerevisiae</i>, yielding the \(S\) matrix,
 
$$
 
$$
 
    S =
 
    S =
Line 105: Line 122:
 
    \end{pmatrix}
 
    \end{pmatrix}
 
$$
 
$$
Where both rows and columns are ordered as A,C,G,T. Thus, the \(W_i\)'s computed from the \(SG\) multiplication are each influenced by two tGCN's. As an example, calculating the translatability of CCG will be equal to the dot product of the third row of \(S\) (because third position is a G), and the sixth row of \(G\) (because first two positions are CC):
+
where both rows and columns are ordered as A, C, G, T. Thus, each \(W_i\) computed from the \(SG\) multiplication is influenced by two tGCNs. As an example, the translatability of CCG equals the dot product of the third row of \(S\) (because the third position is a G) and the sixth column of \(G\) (because the first two positions are CC):
 
$$
 
$$
 
    W_{CCG} = 0.32 \cdot \text{tGCN}_{CCA} + 1 \cdot \text{tGCN}_{CCG}
 
    W_{CCG} = 0.32 \cdot \text{tGCN}_{CCA} + 1 \cdot \text{tGCN}_{CCG}
 
$$
 
$$
clearly taking the wobbling potential of G to A in third position into account.
+
clearly taking the wobbling potential of G to A in the third position into account.
 
                </p>
 
                </p>
 
         </div>
 
         </div>
 
          
 
          
 
         <div><a class="anchor" id="section-3"></a>
 
         <div><a class="anchor" id="section-3"></a>
         <h2 class="h2">TaiCO features</h2>
+
         <h2 class="h2">TaiCO Features</h2>
 
          
 
          
 
         <p>
 
         <p>
             The DTU-DENMARK team's proposal for reliable and fast computational production of optimized DNA sequences comes under the name TaiCO. The need for a specialized software tool for optimization of Y. lipolytica DNA sequences became evident when the "product subgroup" of the team started to design constructs for protein expression. TaiCO "allowed" the team's Biotechnologists to perform extended analysis/results of the coding sequences of interest due to its simplistic architecture and resources demands.  
+
             Our proposal for reliable and fast computational production of optimized DNA sequences is called TaiCO. The need for a specialized software tool for optimizing <i>Y. lipolytica</i> DNA sequences became evident when the product subgroup of DTU Biobuilders started to design constructs for protein expression. TaiCO allowed rapid analysis of the coding sequences of interest, and because of its simple architecture and low resource demands it was decided to extend its capabilities to every organism for which tGCN files are available.
 
         </p>
 
         </p>
         <h3 class="h3">Software overview</h3>
+
         <h3 class="h3">Software Overview</h3>
 
         <p>
 
         <p>
             TaiCO is implemented in Python3. By inspecting the source code it becomes evident that the algorithm was implemented in an easily modifiable layout, due to its static philosophy with the exclusive usage of only built-in libraries and modules in addition to the already known and commonly used "Pythonic" data structures. This software comes with the Open Software license: GPL v3.  
+
             TaiCO is implemented in <a href="https://www.python.org/download/releases/3.0/">Python3</a>, a high-level programming language with a built-in library for GUI development called tkinter. The algorithm was structured in an easily modifiable layout: it uses only built-in libraries and modules together with common "Pythonic" data structures (e.g. dictionaries and lists). The software is released under the open-source license <a href="https://www.gnu.org/licenses/gpl-3.0.txt">GPL v3</a>.
             For a more descriptive view on how the algorithm was implemented, it is heavily encouraged to inspect the source code along with the README.txt file deposited in IGEM SOFTWARE GitHub repo.  
+
        </p>
 +
        <p>
 +
             For a more descriptive view on how the algorithm was implemented, links for the available versions are provided in a further section.
 
         </p>
 
         </p>
 
          
 
          
         <h3 class="h3">Input files and result</h3>
+
         <h3 class="h3">Input Files and Result</h3>
 
         <p>
 
         <p>
             The first input file requested from TaiCO is a GCN table in simple text format. Although the software comes bundled with 7 GCN files from model organisms, and thus the user is given the opportunity to choose the target organism that will be used in the wet lab for the actual sequence optimization, he/she can even upload his/her own GCN table from an organism not included in the bundle. The second input file that the user has to provide, is a list with a single or multiple protein sequences that are going to be optimized and parsed through the script in FASTA format. The final (third) input file that the user can provide,although it is considered optional but a very powerful capability, is a simple text file including the sequences of the restriction sites that have to be absent from the optimized DNA resulting sequences. The output of the analysis is a file saved in a FASTA format containing all the optimized DNA sequences.  
+
             The first input file requested by TaiCO is a GCN table in plain text format. The software comes bundled with 7 GCN files from model organisms, but other GCN tables can be uploaded. The second input file, which the user has to provide, is a FASTA file with one or more protein sequences. The third, optional input is a plain text file listing the restriction sites that must be absent from the optimized DNA sequences. The output of the analysis is a FASTA file that contains all the optimized DNA sequences.
 
         </p>
 
         </p>
 
         <div class="panel-group" id="accordion1" role="tablist" aria-multiselectable="true">
 
         <div class="panel-group" id="accordion1" role="tablist" aria-multiselectable="true">
Line 152: Line 171:
 
             <!-- image with figure caption and modal that load the same picture -->
 
             <!-- image with figure caption and modal that load the same picture -->
 
             <figure class="figure">
 
             <figure class="figure">
               <img id="step1" class="enlarge img-responsive figure-img" src="https://static.igem.org/mediawiki/2016/f/fc/Step1_codon_opt.png" alt="DESCRIPTION1">
+
               <img id="step1" class="enlarge img-responsive figure-img" src="https://static.igem.org/mediawiki/2016/d/d7/Step1_codonopt.png" alt="DESCRIPTION1">
               <figcaption class="figure-caption">Step 1: Double click on executable named "TaiCO" in "dist" folder</figcaption>
+
               <figcaption class="figure-caption">Step 1: Double click on executable named "TaiCO"</figcaption>
 
             </figure>
 
             </figure>
  
Line 160: Line 179:
 
                 <span class="close ...">×</span>
 
                 <span class="close ...">×</span>
 
                 <img class="modal-content" id="...Img">
 
                 <img class="modal-content" id="...Img">
                 <div class="caption">Step 1: Double click on executable named "TaiCO" in "dist" folder</div>
+
                 <div class="caption">Step 1: Double click on executable named "TaiCO" </div>
 
             </div>
 
             </div>
 
              
 
              
Line 166: Line 185:
 
             <!-- image with figure caption and modal that load the same picture -->
 
             <!-- image with figure caption and modal that load the same picture -->
 
             <figure class="figure">
 
             <figure class="figure">
               <img id="step2" class="enlarge img-responsive figure-img" src="https://static.igem.org/mediawiki/2016/9/91/Step2_codonopt.png" alt="DESCRIPTION2">
+
               <img id="step2" class="enlarge img-responsive figure-img" src="https://static.igem.org/mediawiki/2016/8/8b/2_codonopt.png" alt="DESCRIPTION2">
 
               <figcaption class="figure-caption">Step 2: Click on search file and select the file from your system (repeat for all "Search file" options)</figcaption>
 
               <figcaption class="figure-caption">Step 2: Click on search file and select the file from your system (repeat for all "Search file" options)</figcaption>
 
             </figure>
 
             </figure>
Line 179: Line 198:
 
               <!-- image with figure caption and modal that load the same picture -->
 
               <!-- image with figure caption and modal that load the same picture -->
 
             <figure class="figure">
 
             <figure class="figure">
               <img id="step3" class="enlarge img-responsive figure-img" src="https://static.igem.org/mediawiki/2016/9/9a/Step3_codonopt.png" alt="DESCRIPTION3">
+
               <img id="step3" class="enlarge img-responsive figure-img" src="https://static.igem.org/mediawiki/2016/9/98/3_codonopt.png" alt="DESCRIPTION3">
               <figcaption class="figure-caption">Step 3: Click "Start analysis" button and wait until the successful message
+
               <figcaption class="figure-caption">Step 3: Click the "Start analysis" button and wait until the success message pops up
 
               </figcaption>
 
               </figcaption>
 
             </figure>
 
             </figure>
Line 188: Line 207:
 
                 <span class="close ...">×</span>
 
                 <span class="close ...">×</span>
 
                 <img class="modal-content" id="...Img">
 
                 <img class="modal-content" id="...Img">
                 <div class="caption">Step 3: Click "Start analysis" button and wait until the successful message</div>
+
                 <div class="caption">Step 3: Click the "Start analysis" button and wait until the success message pops up</div>
 
             </div>
 
             </div>
 
                    
 
                    
Line 196: Line 215:
 
</div>   
 
</div>   
 
          
 
          
         <h3 class="h3">Compatibility,runtime,distribution</h3>
+
         <h3 class="h3">Core Algorithmic Approach, Runtime and Distribution</h3>
 
          
 
          
 
         <p>
 
         <p>
             The full script was “converted” into an executable file along with all included modules using the PyInstaller (2) software. This allowed us to make TaiCO available for all the “mainstream” platforms (Unix based systems,Windows,MAC OS). Due to the nature of the supporting PyInstaller software the user has only one mandatory computational task in order to be able to run the software which is to download the preferred zipped version which is stored to the IGEM’s software repository on GitHub, and contains all the essential files for the proper use of the tool. The relevant operating system version of TaiCO can be downloaded by clicking one of the following links: Windows:
+
             The completed script was converted into an executable file, along with all included modules, using the PyInstaller<sup><a href="#references">2</a></sup> software. This allowed us to make TaiCO available for almost all Unix-based and Windows platforms. As a result, the only mandatory step before running the software is to download the preferred zipped version. The application runtime remains under 5 s (system dependent) because optimized sequences are produced codon by codon according to the corresponding maximum \(w_i\) value. When restriction sites are to be eliminated, codon selection is performed during sequence production, so less optimal codons are incorporated where needed. After downloading, carefully reading the two README files in the relevant folder is strongly recommended. The system-compatible version of TaiCO can be downloaded by clicking one of the following links:</p>
 
+
           
             Unix:
+
        <p>
 
+
            <b>Windows</b>: Click <a href="https://static.igem.org/mediawiki/2016/5/5a/TaiCO_windows.zip">here</a> to download Windows version
             MacOS:
+
        </p>
 
+
        <p>
             For further information regarding terms of use and how to use it properly you are strongly advised to inspect the README.txt file or contact the author by email: vrantos@hotmail.gr  
+
             <b>Unix</b>: Click <a href="https://static.igem.org/mediawiki/2016/0/0a/TaiCO_Ubuntu.zip">here </a> to download Unix version
       
+
        </p>
 +
        <p>
 +
             <b>Mac OS X</b>: There is no specific bundle for Mac OS X. Either of the downloads above can be used: after installing Python 3, the original source code can be run with the following command: <b>python3 TaiCO.py</b>
 +
        </p>
 +
        <p>
 +
             For further information regarding terms of use and how to use TaiCO properly you are strongly advised to inspect the README.txt file or contact the author by email: vrantos@hotmail.gr
 
         </p>
 
         </p>
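The codon-by-codon selection with restriction site elimination described in this section can be sketched as follows. This is only a minimal illustration under stated assumptions, not TaiCO's actual source code: the function name `optimize`, the `ranked` dictionary (amino acid to codons sorted by decreasing \(w_i\)) and the example rankings are hypothetical, and the fallback strategy (try the next-best codon, without backtracking) is an assumption.

```python
def optimize(protein, ranked, forbidden_sites=()):
    """Greedy back-translation sketch.

    protein         -- amino acid sequence (one-letter codes)
    ranked          -- dict: amino acid -> codons sorted by decreasing w_i
    forbidden_sites -- restriction site sequences that must not appear
    """
    # A site created by the newest codon must lie within the last
    # (longest site + 2) bases, so only that tail needs to be checked.
    window = max((len(site) for site in forbidden_sites), default=0) + 2
    dna = ""
    for aa in protein:
        for codon in ranked[aa]:
            candidate = dna + codon
            tail = candidate[-window:]
            if not any(site in tail for site in forbidden_sites):
                dna = candidate
                break
        else:
            # No synonymous codon avoids the forbidden sites here.
            raise ValueError("cannot avoid restriction sites at " + aa)
    return dna


# Hypothetical codon rankings for two amino acids (illustration only).
ranked = {"E": ["GAA", "GAG"], "F": ["TTC", "TTT"]}

unrestricted = optimize("EF", ranked)
restricted = optimize("EF", ranked, forbidden_sites=["GAATTC"])  # EcoRI site
```

In this toy case the best codons for E and F would together form the EcoRI site GAATTC, so the second call falls back to the next-best codon for F.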
 
          
 
          
 
         </div>
 
         </div>
  
        <div><a class="anchor" id="section-4"></a>
+
<!-- Reference section -->
        <h2 class="h2">References</h2>
+
<div id="ref_sec"><a class="anchor" id="references"></a>
            <p>
+
    <h2 class="h2">References</h2>
            </p>
+
    <ol>
        </div>
+
        <li>dos Reis, Mario, Renos Savva, and Lorenz Wernisch. "Solving the riddle of codon usage preferences: a test for translational selection." Nucleic acids research 32.17 (2004): 5036-5044.</li>
 
+
        <li><a href="http://www.pyinstaller.org/">PyInstaller Official Page</a></li>
        <div><a class="anchor" id="section-5"></a>
+
       
        <h2 class="h2"></h2>
+
    </ol>
            <p>
+
</div>
            </p>
+
        </div>
+
 
+
        <div><a class="anchor" id="section-6"></a>
+
        <h2 class="h2"></h2>
+
            <p>
+
            </p>
+
        </div>
+
 
+
        <div><a class="anchor" id="section-7"></a>
+
        <h2 class="h2"></h2>
+
            <p>
+
            </p>
+
        </div>
+
 
          
 
          
 
     </div> <!-- /LEFT -->
 
     </div> <!-- /LEFT -->
Line 243: Line 254:
 
             <li><a href="#section-2">Theory</a></li>
 
             <li><a href="#section-2">Theory</a></li>
 
             <li><a href="#section-3">TaiCO features</a></li>
 
             <li><a href="#section-3">TaiCO features</a></li>
             <li><a href="#section-4">References</a></li>
+
             <li><a href="#references">References</a></li>
            <li><a href="#section-5">Section 5</a></li>
+
            <li><a href="#section-6">Section 6</a></li>
+
            <li><a href="#section-7">Section 7</a></li>
+
 
         </ul>
 
         </ul>
 
     </div> <!-- /RIGHT -->
 
     </div> <!-- /RIGHT -->

Latest revision as of 02:46, 20 October 2016

CODON OPTIMIZATION SOFTWARE

The time for implementation of sophisticated computational approaches into biology has finally come. The DTU Biobuilders team was well aware of this fact and therefore took a leap forward by developing TaiCO, a unique, specialized yet user-friendly codon optimization tool based on species-specific tAI calculation.


Overview

"There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies."

C.A.R. Hoare

The increased use of non-conventional organisms for conventional purposes increases the need for codon optimization of coding sequences that are used for heterologous protein production. Codon optimization is typically performed by replacing each codon with the most frequently used synonymous codon from the host genome. The assumption that the most frequent synonymous codon is also the most efficiently translated one does not necessarily hold, because a typical or average transcript is unlikely to have a higher than average translation efficiency. Highly translated transcripts often contain “reserved” codons that are not the most common. Instead, these transcripts contain codons that best match the tRNA pools in the cell. These tRNA pools can be estimated by the number of tRNA genes that have an anticodon corresponding to a given codon. This approach is known as the tRNA Adaptation Index (tAI) [1] when used to assess the translation efficiency of a coding sequence.

From Blackboard Calculations to Software

TaiCO (tAI Codon Optimization tool) is a computational tool for answering a common biological question: what coding DNA sequence will result in maximum protein expression? The DTU Biobuilders' proposal for solving this task is a stand-alone application that is based completely on species-specific tAI calculation and is bundled with a simple Graphical User Interface (GUI) compatible with many platforms. We hope that this software will contribute to faster and easier production of biotechnological results and become a high-end optimization method with its unique theory implementation.

The TaiCO Interface

Theory

As mentioned, the central issue in codon optimization is to determine which codons are most efficiently translated for each amino acid. The quantity needed for this task is called 'translatability' and is denoted \(W_i\) for the \(i\)'th codon.

To accomplish this, we have chosen to use a tRNA Adaptation Index-based method (tAI). The fundamental assumption behind this method is that highly expressed proteins have their genes encoded with a set of codons that is overall more susceptible to tRNA-binding and translation compared to proteins that are not highly expressed. Hence, this optimization method estimates the codon preferences in such a way that the correlation between protein level and tAI is maximized.

The formulas for calculating the individual \(W_i\) values are given by dos Reis et al. [1]. All 64 \(W_i\) values can be calculated in one matrix multiplication by letting \(G\) be the 4\(\times\)16 matrix of tGCN values (referred to as 'gcn' in TaiCO) and \(S\) the 4\(\times\)4 matrix containing the \((1 - s_{ij})\) values. Hence,

$$W = SG$$

The computed \(W_i\)'s are then normalized by setting \(w_i = W_i/W_{\text{max}}\), and these normalized translatabilities \(w_i\) form the basis for codon selection: among the synonymous codons for an amino acid, the one with the highest \(w_i\) is selected.
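The normalization and selection step can be illustrated with a minimal sketch. The raw \(W_i\) values and the two-amino-acid codon table below are hypothetical and chosen only for illustration; the function names are not taken from the TaiCO source.

```python
# Hypothetical raw translatabilities W_i for a few codons (illustrative values only).
W = {"TTT": 5.9, "TTC": 10.0, "AAA": 7.0, "AAG": 16.2}

# Synonymous codons per amino acid (small subset of the standard genetic code).
SYNONYMS = {"F": ["TTT", "TTC"], "K": ["AAA", "AAG"]}


def normalize(raw):
    """Return w_i = W_i / W_max for every codon."""
    w_max = max(raw.values())
    return {codon: value / w_max for codon, value in raw.items()}


def best_codon(amino_acid, w):
    """Pick the synonymous codon with the highest normalized translatability."""
    return max(SYNONYMS[amino_acid], key=lambda codon: w[codon])


w = normalize(W)
print(best_codon("F", w), best_codon("K", w))
```

Because all values are divided by the same \(W_{\text{max}}\), normalization does not change the ranking of synonymous codons; it only rescales the values to the range 0 to 1.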

The \(G\) Matrix

\(G\) consists of 64 tGCN values, which are the gene copy numbers of the tRNAs recognizing specific codons. Available gcn-files normally list the tGCNs by anticodon, which is the reverse complement of the recognized codon. The triplets in the raw gcn-files are therefore reversed and have their bases replaced by the complementary ones. For instance, in S. cerevisiae the gcn of the tRNAs recognizing TTC (encoding phenylalanine) is 10, so in the raw file this information is listed under the anticodon GAA with the value 10. Once converted into their codon form, the tGCNs are put into the \(G\) matrix such that each column has the first two positions fixed and each row has a fixed third position:

AAA ACA AGA ATA CAA CCA CGA CTA GAA GCA GGA GTA TAA TCA TGA TTA
AAC ACC AGC ATC CAC CCC CGC CTC GAC GCC GGC GTC TAC TCC TGC TTC
AAG ACG AGG ATG CAG CCG CGG CTG GAG GCG GGG GTG TAG TCG TGG TTG
AAT ACT AGT ATT CAT CCT CGT CTT GAT GCT GGT GTT TAT TCT TGT TTT
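The conversion from anticodon to codon and the position of a codon in the layout above can be sketched as follows. This is an illustration using only built-in Python features; the function names are hypothetical and the zero-based indexing is an assumption of this sketch, not necessarily how TaiCO stores the matrix.

```python
BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def anticodon_to_codon(anticodon):
    """Reverse-complement an anticodon to obtain the codon it recognizes."""
    return "".join(COMPLEMENT[base] for base in reversed(anticodon))


def g_position(codon):
    """Return the zero-based (row, column) of a codon in the 4x16 layout.

    The row is set by the third base; the column by the first two bases.
    """
    row = BASES.index(codon[2])
    column = 4 * BASES.index(codon[0]) + BASES.index(codon[1])
    return row, column


print(anticodon_to_codon("GAA"))  # the S. cerevisiae example from the text
print(g_position("CCG"))
```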

The \(S\) Matrix

While \(G\) is precisely known, \(S\) needs to be optimized. dos Reis et al. (2004) [1] published the optimized \(s_{ij}\)-values for S. cerevisiae, yielding the \(S\)-matrix,
$$ S = \begin{pmatrix} 1 & 0 & 0 & 0.0001 \\ 0 & 1 & 0 & 0.72 \\ 0.32 & 0 & 1 & 0 \\ 0 & 0.59 & 0 & 1 \end{pmatrix} $$
where both rows and columns are ordered as A,C,G,T. Thus, the \(W_i\)'s computed from the \(SG\) multiplication are each influenced by two tGCNs. As an example, the translatability of CCG is equal to the dot product of the third row of \(S\) (because the third position is a G) and the sixth column of \(G\) (because the first two positions are CC):
$$ W_{CCG} = 0.32 \cdot \text{tGCN}_{CCA} + 1 \cdot \text{tGCN}_{CCG} $$
clearly taking the wobbling potential of G to A in the third position into account.
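The calculation of a single \(W_i\) can be sketched in a few lines of Python. The \(S\) values are the ones quoted above; the tGCN values in the example are hypothetical, and the function name is chosen for illustration rather than taken from the TaiCO source.

```python
BASES = "ACGT"

# (1 - s_ij) values for S. cerevisiae as quoted in the text;
# rows and columns are ordered A, C, G, T.
S = [
    [1.0, 0.0, 0.0, 0.0001],
    [0.0, 1.0, 0.0, 0.72],
    [0.32, 0.0, 1.0, 0.0],
    [0.0, 0.59, 0.0, 1.0],
]


def translatability(codon, tgcn):
    """Compute W for one codon.

    Takes the row of S selected by the third base and the column of G
    selected by the first two bases, and returns their dot product.
    """
    prefix = codon[:2]
    s_row = S[BASES.index(codon[2])]
    g_column = [tgcn.get(prefix + base, 0) for base in BASES]
    return sum(s * g for s, g in zip(s_row, g_column))


# Hypothetical tGCN values for the CC* codons (not real genome data).
tgcn = {"CCA": 10, "CCC": 0, "CCG": 2, "CCT": 1}
print(translatability("CCG", tgcn))  # 0.32 * 10 + 1 * 2
```

Applying the same function to all 64 codons reproduces the full \(W = SG\) product one entry at a time.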

TaiCO Features

Our proposal for reliable and fast computational production of optimized DNA sequences is TaiCO. The need for a specialized software tool for optimizing Y. lipolytica DNA sequences became evident when the product subgroup of DTU Biobuilders started to design constructs for protein expression. TaiCO allowed rapid analysis of the coding sequences of interest, and because of its simple architecture and low resource demands, it was decided to extend its capabilities to every organism for which tGCN files are available.

Software Overview

TaiCO is implemented in Python3, a high-level programming language whose standard library includes tkinter for GUI development. The algorithm is structured to be easy to modify: it uses only built-in libraries and modules together with common "Pythonic" data structures (e.g. dictionaries, lists). The software is released under the open-source GPL v3 license.

For a more detailed view of how the algorithm was implemented, download links for the available versions are provided in a later section.

Input Files and Result

The first input file requested by TaiCO is a GCN table in simple text format. Although the software comes bundled with 7 GCN files from model organisms, other GCN tables can be supplied. The second input file that the user has to provide contains one or more protein sequences in FASTA format. The final, optional input file is a simple text file listing the sequences of the restriction sites that must be absent from the optimized DNA sequences. The output of the analysis is a file in FASTA format that contains all the optimized DNA sequences.
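Reading and writing FASTA records needs nothing beyond built-in Python features. The following is a minimal sketch under that assumption; the function names and the 60-character line width are illustrative choices, not taken from the TaiCO source.

```python
def read_fasta(lines):
    """Parse FASTA-formatted lines into a {header: sequence} dictionary."""
    records = {}
    header = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line  # sequences may span several lines
    return records


def format_fasta(records, width=60):
    """Format a {header: sequence} dictionary as FASTA text."""
    out = []
    for header, sequence in records.items():
        out.append(">" + header)
        out.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(out) + "\n"


proteins = read_fasta([">protein1", "MK", "F", ">protein2", "MA"])
print(proteins)
```

With a file on disk, the same parser can be used as `read_fasta(open(path))`, since a file object yields its lines one by one.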

Step 1: Double-click on the executable named "TaiCO"
Step 2: Click on "Search file" and select the file from your system (repeat for all "Search file" options)
Step 3: Click the "Start analysis" button and wait until the success message pops up

Core Algorithmic Approach, Runtime and Distribution

The completed script was converted into an executable file, along with all included modules, using the PyInstaller software [2]. This allowed us to make TaiCO available for almost all Unix-based and Windows platforms. Because PyInstaller bundles everything the program needs, the only mandatory step for the user is to download the preferred zipped version. The application runtime remains under 5 seconds (system dependent) because the optimized sequences are produced codon by codon, selecting the codon with the maximum \(w_i\) value at each position. When restriction sites have to be eliminated, codons are selected while the sequence is being produced, and less optimal codons are therefore incorporated where necessary. After downloading, we strongly recommend carefully reading the two README files in the relevant folder. The system-compatible version of TaiCO can be downloaded by clicking one of the following links:

Windows: Click here to download Windows version

Unix: Click here to download Unix version

Mac OS X: There is no specific bundle for Mac OS X. Either of the links above can be used, and after installing python3 the original source code can be run with the following command: python3 TaiCO.py

For further information regarding terms of use and how to use TaiCO properly you are strongly advised to inspect the README.txt file or contact the author by email: vrantos@hotmail.gr
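The codon-by-codon selection with restriction site elimination described in this section can be sketched as a greedy loop. This is an illustration under stated assumptions, not the TaiCO source: the function name is hypothetical, the codon rankings in the example are made up, and the fallback to the top-ranked codon when every option creates a site is a choice of this sketch.

```python
def optimize(protein, ranked_codons, forbidden_sites=()):
    """Build a DNA sequence codon by codon.

    ranked_codons maps each amino acid to its codons sorted by decreasing w_i.
    For each residue, the best-ranked codon that does not complete a forbidden
    site is chosen.
    """
    dna = ""
    for amino_acid in protein:
        candidates = ranked_codons[amino_acid]
        for codon in candidates:
            trial = dna + codon
            # A newly completed site must end within the last three bases,
            # so only the tail of the trial sequence needs checking.
            if not any(site in trial[-(len(site) + 2):] for site in forbidden_sites):
                break
        else:
            codon = candidates[0]  # assumption: fall back to the best codon
        dna += codon
    return dna


# Hypothetical rankings for three amino acids (illustrative only).
ranked = {"M": ["ATG"], "E": ["GAA", "GAG"], "F": ["TTC", "TTT"]}
print(optimize("MEF", ranked))
print(optimize("MEF", ranked, forbidden_sites=["GAATTC"]))  # avoids EcoRI
```

In the second call the top-ranked phenylalanine codon would complete the site GAATTC, so the next-best codon is used instead, which is the trade-off the text describes. A site given in only one orientation is checked on one strand only; non-palindromic sites would also need their reverse complement in the list.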

References

  1. dos Reis, Mario, Renos Savva, and Lorenz Wernisch. "Solving the riddle of codon usage preferences: a test for translational selection." Nucleic acids research 32.17 (2004): 5036-5044.
  2. PyInstaller Official Page

  • DTU BIOBUILDERS
  • DENMARK
  • DTU - SØLTOFTS PLADS, BYGN. 221/002
  • 2800 KGS. LYNGBY

  • E-mail:
  • dtu-biobuilders-2016@googlegroups.com