Team:DTU-Denmark/Software


CODON OPTIMIZATION SOFTWARE

The time has come for sophisticated computational approaches to be implemented in biology. With this in mind, the DTU Biobuilders team developed TaiCO, a specialized yet user-friendly codon optimization tool based on tAI calculation.


Overview


Background

The increased use of non-conventional organisms for conventional purposes increases the need for codon optimization of coding sequences used for heterologous protein production. Codon optimization is typically performed by replacing each codon with the most frequently used synonymous codon in the host genome. However, the most frequent synonymous codon is not necessarily the most efficiently translated one, because a typical or average transcript is unlikely to have a higher than average translation efficiency. Highly translated transcripts often contain “reserved” codons that are not the most common. Instead, these transcripts contain codons that best match the tRNA pools in the cell. These tRNA pools can be estimated from the number of tRNA genes whose anticodon corresponds to a given codon. When used to assess the translation efficiency of a coding sequence, this approach is known as the tRNA Adaptation Index (tAI) (dos Reis et al. 2004) [1].

From Blackboard Calculations to Software

TaiCO (tAI Codon Optimization tool) is a computational tool for answering a specific biological question. To the best of the team's knowledge, it is the only stand-alone application with a Graphical User Interface (GUI) that is based entirely on species-specific tAI calculation. We hope that this software will help produce biotechnological results faster and more easily, and that its distinctive theoretical basis will make it a high-end optimization method.

The TaiCO Interface

Theory

As mentioned, the central issue in codon optimization is to determine which codons are most efficiently translated for each amino acid. The quantity needed for this task is called 'translatability' and is denoted \(W_i\) for the \(i\)'th codon.

To determine the translatabilities, we have chosen a method based on the tRNA Adaptation Index (tAI). The fundamental assumption behind this method is that the genes of highly expressed proteins are encoded with a set of codons that is, overall, more susceptible to tRNA binding and translation than the codons of proteins that are not highly expressed. Hence, this optimization method estimates the codon preferences in such a way that the correlation between protein level and tAI is maximized.

The formulas for calculating the individual \(W_i\)'s were stated by dos Reis et al. [1]. All 64 \(W_i\)'s can be calculated in one matrix multiplication by letting \(G\) be the 4\(\times\)16 matrix consisting of the tGCNs (referred to as 'gcn' in TaiCO) and letting \(S\) be the 4\(\times\)4 matrix containing the (1 \(-s_{ij}\)) values. Hence,

$$W = SG$$

The computed \(W_i\)'s are then normalized by setting \(w_i = W_i/W_{\text{max}}\), where \(W_{\text{max}}\) is the largest of the computed values. The normalized translatabilities \(w_i\) form the basis for codon selection: among synonymous codons, the one with the highest \(w_i\) is selected.
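The normalization and selection step can be sketched in a few lines of Python. This is an illustrative sketch only, not the TaiCO source: the raw \(W_i\) values, the two-amino-acid codon table and the function names (`normalize`, `best_codon`, `back_translate`) are all made up for the example.

```python
# Hypothetical raw translatabilities W_i (made-up numbers, illustration only).
RAW_W = {
    "GCA": 5.0, "GCC": 7.92, "GCG": 1.6, "GCT": 11.0,  # alanine codons
    "AAA": 7.0, "AAG": 16.24,                          # lysine codons
}

# Reduced codon table covering only the two amino acids used here.
CODONS_FOR = {"A": ["GCA", "GCC", "GCG", "GCT"], "K": ["AAA", "AAG"]}


def normalize(raw_w):
    """Divide every W_i by the largest value, giving w_i in (0, 1]."""
    w_max = max(raw_w.values())
    return {codon: value / w_max for codon, value in raw_w.items()}


def best_codon(amino_acid, w, codons_for=CODONS_FOR):
    """Pick the synonymous codon with the highest normalized w_i."""
    return max(codons_for[amino_acid], key=lambda codon: w[codon])


def back_translate(protein, w):
    """Encode a protein by choosing the best codon for each residue."""
    return "".join(best_codon(aa, w) for aa in protein)


w = normalize(RAW_W)
optimized = back_translate("AKA", w)
print(optimized)
```

With these made-up numbers, alanine is encoded by GCT and lysine by AAG, since those codons carry the highest \(w_i\) within their synonymous groups.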

The \(G\) Matrix

\(G\) consists of 64 tGCN values, which are the gene copy numbers of the tRNAs recognizing specific codons. Available gcn files normally list the tGCNs by anticodon, that is, by the reverse complement of the recognized codon. To obtain the codons, the triplets in the raw gcn files are therefore reversed and have their bases replaced by the complementary ones. For instance, in S. cerevisiae the gcn of the tRNAs recognizing TTC (encoding phenylalanine) is 10, and in the raw file this information is presented as the anticodon, GAA, being equal to 10 instead. When converted into their codon form, the tGCNs are put into the \(G\) matrix such that each column has the first two positions fixed and each row has a fixed third position:

AAA ACA AGA ATA CAA CCA CGA CTA GAA GCA GGA GTA TAA TCA TGA TTA
AAC ACC AGC ATC CAC CCC CGC CTC GAC GCC GGC GTC TAC TCC TGC TTC
AAG ACG AGG ATG CAG CCG CGG CTG GAG GCG GGG GTG TAG TCG TGG TTG
AAT ACT AGT ATT CAT CCT CGT CTT GAT GCT GGT GTT TAT TCT TGT TTT
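The conversion from anticodon to codon and the placement of each value in the 4\(\times\)16 layout can be sketched as follows. This is a hedged illustration, not TaiCO's actual code: the function names and the dictionary used as input are assumptions, and the real gcn file format is not reproduced here.

```python
BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def anticodon_to_codon(anticodon):
    """Reverse the anticodon and complement each base."""
    return "".join(COMPLEMENT[base] for base in reversed(anticodon))


def g_position(codon):
    """Return the zero-based (row, column) of a codon in the 4x16 layout.

    The row is set by the third base, the column by the first two bases,
    all ordered A, C, G, T.
    """
    row = BASES.index(codon[2])
    column = 4 * BASES.index(codon[0]) + BASES.index(codon[1])
    return row, column


def build_g(gcn_by_anticodon):
    """Fill a 4x16 matrix from a {anticodon: copy number} mapping (assumed input shape)."""
    g = [[0] * 16 for _ in range(4)]
    for anticodon, copies in gcn_by_anticodon.items():
        row, column = g_position(anticodon_to_codon(anticodon))
        g[row][column] = copies
    return g


# The S. cerevisiae example from the text: anticodon GAA with 10 gene copies.
g = build_g({"GAA": 10})
print(anticodon_to_codon("GAA"))  # TTC
print(g_position("TTC"))          # (1, 15): second row, sixteenth column
```

The position found for TTC matches the table above, where TTC is the last entry of the second row.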

The \(S\) Matrix

While \(G\) is precisely known, \(S\) needs to be optimized. dos Reis et al. (2004) [1] published the optimized \(s_{ij}\)-values for S. cerevisiae, yielding the \(S\)-matrix

$$ S = \begin{pmatrix} 1 & 0 & 0 & 0.0001 \\ 0 & 1 & 0 & 0.72 \\ 0.32 & 0 & 1 & 0 \\ 0 & 0.59 & 0 & 1 \end{pmatrix} $$

where both rows and columns are ordered as A, C, G, T. Thus, each \(W_i\) computed from the \(SG\) multiplication is influenced by two tGCNs. As an example, the translatability of CCG equals the dot product of the third row of \(S\) (because the third position is a G) and the sixth column of \(G\) (because the first two positions are CC):

$$ W_{CCG} = 0.32 \cdot \text{tGCN}_{CCA} + 1 \cdot \text{tGCN}_{CCG} $$

which clearly takes the wobbling potential of G to A in the third position into account.
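The CCG example can be checked numerically with a small pure-Python matrix product. This is a sketch under stated assumptions: the two tGCN values placed in the CC column are made up for illustration, and `matmul` is a helper written for this example rather than a TaiCO function.

```python
BASES = "ACGT"

# (1 - s_ij) values for S. cerevisiae as given in the text,
# rows and columns ordered A, C, G, T.
S = [
    [1.0, 0.0, 0.0, 0.0001],
    [0.0, 1.0, 0.0, 0.72],
    [0.32, 0.0, 1.0, 0.0],
    [0.0, 0.59, 0.0, 1.0],
]


def matmul(a, b):
    """Plain nested-list matrix product: (n x k) times (k x m) gives (n x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


# Toy G matrix: only the CC column (zero-based index 5) is filled.
# The copy numbers below are hypothetical.
CC_COLUMN = 4 * BASES.index("C") + BASES.index("C")
G = [[0] * 16 for _ in range(4)]
G[BASES.index("A")][CC_COLUMN] = 10  # assumed tGCN for CCA
G[BASES.index("G")][CC_COLUMN] = 2   # assumed tGCN for CCG

W = matmul(S, G)  # 4 x 16 matrix of raw translatabilities
w_ccg = W[BASES.index("G")][CC_COLUMN]
print(w_ccg)  # 0.32 * 10 + 1 * 2, i.e. about 5.2
```

With these assumed copy numbers, the CCG entry receives a contribution from the CCA tRNAs through the 0.32 factor in addition to its own tGCN.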

TaiCO Features

Our proposal for reliable and fast computational production of optimized DNA sequences is TaiCO. The need for a specialized software tool for optimizing Y. lipolytica DNA sequences became evident when our products group started to design constructs for protein expression. Thanks to its simple architecture and low resource demands, TaiCO allowed the team's biotechnologists to perform extended analyses of the coding sequences of interest.

Software Overview

TaiCO is implemented in Python 3. The source code is laid out to be easy to modify: it relies exclusively on built-in libraries and modules, together with the commonly used "Pythonic" data structures. The software is released under the open-source GPL v3 license. For a more detailed view of how the algorithm was implemented, we strongly encourage inspecting the source code along with the README.txt file deposited in the iGEM SOFTWARE GitHub repository.

Input Files and Result

The first input file requested by TaiCO is a GCN table in plain text format. The software comes bundled with 7 GCN files from model organisms, and other GCN tables can be uploaded as well. The second input file, which the user has to provide, is a FASTA file containing one or more protein sequences, so multiple protein sequences can be uploaded simultaneously. The final input file is optional but very powerful: a plain text file listing the sequences of the restriction sites that must be absent from the optimized DNA sequences. The output of the analysis is a file in FASTA format that contains all the optimized DNA sequences.
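To make the FASTA input and the restriction site check concrete, here is a minimal sketch using only built-ins. It is not taken from the TaiCO source: `parse_fasta` and `sites_present` are hypothetical names, the example sequences are invented, and how TaiCO actually removes an unwanted site is not shown.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def sites_present(dna, sites):
    """List the restriction sites that still occur in the given DNA strand."""
    dna = dna.upper()
    return [site for site in sites if site.upper() in dna]


# Invented example input with two protein records.
fasta_text = """>protein_one
MKT
AYI
>protein_two
MAK
"""
proteins = parse_fasta(fasta_text)

# EcoRI (GAATTC) and BamHI (GGATCC) recognition sequences.
forbidden = ["GAATTC", "GGATCC"]
hits = sites_present("ATGGAATTCTAA", forbidden)
print(proteins)
print(hits)
```

The check only scans the strand it is given. That is sufficient for palindromic sites such as EcoRI and BamHI, while non-palindromic sites would also need to be searched as their reverse complement.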

Step 1: Double-click the executable named "TaiCO" in the "dist" folder.
Step 2: Click "Search file" and select the file from your system (repeat for all "Search file" options).
Step 3: Click the "Start analysis" button and wait for the success message.

Compatibility, Runtime and Distribution

The full script was converted into an executable file, along with all included modules, using the PyInstaller (2) software. This allowed us to make TaiCO available for all the mainstream platforms (Unix-based systems, Windows, macOS). Thanks to PyInstaller, the only mandatory step before running the software is to download the preferred zipped version, which is stored in the iGEM software repository on GitHub and contains all the files essential for proper usage of TaiCO. The relevant operating system version of TaiCO can be downloaded by clicking one of the following links: Windows: Unix: MacOS: For further information regarding terms of use and how to use TaiCO properly, you are strongly advised to inspect the README.txt file or contact the author by email: vrantos@hotmail.gr

References

  1. dos Reis, Mario, Renos Savva, and Lorenz Wernisch. "Solving the riddle of codon usage preferences: a test for translational selection." Nucleic Acids Research 32.17 (2004): 5036-5044.
