CODON OPTIMIZATION SOFTWARE
Sophisticated computational approaches are becoming an integral part of biology. With this in mind, the DTU Biobuilders team developed TaiCO, a specialized yet user-friendly codon optimization tool based on tAI calculation.

Biology has at least 50 more interesting years.
James D. Watson

Overview


Background

The increased use of non-conventional organisms for conventional purposes increases the need for codon optimization of coding sequences used for heterologous protein production. Codon optimization is typically performed by replacing each codon with the synonymous codon most frequently used in the host genome. However, the most frequent synonymous codon is not necessarily the most efficiently translated one, because a typical (average) transcript is unlikely to have above-average translation efficiency. Highly translated transcripts often contain “reserved” codons that are not the most common; instead, these transcripts contain codons that best match the tRNA pools in the cell. These tRNA pools can be estimated from the number of tRNA genes whose anticodon corresponds to a given codon. When used to assess the translation efficiency of a coding sequence, this approach is known as the tRNA Adaptation Index (tAI) (dos Reis et al. 2004)¹.
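As a rough illustration of how a tAI score is obtained for a coding sequence: in dos Reis et al. the tAI of a gene is the geometric mean of the per-codon relative adaptiveness values (the $w_i$ defined in the Theory section). The sketch below assumes those weights are already available as a dictionary. The function name `tai` and the numbers in `w_demo` are hypothetical and are not taken from TaiCO.

```python
import math

def tai(cds, w):
    """Geometric mean of the weights w over the codons of a coding sequence.

    Codons missing from w (for example stop codons) or with a zero weight
    are skipped; that is a simplification made for this sketch.
    """
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    logs = [math.log(w[c]) for c in codons if w.get(c, 0) > 0]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))

# Hypothetical weights, for illustration only
w_demo = {"GCT": 1.0, "GCC": 0.5, "AAA": 0.25}
print(tai("GCTGCCAAA", w_demo))
```

Working in log space avoids numerical underflow when long sequences with many small weights are scored.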

From Blackboard Calculations to Software

TaiCO (tAI Codon Optimization tool) is a computational tool for answering specific biological questions. To the best of the team's knowledge, it is the only stand-alone application with a graphical user interface (GUI) that is based entirely on species-specific tAI calculation. We hope that this software will make biotechnological results faster and easier to produce, and that it will become a high-end optimization method thanks to its unique implementation of the theory.

Theory

As mentioned, the central issue in codon optimization is to determine which codons are most efficiently translated for each amino acid. The quantity needed for this task is called 'translatability' and is denoted $W_i$ for the $i$'th codon.

To accomplish this, we have chosen to use a tRNA Adaptation Index-based method (tAI). The fundamental assumption behind this method is that highly expressed proteins have their genes encoded with a set of codons that is overall more susceptible to tRNA-binding and translation compared to proteins that are not highly expressed. Hence, this optimization method estimates the codon preferences in such a way that the correlation between protein level and tAI is maximized.

The formulas for calculating the individual $W_i$'s were stated by dos Reis et al.¹. All 64 $W_i$'s can be calculated in one matrix multiplication by letting $G$ be the 4$\times$16 matrix consisting of the tRNA gene copy numbers (tGCN's, referred to as 'gcn' in TaiCO) and letting $S$ be the 4$\times$4 matrix containing the (1 $-s_{ij}$) values. Hence,

$$W = SG$$

The computed $W_i$'s are then normalized by setting $w_i = W_i/W_{\text{max}}$, and these normalized translatabilities $w_i$ form the basis for codon selection: among synonymous codons, the one with the highest $w_i$-value is selected.
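The normalization and selection step can be sketched as follows. This is a minimal illustration and not TaiCO's actual code: the function names, the four-codon `CODON_TABLE` and the values in `W_demo` are assumptions made for the example.

```python
def normalize(W):
    """Return w_i = W_i / W_max for every codon."""
    w_max = max(W.values())
    return {codon: value / w_max for codon, value in W.items()}

def best_codons(w, codon_table):
    """For each amino acid, keep the synonymous codon with the highest w."""
    best = {}
    for codon, aa in codon_table.items():
        if aa not in best or w[codon] > w[best[aa]]:
            best[aa] = codon
    return best

def back_translate(protein, best):
    """Write a protein sequence using the selected codon for each residue."""
    return "".join(best[aa] for aa in protein)

# Toy data (hypothetical): two amino acids with two codons each
CODON_TABLE = {"TTT": "F", "TTC": "F", "AAA": "K", "AAG": "K"}
W_demo = {"TTT": 5.9, "TTC": 10.0, "AAA": 7.0, "AAG": 16.24}

w = normalize(W_demo)
best = best_codons(w, CODON_TABLE)
print(back_translate("FKK", best))
```

A full implementation would use the complete 64-codon genetic code and would need a rule for ties between codons with equal $w_i$.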

The $G$ Matrix

$G$ consists of 64 tGCN values, which are the gene copy numbers of the tRNAs recognizing specific codons. Available gcn-files normally list the tGCN's by anticodon rather than by the recognized codon, so each triplet in a raw gcn-file must be reversed and have its bases replaced by the complementary ones to obtain the codon. For instance, in S. cerevisiae the gcn of the tRNAs recognizing TTC (encoding phenylalanine) is 10; in the raw file this information is listed under the anticodon, GAA, which is the reverse complement of TTC. Once converted into codon form, the tGCN's are put into the $G$ matrix such that each column has the first two positions fixed and each row has a fixed third position:

AAA ACA AGA ATA CAA CCA CGA CTA GAA GCA GGA GTA TAA TCA TGA TTA
AAC ACC AGC ATC CAC CCC CGC CTC GAC GCC GGC GTC TAC TCC TGC TTC
AAG ACG AGG ATG CAG CCG CGG CTG GAG GCG GGG GTG TAG TCG TGG TTG
AAT ACT AGT ATT CAT CCT CGT CTT GAT GCT GGT GTT TAT TCT TGT TTT
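The conversion from anticodon to codon and the layout of $G$ can be sketched as below. The input dictionary stands in for a parsed gcn-file, whose exact format is not reproduced here; the names `anticodon_to_codon` and `build_G` are hypothetical.

```python
BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def anticodon_to_codon(anticodon):
    """Reverse-complement an anticodon to get the codon it pairs with."""
    return anticodon.translate(COMPLEMENT)[::-1]

def build_G(gcn_by_anticodon):
    """Arrange tGCN values in a 4x16 nested list.

    Rows follow the third codon position (A, C, G, T); columns follow the
    first two positions (AA, AC, AG, AT, CA, ..., TT). Codons without an
    entry get a copy number of 0.
    """
    gcn = {anticodon_to_codon(a): n for a, n in gcn_by_anticodon.items()}
    return [
        [gcn.get(first + second + third, 0)
         for first in BASES for second in BASES]
        for third in BASES
    ]

# The S. cerevisiae example from the text: anticodon GAA, copy number 10
G = build_G({"GAA": 10})
print(anticodon_to_codon("GAA"), G[1][15])
```

With this layout the entry for TTC sits in the second row (third position C) and the last column (prefix TT).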

The $S$ Matrix

While $G$ is precisely known, $S$ needs to be optimized. dos Reis et al. (2004)¹ published the optimized $s_{ij}$-values for S. cerevisiae, yielding the $S$-matrix,
$$ S = \begin{pmatrix} 1 & 0 & 0 & 0.0001 \\ 0 & 1 & 0 & 0.72 \\ 0.32 & 0 & 1 & 0 \\ 0 & 0.59 & 0 & 1 \end{pmatrix} $$
where both rows and columns are ordered as A,C,G,T. Thus, the $W_i$'s computed from the $SG$ multiplication are each influenced by at most two tGCN's. As an example, the translatability of CCG is equal to the dot product of the third row of $S$ (because the third position is a G) and the sixth column of $G$ (because the first two positions are CC):
$$ W_{CCG} = 0.32 \cdot \text{tGCN}_{CCA} + 1 \cdot \text{tGCN}_{CCG} $$
which takes the wobbling potential of G to A in the third position into account.
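The $W = SG$ product and the CCG example can be checked with a few lines of plain Python. The $S$ values are the ones quoted above; the tGCN values placed in $G$ are hypothetical and chosen only to make the arithmetic easy to follow, and `translatabilities` is an illustrative name.

```python
BASES = "ACGT"

# (1 - s_ij) values for S. cerevisiae quoted in the text; rows and columns in A, C, G, T order
S = [
    [1,    0,    0, 0.0001],
    [0,    1,    0, 0.72],
    [0.32, 0,    1, 0],
    [0,    0.59, 0, 1],
]

def translatabilities(S, G):
    """Compute W = S G and return it as a dict mapping codon -> W."""
    W = {}
    for col in range(16):
        prefix = BASES[col // 4] + BASES[col % 4]  # first two codon positions
        for row, third in enumerate(BASES):
            W[prefix + third] = sum(S[row][k] * G[k][col] for k in range(4))
    return W

# Toy G: all zeros except two hypothetical copy numbers for the CC prefix
G = [[0] * 16 for _ in range(4)]
cc = BASES.index("C") * 4 + BASES.index("C")  # index 5 for the prefix CC
G[BASES.index("A")][cc] = 10  # hypothetical tGCN for CCA
G[BASES.index("G")][cc] = 2   # hypothetical tGCN for CCG

W = translatabilities(S, G)
print(W["CCG"])  # 0.32 * 10 + 1 * 2
```

Only the CCA and CCG copy numbers contribute to `W["CCG"]`, because the other two entries in the third row of $S$ are zero.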

TaiCO Features

Our proposal for reliable and fast computational production of optimized DNA sequences is TaiCO. The need for a specialized software tool for optimizing Y. lipolytica DNA sequences became evident when our products group started to design constructs for protein expression. Thanks to its simple architecture and low resource demands, TaiCO allowed the team's biotechnologists to perform extended analysis of the coding sequences of interest.

Software Overview

TaiCO is implemented in Python 3. The source code is laid out to be easily modifiable: it uses only built-in libraries and modules together with common "Pythonic" data structures. The software is released under the open-source GPL v3 license. For a more detailed view of how the algorithm was implemented, we strongly encourage inspecting the source code along with the README.txt file deposited in the iGEM SOFTWARE GitHub repository.

Input Files and Result

The first input file requested by TaiCO is a GCN table in simple text format. Although the software comes bundled with 7 GCN files from model organisms, other GCN tables can be uploaded. The second input file is a FASTA file containing one or more protein sequences, so multiple protein sequences can be optimized in a single run. The third input file is optional but very powerful: a simple text file listing the restriction-site sequences that must be absent from the optimized DNA sequences. The output of the analysis is a FASTA file that contains all the optimized DNA sequences.
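To make the input and output handling concrete, here is a minimal sketch of reading protein records from FASTA text, checking a DNA sequence for forbidden restriction sites, and writing FASTA output. It is not TaiCO's code: the function names are hypothetical, and the sketch only detects sites, since the way TaiCO removes them is not described here.

```python
def parse_fasta(text):
    """Return a dict mapping each FASTA header (without '>') to its sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records

def find_sites(dna, sites):
    """List the restriction sites that occur in dna (forward strand only)."""
    return [site for site in sites if site in dna]

def format_fasta(records, width=60):
    """Serialize records as FASTA text with fixed-width sequence lines."""
    lines = []
    for name, seq in records.items():
        lines.append(">" + name)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"

proteins = parse_fasta(">p1\nMK\nF\n>p2\nAA\n")
print(proteins)
# GAATTC (EcoRI) and GGATCC (BamHI) used as example sites
print(find_sites("AAGAATTCAA", ["GAATTC", "GGATCC"]))
```

A complete check would also scan the reverse complement, which matters for restriction sites that are not palindromic.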

Optimization - 3 Steps away from your lab (tutorial)

Step 1: Double-click the executable named "TaiCO" in the "dist" folder.

Step 2: Click "Search file" and select the file from your system (repeat for all "Search file" options).

Step 3: Click the "Start analysis" button and wait until the success message appears.

Compatibility, Runtime and Distribution

The full script, along with all included modules, was converted into an executable file using PyInstaller (2). This allowed us to make TaiCO available for all mainstream platforms (Unix-based systems, Windows, macOS). As a result, the only step required to run the software is to download the preferred zipped version, which is stored in iGEM's software repository on GitHub and contains all the files essential for using TaiCO. The relevant operating system version of TaiCO can be downloaded by clicking one of the following links: Windows: Unix: MacOS: For further information regarding terms of use and how to use TaiCO properly, you are strongly advised to inspect the README.txt file or contact the author by email: vrantos@hotmail.gr

References

1. dos Reis, Mario, Renos Savva, and Lorenz Wernisch. "Solving the riddle of codon usage preferences: a test for translational selection." Nucleic Acids Research 32.17 (2004): 5036-5044.
