Difference between revisions of "Team:IIT-Madras/Model"

 
(286 intermediate revisions by 3 users not shown)
Line 1: Line 1:
<html>
+
{{IIT-Madras-Top/CSS}}
<head>
+
__TOC__
  
<style type="text/css">
 
/* Start Removing iGEM Style*/ .mw-content-ltr ul, .mw-content-rtl .mw-content-ltr ul{ margin:0px; } #HQ_page p{ font-size: 20px; } #HQ_page .clear{ height:0px; } .clear { clear: both; height:0px; } #sideMenu{ display:none; } #top_title{ display:none; } #content { width: 100%; margin:0px auto; padding:0px; border:none; background:none; } #globalWrapper { width: 100%; padding:0px; margin-top: -25px; } .firstHeading { display:none; } #bodyContent h1, #bodyContent h2{ margin: 0; } /* End Removing iGEM Style */
 
  
 +
= Modularity of RBS parts =
  
@font-face{
+
== Introduction ==
src: url(https://static.igem.org/mediawiki/2016/a/aa/Iitm_lb.otf);
+
The '''non-modular nature''' of Ribosomal Binding Sites (RBSs) in bacteria is well known to the synthetic biology community. Most biological parts have been assigned a strength that describes their function. For example, promoters have a transcriptional score (number of RNA molecules per DNA molecule) and RBSs have a translational score (number of protein molecules per RNA molecule). Ideally, if we use a promoter and an RBS to produce a protein, we should get ('''transcriptional score times translational score''') protein molecules. In bacterial cells, however, transcription and translation are coupled and can occur simultaneously, so these processes are not independent and hence not modular. In addition, the RNA region consisting of the RBS and the first few codons may form secondary structures, which in turn reduce translation efficiency. Rare codons have also been shown to influence translation efficiency.
font-family: L;
+
}
+
@font-face{
+
src: url(https://static.igem.org/mediawiki/2016/5/59/Iitm_gi.otf);
+
font-family: C;
+
}
+
body{
+
background: #050505;
+
color: white;
+
font-family: C;
+
text-align: center;
+
padding: 0px;
+
                        margin: 0px;
+
}
+
ul{
+
list-style-type: none;
+
padding: 5px;
+
margin: 0px;
+
}
+
li.nav{
+
display: inline-block;
+
font-size: 15px;
+
padding: 8px 16px 8px 16px;
+
border-left: 5px solid #ffffff;
+
background: #181818;
+
border-radius: 1px;
+
color: #ffffff;
+
margin: 10px;
+
font-weight: 200;
+
}
+
li.nav:hover{
+
background: #282828;
+
}
+
a,a:visited{
+
text-decoration: none;
+
font-family: C;
+
color: inherit;
+
}
+
a.nav{
+
display: inline-block;
+
min-width: 200px;
+
text-align: left;
+
                        text-decoration: none;
+
}
+
                img.nav{
+
                      vertical-align: 0px;
+
                      margin-right: 5px;
+
                      filter: brightness(5000%);
+
                }
+
a.nav:hover{
+
cursor: pointer;
+
cursor: hand;
+
}
+
a.btn{
+
background: #000f0d;
+
color: #22ffdd;
+
border-radius: 3px;
+
padding: 8px 16px 8px 16px;
+
font-size: 15px;
+
letter-spacing: 3px;
+
font-family: C;
+
font-weight: 200;
+
                        text-decoration: none;
+
}
+
a.btn:hover{
+
background: #00332b;
+
}
+
                h1,h2{
+
                    font-family: L;
+
                    font-size: 50px;
+
                    letter-spacing: 6px;
+
                    color: black;
+
                    padding-bottom: 40px;
+
                }
+
                span.hd3{
+
                    font-family: L;
+
                    font-size: 20px;
+
                    letter-spacing: 2px;
+
                    padding: 0px;
+
                    margin: 0px;
+
                    display: block;
+
                    text-align: left;
+
                }
+
                h2{
+
                    font-size: 30px;
+
                    padding-bottom: 10px;
+
                }
+
                p{
+
                  display: inline-block;
+
                  font-family: C;
+
                  font-size: 15px;
+
                  max-width: 75%;
+
                  padding: 20px;
+
                  color: black;
+
                  text-align: justify;
+
                  margin: 0px;
+
                }
+
</style>
+
+
<script type="text/javascript">
+
function gid(a){
+
return document.getElementById(a);
+
}
+
function setup(){
+
gid('welcome').style.paddingTop = gid('header').offsetHeight + 'px';
+
}
+
function putmenu(){
+
gid('menu').style.display = 'block';
+
gid('lines').style.display = 'none';
+
gid('cross').style.display = 'inline-block';
+
}
+
function cutmenu(){
+
gid('menu').style.display = 'none';
+
gid('lines').style.display = 'inline-block';
+
gid('cross').style.display = 'none';
+
}
+
</script>
+
</head>
+
<body onload="setup()" onresize="setup()">
+
<div id="header" style="display: block; width: 100%; text-align: left; background: #050505; position: fixed; top: 0px; padding-top: 15px; box-shadow: 0px 2px 4px rgba(5,5,5,0.5); z-index: 20;">
+
<ul style="color: #22ffdd; font-family: C; font-weight: 200; letter-spacing: 2px;">
+
<li class="nav" id="lines" style="border: none; max-width: 120px; background: #050505;"> <a class="nav" onclick="putmenu()">
+
<img class="nav" src="https://static.igem.org/mediawiki/2016/7/7c/Iitm_menu.png" height="15" width="15"> MENU
+
</a> </li>
+
<li class="nav" id="cross" style="display:none; border: none; max-width: 120px; background: #050505;"> <a class="nav" onclick="cutmenu()">
+
<img class="nav" src="https://static.igem.org/mediawiki/2016/3/37/Iitm_close.png" height="15" width="15"> CLOSE
+
</a> </li>
+
</ul>
+
<ul id="menu" style="display: none; padding-bottom: 30px;">
+
<li class="nav"> <a class="nav" href="https://2016.igem.org/Team:IIT-Madras">
+
<img class="nav" src="https://static.igem.org/mediawiki/2016/b/b4/Iitm_home.png" height="15" width="15"> HOME
+
</a> </li>
+
<li class="nav"> <a class="nav" href="https://2016.igem.org/Team:IIT-Madras/Team">
+
<img class="nav" src="https://static.igem.org/mediawiki/2016/d/db/Iitm_team.png" height="15" width="15"> TEAM
+
</a> </li>
+
<li class="nav"> <a class="nav" href="https://2016.igem.org/Team:IIT-Madras/Description">
+
<img class="nav" src="https://static.igem.org/mediawiki/2016/6/6e/Iitm_project.png" height="15" width="15"> PROJECT
+
</a> </li>
+
<li class="nav"> <a class="nav" href="https://2016.igem.org/Team:IIT-Madras/Parts">
+
<img class="nav" src="https://static.igem.org/mediawiki/2016/f/f8/Iitm_parts.png" height="15" width="15"> PARTS
+
</a> </li>
+
                        <li class="nav"> <a class="nav" href="https://2016.igem.org/Team:IIT-Madras/Measurement">
+
<img class="nav" src="https://static.igem.org/mediawiki/2016/f/fd/Iitm_measurements.png" height="15" width="15"> MEASUREMENTS
+
</a> </li>
+
                        <li class="nav"> <a class="nav" href="https://2016.igem.org/Team:IIT-Madras/Model">
+
<img class="nav" src="https://static.igem.org/mediawiki/2016/4/45/Iitm_model.png" height="15" width="15"> MODEL
+
</a> </li>
+
<li class="nav"> <a class="nav" href="https://2016.igem.org/Team:IIT-Madras/Safety">
+
<img class="nav" src="https://static.igem.org/mediawiki/2016/d/d0/Iitm_safety.png" height="15" width="15"> SAFETY
+
</a> </li>
+
<li class="nav"> <a class="nav" href="https://2016.igem.org/Team:IIT-Madras/Attributions">
+
<img class="nav" src="https://static.igem.org/mediawiki/2016/3/36/Iitm_attributions.png" height="15" width="15"> ATTRIBUTIONS
+
</a> </li>
+
<li class="nav"> <a class="nav" href="https://2016.igem.org/Team:IIT-Madras/Human_Practices">
+
<img class="nav" src="https://static.igem.org/mediawiki/2016/9/94/Iitm_humanpractices.png" height="15" width="15"> HUMAN PRACTICES
+
</a> </li>
+
                        <li class="nav"> <a class="nav" href="https://2016.igem.org/Team:IIT-Madras/Interlabstudy">
+
<img class="nav" src="https://static.igem.org/mediawiki/2016/b/b8/Iitm_interlab.png" height="15" width="15"> INTERLAB STUDY
+
</a> </li>
+
<li class="nav"> <a class="nav" href="https://2016.igem.org/Team:IIT-Madras/Entrepreneurship">
+
<img class="nav" src="https://static.igem.org/mediawiki/2016/6/6f/Iitm_awards.png" height="15" width="15"> AWARDS
+
</a> </li>
+
+
</ul>
+
</div>
+
+
+
  
 +
The influences of secondary structure and of codon choice on the translational score of an RBS overlap, because both are determined by the same '''A, U, G, C''' sequence. It is therefore important to decouple the two effects to unravel the underlying patterns.
  
 +
'''We have successfully validated an empirical model to predict variations in protein expression levels, which could help future iGEM teams'''. Here, we present a thorough description of our work.
  
 +
== Methodology ==
  
 +
The dataset from the paper "Causes and effects of N-terminal codon bias in bacterial genes" was used. Protein expression data were available for the following constructs:
 +
2 promoters x 3 RBSs x 1781 (137 x 13) sfGFP variants in the first 11 codons at the N-terminal (the 3 RBS parts were B0034, B0032 and B0030), and 2 promoters x 137 natural RBSs x 13 sfGFP variants in the first 11 codons at the N-terminal (the 2 promoters were J23100 and J23108).
  
+
== Hypothesis and Algorithm ==
<div id="content" style="display: block; background: #e0e0e0; margin: 0px; text-align: center; padding-top: 100px;">
+
At the beginning, we hypothesized the following, based on the information available in the literature:<br>
<h1 style="font-family: L;"> MODEL </h1>
+
1. '''Expression is inversely proportional to the stability of the secondary structure of the mRNA near the RBS part'''.<br>
 +
2. '''Rare codons present in the first 11 codons of a protein can increase or decrease the translational score of RBS parts'''.<br>
 +
3. '''Each RBS part has a native strength, irrespective of the promoter and protein coding part it is used with'''.
  
<h2>Modularity of Ribosomal Binding Sites</h2>
+
We designed an algorithm to compute the translational score of a given construct (which expresses a protein) in the following way:
 +
\begin{equation*}
 +
TS = \dfrac{NS*C_{pref}}{1+dG} + \alpha
 +
\end{equation*}
  
 +
\begin{equation*}
 +
C_{pref}= C_{1}*C_{2}*C_{3}*...*C_{11}*C_{sfGFP}
 +
\end{equation*}
  
<p style="font-size: 20px; text-align: justify; font-family: C; background: #00ad93;">
+
'''Objective Function''': minimize \(\sum \mid TS_{model}-TS_{experiment} \mid\)
<span class="hd3">Key Achievements</span>
+
1. Validated an algorithm to predict variations in the protein expression.
+
<br>
+
2. Obtained a codon preference matrix for first 11 codons in protein coding parts.
+
<br>
+
3. Relative global native strength of B0032, B0030, B0034 RBS parts.
+
<br>
+
4. Quantification of global modularity of above mentioned RBS parts w.r.t protein coding parts and promoter parts.
+
</p>
+
  
<p style="font-size: 20px; text-align: justify; font-family: C;">
+
'''Outlier Removal''': top scores in \(\mid TS_{model}-TS_{experiment} \mid\)
<span class="hd3">Methodology</span>
+
The dataset from "Causes and effects of N-terminal codon bias in bacterial genes" paper was taken. Protein expression Data was available for following constructs :
+
2 promoters x 3 RBSs x 1781 (137x13) sfGFP variants in first 11 codons at N-terminal (3 RBS parts were B0034, B0032, B0030)
+
And 2 promoter x 137 natural RBSs x 13 sfGFP variants in first 11 codons at N-terminal (2 Promoters were J23100 & J23108)
+
</p>
+
  
<p style="font-size: 20px; text-align: justify; font-family: C;">
+
where \(C_{i}\) is the codon preference of the codon present at the \(i^{th}\) position, \(C_{sfGFP}\) is a constant for the remaining sfGFP codons, <br>
<span class="hd3">Hypothesis and Algorithm</span>
+
'''TS''' : Translational Score of RBS part, <br>
At the beginning, we hypothesized following things based on the information available in literature:
+
'''NS''' : Native Strength of the RBS part, which equals \(TS-\alpha\) at \(dG=0\), \(C_{pref}=1\); <br>
<br>
+
'''\(dG\)''' : Stability of the RNA strand consisting of the RBS and the first 11 codons of the protein, <br>
Expression is inversely proportional to the stability of secondary structure of mRNA near RBS part.
+
\(\alpha\) is a constant.
Rare codons present in first 11 codons of proteins have the ability to increase or decrease the translational score of RBS parts.
+
Each RBS part has a native strength irrespective of the promoter and protein coding part it can be used with.
+
<br><br>
+
We designed an algorithm to compute the translational score of a given protein expressing construct in following way:
+
<br>
+
1. Input data included
+
Native strength of Promoter-RBS as PiRi, where RBS (Ri) is associated with Promoter (Pi) in the construct
+
DeltaG value for the RBS part and first 11 codons of protein coding part.
+
<br>
+
Codon matrix for first 11 codon of protein coding parts.
+
<br>
+
2. Output Data was the translational score for every construct.
+
<br>
+
a. Compute Cpref value for each construct
+
Each codon out of 64 codons in the first 11 codons of protein coding parts has been assigned a value, which refer to the increase/decrease in the translational score. (1+ is positive, 1 is neutral, 1- is negative)
+
Cpref= C1 *C2 *...*C10 *C11 *CsfGFP, where Ci is the preference of the codon present at i th position and CsfGFP , combined score for the rest of codons in protein variants i.e. sfGFP
+
<br>
+
b. Translational score =[(PiRi)*Cpref(i)/(1 + deltaG(i))] + α1, where α1 is a constant for all constructs.
+
<br>
+
c. Alternatively, to show the significance of rare codons we changed the Cpref value to 1 and optimized the system in same fashion.
+
</p>
+
  
 +
== Quantification of Non-modularity ==
 +
As previously mentioned, RBSs have been found to be non-modular with respect to promoters and protein coding parts. Quantifying modularity would enable us to screen for better RBS parts and build higher-order complex genetic circuits in a high-throughput manner.
  
<p style="font-size: 20px; text-align: justify; font-family: C;">
+
\begin{equation*}
<span class="hd3">Optimization</span>  
+
NM_p = \frac{\sigma_{NS}}{\langle {NS} \rangle} \\
The above model was optimized to compute the unknown variables (PiRi and the codon matrix values) using the data from the above-mentioned paper. In MATLAB, the fmincon function was used to minimize the sum of (model-experimental)^2 over all 14137 constructs.
+
NM_c = \frac{\sigma_{TS}}{\langle {TS} \rangle}
Further, the system was optimized by removing 5% and 10% outliers, which were computed as the top scores in abs(model-experimental).
+
\end{equation*}
</p>
+
where, '''NM<sub>p</sub>''' is non-modularity of RBS w.r.t. promoters; <br>
 +
'''NM<sub>c</sub>''' is non-modularity of RBS w.r.t. protein coding part; <br> '''NS''' is the array of native strength of RBS part with given promoters, <br> '''TS''' is the array of translational score of RBS part with given protein coding parts.
  
 +
== Optimization ==
 +
The above model was optimized to compute the unknown variables, the '''Native Strength of RBSs''' and the '''codon preference matrix''', using the data from the above-mentioned paper. In MATLAB, the <i>fmincon</i> function was used to minimize the sum of \((TS_{model}-TS_{experiment})^2\) over all 14137 constructs. The optimization was run in MATLAB on a supercomputer facility at IIT Madras, and the results below were obtained after several iterations.
 +
Further, the system was re-optimized after removing 5% and 10% of the constructs as outliers, chosen as those with the highest \((TS_{model}-TS_{experiment})^2\).
  
<p style="font-size: 20px; text-align: justify; font-family: C;">
+
<gallery mode=nolines class=center widths=400px heights=230px caption="Validation of Model">
<span class="hd3">Results</span>  
+
File:Igemiitm_exp_model2.png|Experiment model, dark green 90% data, dark green+light dark green 95% data.
After several iterations of optimization, we achieved following results
+
File:Igemiitm_exp_model1.png|Null model, dark green 90% data, dark green+light dark green 95% data.
Optimization was done in MATLAB on a supercomputer facility at IIT Madras.
+
</gallery>
<br><br>
+
Relative native strength:
+
<br>
+
B0034: 0.6163, B0032: 1.0 , B0030: 0.5473 averaged over with J23100 and J23108 promoters.
+
<br><br>
+
Global modularity of RBSs w.r.t promoters : (defined as std(score)/mean(score))
+
<br>
+
B0034: 0.3372, B0032:  0.2974, B0030: 1.1370
+
<br><br>
+
Global modularity of RBSs w.r.t protein coding parts : (defined as mean(std(score)/mean(score)) for different promoters)
+
<br>
+
B0034: 0.7815, B0032: 0.7127, B0030: 0.9958
+
</p>
+
  
  
<p style="font-size: 20px; text-align: justify; font-family: C;">
+
== Results ==
<span class="hd3">Conclusion</span>  
+
<gallery class=center widths=400px heights=230px>
We could achieve a heuristic solution with a correlation of 0.87 with 90% of the data points. The model gives us the combined promoter-RBS strength for 280 (2 promoters x 140 RBSs) combinations. It also gave us the codon preference matrix for 64 codons (shown below).<br>
+
File:Rbs_strength.png|Non-modularity of RBSs with Promoters
</p>
+
File:Nonmodularity_of_B343230.png|Popular BioBrick RBS parts
 +
</gallery>
  
<p style="font-size: 20px; text-align: justify; font-family: C;">
+
We found that some codons favor the translation process, while others do not. The following codons favor translation:
<img src="https://static.igem.org/mediawiki/2016/f/fc/Iitmadras_model2_log2express.jpeg" width="800" height="470">
+
<br>
+
We can observe a significant decrease (on average xx%) in the strength of RBSs (xx out of 140), when they are used with high strength promoters.
+
</p>
+
  
  
 +
{| class="wikitable" style="font-size:0.8em;@media screen and (max-width: 1000px){font-size:0.6em;}"
 +
! scope="row"| Codon
 +
|AAA||AAT||AGA||AGC||AGT||ATA||GAT||GGC||GGG||GGT||GTA||TCA||TCC||TCT||TGC
 +
|-
 +
! scope="row"| Amino Acid
 +
|K||N||R||S||S||I||D||G||G||G||V||S||S||S||C
 +
|-
 +
! scope="row"| Preference Value
 +
|1.17||1.23||1.15||1.13||1.21||1.14||1.13||1.19||1.19||1.21||1.14||1.14||1.13||1.19||1.19
 +
|-
 +
|}
  
 +
The following codons reduce translation efficiency:
 +
{| class="wikitable" style="font-size:0.8em;@media screen and (max-width: 1000px){font-size:0.6em;}"
 +
! scope="row"| Codon
 +
|CAC||CGC||CTC||GTC||TTC
 +
|-
 +
! scope="row"| Amino Acid
 +
|H||R||L||V||F
 +
|-
 +
! scope="row"| Preference Value
 +
|0.9768||0.9862||0.979||0.997||0.993
 +
|-
 +
|}
  
 +
A complete list of codons and their preference values is available [https://static.igem.org/mediawiki/2016/7/71/Igemiitm16_Codonpref.xlsx here].
  
 
</div>
 
  
 +
== Conclusion ==
 +
We achieved a heuristic solution with a '''correlation of 0.87 on 90% of the data points''' for our experimental model, while the null model gives a correlation of 0.83 on 90% of the data points. The null model does not incorporate codon preference (C<sub>pref</sub>). Since including codon preference improves the fit, this supports the hypothesis that codons can also change the translational score of RBS parts. The experimental model gives the combined promoter-RBS strength for 280 (2 promoters x 140 RBSs) combinations. Unlike existing tools, this model provides a way to predict the variations in the translational score of RBS parts. Using this model, users can obtain a translational score for their expression systems.
  
 +
= Noise in Devices =
 +
== Introduction ==
 +
To understand the behavior of device components, we need at least two signals coming from the device, so that variations from '''intrinsic''' and '''extrinsic''' sources can be separated. Similarly, complex biological devices can contain two or more protein-producing parts, which can be used to understand the behavior of the device. [http://www.pnas.org/content/99/20/12795.short Elowitz et al.] have done significant work on understanding noise in biological devices. We have observed that if the mean value of one signal is larger than that of the other, the Elowitz formula does not give an accurate noise estimate, because the variation in the low-expressing protein is under-represented.
  
 +
\begin{equation*}
 +
\begin{aligned} Noise_{int} &= \frac{\langle (rfp-gfp)^2 \rangle}{2*\langle gfp \rangle*\langle rfp \rangle} \\
 +
\text{if } \langle rfp \rangle \gg \langle gfp \rangle:\quad Noise_{int} &\approx \frac{{\langle rfp \rangle}^2}{2*\langle gfp \rangle*\langle rfp \rangle}
 +
= \frac{\langle rfp \rangle}{2*\langle gfp \rangle} \end{aligned}
 +
\end{equation*}
  
 +
<gallery class=center widths=400px heights=230px>
 +
File:Noise_var1.png|Signals from biological device
 +
</gallery>
  
 +
== Solution ==
 +
To solve this problem, we have modified the current formula by rescaling the gfp signal with the ratio of the two means:
 +
\begin{equation*}
 +
\begin{aligned} gr_{fold} &= \dfrac{\langle rfp \rangle}{\langle gfp \rangle} \\
 +
Noise_{int} &= \frac{\langle (rfp-gfp*gr_{fold})^2 \rangle}{2*\langle gfp \rangle*\langle rfp \rangle*gr_{fold}} \end{aligned}
 +
\end{equation*}
  
 +
<gallery class=center widths=400px heights=230px center>
 +
File:Noise_var2.png|Corrected Signals from biological device
 +
</gallery>
  
+
== References ==
<div id="footer" style="display: block; background: #050505; margin: 0px; padding: 30px; text-align: justify; border: none; box-shadow: 0px -2px 4px rgba(5,5,5,0.5);  z-index: 20;">
+
[http://science.sciencemag.org/content/342/6157/475 1.] Goodman, Daniel B., George M. Church, and Sriram Kosuri. "Causes and effects of N-terminal codon bias in bacterial genes." Science 342.6157 (2013): 475-479.
<h1 style="font-family: L; font-weight: 200; font-size: 30px;  letter-spacing: 3px; color:white;">
+
{{IIT-Madras-Bottom/CSS}}
Contact Us
+
</h1>
+
                <br>
+
<ul>
+
<li style="display: inline-block; margin: 0px 0px 0px 20px; background: none; border: none; ">
+
<br>
+
<img src="https://static.igem.org/mediawiki/2016/c/c1/Iitm_place.png" width=50 height=50>
+
</li>
+
<li style="display: inline-block; vertical-align: top; font-size: 20px; color: white; font-weight: 200; padding-left: 0px; max-width: 500px; text-align: justify; background: none; border: none; line-height: 150%;">
+
Department of Biotechnology<br>
+
Bhupat & Jyoti Mehta School of Biosciences<br>
+
Indian Institute of Technology Madras<br>
+
Chennai - 600036
+
</li>
+
<li style="display: inline-block; margin: 0px 0px 0px 20px; background: none; border: none; ">
+
<br>
+
<img src="https://static.igem.org/mediawiki/2016/2/23/Iitm_mail.png" width=50 height=50>
+
</li>
+
<li style="display: inline-block; vertical-align: top; font-size: 20px; color: white; font-weight: 200; padding-left: 0px; max-width: 500px; text-align: justify; background: none; border: none;  line-height: 150%;">
+
<br>
+
igemiitm16@gmail.com
+
</li>
+
</ul>
+
</div>
+
</body>
+
</html>
+

Latest revision as of 23:40, 19 October 2016


Modularity of RBS parts

Introduction

The non-modular nature of Ribosomal Binding Sites (RBSs) in bacteria is well known to the synthetic biology community. Most biological parts have been assigned a strength that describes their function. For example, promoters have a transcriptional score (number of RNA molecules per DNA molecule) and RBSs have a translational score (number of protein molecules per RNA molecule). Ideally, if we use a promoter and an RBS to produce a protein, we should get (transcriptional score times translational score) protein molecules. In bacterial cells, however, transcription and translation are coupled and can occur simultaneously, so these processes are not independent and hence not modular. In addition, the RNA region consisting of the RBS and the first few codons may form secondary structures, which in turn reduce translation efficiency. Rare codons have also been shown to influence translation efficiency.

The influences of secondary structure and of codon choice on the translational score of an RBS overlap, because both are determined by the same A, U, G, C sequence. It is therefore important to decouple the two effects to unravel the underlying patterns.

We have successfully validated an empirical model to predict variations in protein expression levels, which could help future iGEM teams. Here, we present a thorough description of our work.

Methodology

The dataset from the paper "Causes and effects of N-terminal codon bias in bacterial genes" was used. Protein expression data were available for the following constructs: 2 promoters x 3 RBSs x 1781 (137 x 13) sfGFP variants in the first 11 codons at the N-terminal (the 3 RBS parts were B0034, B0032 and B0030), and 2 promoters x 137 natural RBSs x 13 sfGFP variants in the first 11 codons at the N-terminal (the 2 promoters were J23100 and J23108).

Hypothesis and Algorithm

At the beginning, we hypothesized the following, based on the information available in the literature:
1. Expression is inversely proportional to the stability of the secondary structure of the mRNA near the RBS part.
2. Rare codons present in the first 11 codons of a protein can increase or decrease the translational score of RBS parts.
3. Each RBS part has a native strength, irrespective of the promoter and protein coding part it is used with.

We designed an algorithm to compute the translational score of a given construct (which expresses a protein) in the following way: \begin{equation*} TS = \dfrac{NS*C_{pref}}{1+dG} + \alpha \end{equation*}

\begin{equation*} C_{pref}= C_{1}*C_{2}*C_{3}*...*C_{11}*C_{sfGFP} \end{equation*}

where \(C_{i}\) is the codon preference of the codon present at the \(i^{th}\) position, \(C_{sfGFP}\) is a constant for the remaining sfGFP codons,
TS : Translational Score of the RBS part,
NS : Native Strength of the RBS part, which equals \(TS-\alpha\) at \(dG=0\), \(C_{pref}=1\);
\(dG\) : Stability of the RNA strand consisting of the RBS and the first 11 codons of the protein,
\(\alpha\) is a constant.

Objective Function: minimize \(\sum \mid TS_{model}-TS_{experiment} \mid\)

Outlier Removal: top scores in \(\mid TS_{model}-TS_{experiment} \mid\)
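The two equations above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the 11-codon window, and the values of NS, dG and \(\alpha\) are hypothetical; codons absent from the small lookup table are treated as neutral (1.0); and dG is taken as a non-negative stability magnitude so that \(1+dG\) stays positive, which the text does not specify.

```python
from math import prod

# Preference values for a few codons, copied from the Results tables.
PREFS = {"AAA": 1.17, "AAT": 1.23, "GGT": 1.21, "CAC": 0.9768, "TTC": 0.993}


def codon_preference(codons, prefs, c_sfgfp=1.0):
    """C_pref = C_1 * ... * C_11 * C_sfGFP for the first 11 codons."""
    if len(codons) != 11:
        raise ValueError("expected the first 11 codons")
    # Assumption: codons missing from `prefs` are treated as neutral (1.0).
    return prod(prefs.get(c, 1.0) for c in codons) * c_sfgfp


def translational_score(ns, c_pref, dg, alpha=0.0):
    """TS = NS * C_pref / (1 + dG) + alpha."""
    return ns * c_pref / (1.0 + dg) + alpha


# Hypothetical construct: the window, NS, dG and alpha are made-up inputs.
window = ["AAA", "AAT"] + ["GCG"] * 9
c_pref = codon_preference(window, PREFS)
print(translational_score(ns=1.0, c_pref=c_pref, dg=0.5, alpha=0.0))
```

Setting `c_pref` to 1 for every construct reproduces the null model, which ignores codon preference.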

Quantification of Non-modularity

As previously mentioned, RBSs have been found to be non-modular with respect to promoters and protein coding parts. Quantifying modularity would enable us to screen for better RBS parts and build higher-order complex genetic circuits in a high-throughput manner.

\begin{equation*} NM_p = \frac{\sigma_{NS}}{\langle {NS} \rangle}, \qquad NM_c = \frac{\sigma_{TS}}{\langle {TS} \rangle} \end{equation*} where \(NM_p\) is the non-modularity of an RBS w.r.t. promoters;
\(NM_c\) is the non-modularity of an RBS w.r.t. protein coding parts;
NS is the array of native strengths of the RBS part with the given promoters;
TS is the array of translational scores of the RBS part with the given protein coding parts;
\(\sigma\) and \(\langle \cdot \rangle\) denote the standard deviation and the mean of the array.
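Both measures are coefficients of variation (standard deviation divided by mean). A minimal Python sketch follows; the function names and the input numbers are hypothetical, and the sample standard deviation is used on the assumption that it mirrors MATLAB's default `std`.

```python
from statistics import mean, stdev


def non_modularity(scores):
    """Coefficient of variation: std(scores) / mean(scores)."""
    # Assumption: sample standard deviation (n - 1), as MATLAB's std uses by default.
    return stdev(scores) / mean(scores)


def non_modularity_over_promoters(scores_per_promoter):
    """Average the coefficient of variation computed separately for each promoter."""
    return mean(non_modularity(s) for s in scores_per_promoter)


# Hypothetical native strengths of one RBS measured with two promoters.
print(non_modularity([1.0, 3.0]))
```

A value of 0 means the RBS behaves identically in every context; larger values indicate stronger context dependence.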

Optimization

The above model was optimized to compute the unknown variables, the native strengths of the RBSs and the codon preference matrix, using the data from the above-mentioned paper. In MATLAB, the fmincon function was used to minimize the sum of \((TS_{model}-TS_{experiment})^2\) over all 14137 constructs. The optimization was run in MATLAB on a supercomputer facility at IIT Madras, and the results below were obtained after several iterations. Further, the system was re-optimized after removing 5% and 10% of the constructs as outliers, chosen as those with the highest \((TS_{model}-TS_{experiment})^2\).
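The outlier-removal step can be illustrated with a short Python sketch. It is not the MATLAB code used for the fit: the function names and the toy data are hypothetical, and only the ranking by squared error and the trimming of the worst fraction are shown, not the fmincon optimization itself.

```python
def squared_errors(model, experiment):
    """Per-construct squared error (TS_model - TS_experiment)^2."""
    return [(m - e) ** 2 for m, e in zip(model, experiment)]


def keep_after_trimming(model, experiment, fraction):
    """Indices of constructs kept after dropping the worst `fraction` by squared error."""
    errs = squared_errors(model, experiment)
    n_drop = int(len(errs) * fraction)
    ranked = sorted(range(len(errs)), key=errs.__getitem__)  # best fit first
    return sorted(ranked[: len(errs) - n_drop])


def objective(model, experiment, keep=None):
    """Sum of squared errors over the kept constructs (all constructs by default)."""
    errs = squared_errors(model, experiment)
    if keep is None:
        keep = range(len(errs))
    return sum(errs[i] for i in keep)


# Toy data: construct 3 is a clear outlier.
model = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
experiment = [1.0, 2.0, 3.0, 9.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
kept = keep_after_trimming(model, experiment, 0.10)
print(kept, objective(model, experiment, kept))
```

In a full workflow the model would be re-fitted on the kept constructs after each trimming step.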


Results

We found that some codons favor the translation process, while others do not. The following codons favor translation:


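{| class="wikitable"
! scope="row"| Codon
|AAA||AAT||AGA||AGC||AGT||ATA||GAT||GGC||GGG||GGT||GTA||TCA||TCC||TCT||TGC
|-
! scope="row"| Amino Acid
|K||N||R||S||S||I||D||G||G||G||V||S||S||S||C
|-
! scope="row"| Preference Value
|1.17||1.23||1.15||1.13||1.21||1.14||1.13||1.19||1.19||1.21||1.14||1.14||1.13||1.19||1.19
|}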

The following codons reduce translation efficiency:

{| class="wikitable"
! scope="row"| Codon
|CAC||CGC||CTC||GTC||TTC
|-
! scope="row"| Amino Acid
|H||R||L||V||F
|-
! scope="row"| Preference Value
|0.9768||0.9862||0.979||0.997||0.993
|}

A complete list of codons and their preference values is available [https://static.igem.org/mediawiki/2016/7/71/Igemiitm16_Codonpref.xlsx here].


Conclusion

We achieved a heuristic solution with a correlation of 0.87 on 90% of the data points for our experimental model, while the null model gives a correlation of 0.83 on 90% of the data points. The null model does not incorporate codon preference (\(C_{pref}\)). Since including codon preference improves the fit, this supports the hypothesis that codons can also change the translational score of RBS parts. The experimental model gives the combined promoter-RBS strength for 280 (2 promoters x 140 RBSs) combinations. Unlike existing tools, this model provides a way to predict the variations in the translational score of RBS parts. Using this model, users can obtain a translational score for their expression systems.

Noise in Devices

Introduction

To understand the behavior of device components, we need at least two signals coming from the device, so that variations from intrinsic and extrinsic sources can be separated. Similarly, complex biological devices can contain two or more protein-producing parts, which can be used to understand the behavior of the device. [http://www.pnas.org/content/99/20/12795.short Elowitz et al.] have done significant work on understanding noise in biological devices. We have observed that if the mean value of one signal is larger than that of the other, the Elowitz formula does not give an accurate noise estimate, because the variation in the low-expressing protein is under-represented.

\begin{equation*} \begin{aligned} Noise_{int} &= \frac{\langle (rfp-gfp)^2 \rangle}{2*\langle gfp \rangle*\langle rfp \rangle} \\ \text{if } \langle rfp \rangle \gg \langle gfp \rangle:\quad Noise_{int} &\approx \frac{{\langle rfp \rangle}^2}{2*\langle gfp \rangle*\langle rfp \rangle} = \frac{\langle rfp \rangle}{2*\langle gfp \rangle} \end{aligned} \end{equation*}

Solution

To solve this problem, we have modified the current formula by rescaling the gfp signal with the ratio of the two means: \begin{equation*} \begin{aligned} gr_{fold} &= \dfrac{\langle rfp \rangle}{\langle gfp \rangle} \\ Noise_{int} &= \frac{\langle (rfp-gfp*gr_{fold})^2 \rangle}{2*\langle gfp \rangle*\langle rfp \rangle*gr_{fold}} \end{aligned} \end{equation*}
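A small Python sketch can compare the original and the rescaled expressions. It assumes that the angle brackets average the squared per-cell difference, as in Elowitz et al. (if the mean were taken before squaring, the rescaled expression would vanish identically); the function names and the toy readings are hypothetical.

```python
from statistics import mean


def intrinsic_noise(rfp, gfp):
    """Original form: <(rfp - gfp)^2> / (2 <gfp> <rfp>)."""
    num = mean([(r - g) ** 2 for r, g in zip(rfp, gfp)])
    return num / (2 * mean(gfp) * mean(rfp))


def intrinsic_noise_rescaled(rfp, gfp):
    """Modified form: gfp is multiplied by gr_fold = <rfp> / <gfp> before comparison."""
    gr_fold = mean(rfp) / mean(gfp)
    num = mean([(r - g * gr_fold) ** 2 for r, g in zip(rfp, gfp)])
    return num / (2 * mean(gfp) * mean(rfp) * gr_fold)


# Toy per-cell readings: the signals are perfectly correlated, rfp is 10x brighter.
gfp = [1.0, 2.0, 3.0]
rfp = [10.0, 20.0, 30.0]
print(intrinsic_noise(rfp, gfp), intrinsic_noise_rescaled(rfp, gfp))
```

With these toy readings the original expression is large purely because of the difference in brightness, while the rescaled expression is zero, which is the behavior expected for two perfectly correlated reporters.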

References

[http://science.sciencemag.org/content/342/6157/475 1.] Goodman, Daniel B., George M. Church, and Sriram Kosuri. "Causes and effects of N-terminal codon bias in bacterial genes." Science 342.6157 (2013): 475-479.