Difference between revisions of "Team:IIT-Madras/Model"

Line 180: Line 180:
 
<h2>Modularity of Ribosomal Binding Sites</h2>
 
<h2>Modularity of Ribosomal Binding Sites</h2>
  
<h3>Key Achievements</h3> :
+
<h3>Key Achievements</h3>  
 +
<p>
 +
<br>
 
1. Validated an algorithm to predict variations in the protein expression.
 
1. Validated an algorithm to predict variations in the protein expression.
 +
<br>
 
2. Obtained a codon preference matrix for first 11 codons in protein coding parts.
 
2. Obtained a codon preference matrix for first 11 codons in protein coding parts.
 +
<br>
 
3. Relative global native strength of B0032, B0030, B0034 RBS parts.
 
3. Relative global native strength of B0032, B0030, B0034 RBS parts.
 +
<br>
 
4. Quantification of global modularity of above mentioned RBS parts w.r.t protein coding parts and promoter parts.
 
4. Quantification of global modularity of above mentioned RBS parts w.r.t protein coding parts and promoter parts.
 +
</p>
  
Methodology:
+
<h3>Methodology</h3>
 +
<p>
 
The dataset from "Causes and effects of N-terminal codon bias in bacterial genes" paper was taken. Protein expression Data was available for following constructs :  
 
The dataset from "Causes and effects of N-terminal codon bias in bacterial genes" paper was taken. Protein expression Data was available for following constructs :  
 
2 promoters x 3 RBSs x 1781 (137x13) sfGFP variants in first 11 codons at N-terminal (3 RBS parts were B0034, B0032, B0030)
 
2 promoters x 3 RBSs x 1781 (137x13) sfGFP variants in first 11 codons at N-terminal (3 RBS parts were B0034, B0032, B0030)
 
And 2 promoter x 137 natural RBSs x 13 sfGFP variants in first 11 codons at N-terminal (2 Promoters were J23100 & J23108)
 
And 2 promoter x 137 natural RBSs x 13 sfGFP variants in first 11 codons at N-terminal (2 Promoters were J23100 & J23108)
 +
</p>
  
Hypothesis:
+
<h3>Hypothesis</h3>
 +
<p>
 
At the beginning, we hypothesized following things based on the information available in literature:
 
At the beginning, we hypothesized following things based on the information available in literature:
 +
<br>
 
Expression is inversely proportional to the stability of secondary structure of mRNA near RBS part.
 
Expression is inversely proportional to the stability of secondary structure of mRNA near RBS part.
 
Rare codons present in first 11 codons of proteins have the ability to increase or decrease the translational score of RBS parts.
 
Rare codons present in first 11 codons of proteins have the ability to increase or decrease the translational score of RBS parts.
 
Each RBS part has a native strength irrespective of the promoter and protein coding part it can be used with.
 
Each RBS part has a native strength irrespective of the promoter and protein coding part it can be used with.
 
+
<br><br>
 
We designed an algorithm to compute the translational score of a given protein expressing construct in following way:
 
We designed an algorithm to compute the translational score of a given protein expressing construct in following way:
 +
<br>
 
1. Input data included
 
1. Input data included
 
Native strength of Promoter-RBS as PiRi, where RBS (Ri) is associated with Promoter (Pi) in the construct
 
Native strength of Promoter-RBS as PiRi, where RBS (Ri) is associated with Promoter (Pi) in the construct
 
DeltaG value for the RBS part and first 11 codons of protein coding part.
 
DeltaG value for the RBS part and first 11 codons of protein coding part.
 +
<br>
 
Codon matrix for first 11 codon of protein coding parts.  
 
Codon matrix for first 11 codon of protein coding parts.  
 +
<br>
 
2. Output Data was the translational score for every construct.
 
2. Output Data was the translational score for every construct.
 +
<br><br>
 
3. Algorithm :
 
3. Algorithm :
 
+
<br>
 
a. Compute Cpref value for each construct
 
a. Compute Cpref value for each construct
 
Each codon out of 64 codons in the first 11 codons of protein coding parts has been assigned a value, which refer to the increase/decrease in the translational score. (1+ is positive, 1 is neutral, 1- is negative)
 
Each codon out of 64 codons in the first 11 codons of protein coding parts has been assigned a value, which refer to the increase/decrease in the translational score. (1+ is positive, 1 is neutral, 1- is negative)
 
Cpref= C1 *C2 *...*C10 *C11 *CsfGFP, where Ci is the preference of the codon present at i th position and CsfGFP , combined score for the rest of codons in protein variants i.e. sfGFP
 
Cpref= C1 *C2 *...*C10 *C11 *CsfGFP, where Ci is the preference of the codon present at i th position and CsfGFP , combined score for the rest of codons in protein variants i.e. sfGFP
 +
<br>
 
b. Translational score =[(PiRi)*Cpref(i)/(1 + deltaG(i))] + α1, where α1 is a constant for all constructs.
 
b. Translational score =[(PiRi)*Cpref(i)/(1 + deltaG(i))] + α1, where α1 is a constant for all constructs.
 +
<br>
 
c. Alternatively, to show the significance of rare codons we changed the Cpref value to 1 and optimized the system in same fashion.
 
c. Alternatively, to show the significance of rare codons we changed the Cpref value to 1 and optimized the system in same fashion.
Optimization:
+
</p>
 +
 
 +
 
 +
<h3>Optimization</h3>
 +
<p>
 
Above model was optimized to compute the unknown variables, PiRi, codon matrix values, using the data from above mentioned paper. In MATLAB, fmincon function was used to minimize the sum of (model-experimental)^2 for all 14137 constructs.
 
Above model was optimized to compute the unknown variables, PiRi, codon matrix values, using the data from above mentioned paper. In MATLAB, fmincon function was used to minimize the sum of (model-experimental)^2 for all 14137 constructs.
 
Further, the system was optimized by removing 5%, 10% outliers, which were computed as the top scores in abs(model-experimental).
 
Further, the system was optimized by removing 5%, 10% outliers, which were computed as the top scores in abs(model-experimental).
 +
</p>
 +
  
Results:
+
<h3>Results</h3>
 +
<p>
 
After several iterations of optimization, we achieved following results
 
After several iterations of optimization, we achieved following results
 
Optimization was done in MATLAB on a supercomputer facility at IIT Madras.
 
Optimization was done in MATLAB on a supercomputer facility at IIT Madras.
 
+
<br><br>
 
Relative native strength:
 
Relative native strength:
 +
<br>
 
B0034: 0.6163, B0032: 1.0 , B0030: 0.5473 averaged over with J23100 and J23108 promoters.
 
B0034: 0.6163, B0032: 1.0 , B0030: 0.5473 averaged over with J23100 and J23108 promoters.
 
+
<br><br>
 
Global modularity of RBSs w.r.t promoters : (defined as std(score)/mean(score))
 
Global modularity of RBSs w.r.t promoters : (defined as std(score)/mean(score))
 +
<br>
 
B0034: 0.3372,  B0032:  0.2974, B0030: 1.1370  
 
B0034: 0.3372,  B0032:  0.2974, B0030: 1.1370  
 
+
<br><br>
 
Global modularity of RBSs w.r.t protein coding parts : (defined as mean(std(score)/mean(score)) for different promoters)
 
Global modularity of RBSs w.r.t protein coding parts : (defined as mean(std(score)/mean(score)) for different promoters)
 +
<br>
 
B0034: 0.7815, B0032: 0.7127, B0030: 0.9958
 
B0034: 0.7815, B0032: 0.7127, B0030: 0.9958
 +
</p>
  
Conclusion:
 
We could achieve a heuristic solution with a correlation of 0.87 with 90% of the data points. Model gives us the strength of promoter-RBS combined strength for 280 (2 promoters x 140 RBSs) combinations. It also gave us the codon preference matrix for 64 codons, (shown below).
 
  
 +
<h3>Conclusion</h3>
 +
<p>
 +
We could achieve a heuristic solution with a correlation of 0.87 with 90% of the data points. Model gives us the strength of promoter-RBS combined strength for 280 (2 promoters x 140 RBSs) combinations. It also gave us the codon preference matrix for 64 codons, (shown below).<br>
 +
<img src="https://static.igem.org/mediawiki/2016/f/fc/Iitmadras_model2_log2express.jpeg" width="800" height="470">
 +
<br>
 
We can observe a significant decrease (on average xx%) in the strength of RBSs (xx out of 140), when they are used with high strength promoters.
 
We can observe a significant decrease (on average xx%) in the strength of RBSs (xx out of 140), when they are used with high strength promoters.
 
</p>
 
</p>
  
<img src="https://static.igem.org/mediawiki/2016/f/fc/Iitmadras_model2_log2express.jpeg" width="800" height="470">
 
  
<h2>Section 2</h2>
 
<p>Content 2</p>
 
  
  

Revision as of 12:29, 2 October 2016

MODEL

Modularity of Ribosomal Binding Sites

Key Achievements


1. Validated an algorithm to predict variations in protein expression.
2. Obtained a codon preference matrix for the first 11 codons in protein coding parts.
3. Estimated the relative global native strength of the B0032, B0030, and B0034 RBS parts.
4. Quantified the global modularity of these RBS parts with respect to protein coding parts and promoter parts.

Methodology

We used the dataset from the paper "Causes and effects of N-terminal codon bias in bacterial genes". Protein expression data were available for the following constructs:
2 promoters x 3 RBSs x 1781 (137 x 13) sfGFP variants differing in the first 11 codons at the N-terminal (the 3 RBS parts were B0034, B0032, and B0030)
2 promoters x 137 natural RBSs x 13 sfGFP variants differing in the first 11 codons at the N-terminal (the 2 promoters were J23100 and J23108)

Hypothesis

At the beginning, we hypothesized the following, based on the information available in the literature:
Expression is inversely proportional to the stability of the mRNA secondary structure near the RBS part.
Rare codons present in the first 11 codons of a protein can increase or decrease the translational score of RBS parts.
Each RBS part has a native strength, irrespective of the promoter and protein coding part it is used with.

We designed an algorithm to compute the translational score of a given protein-expressing construct in the following way:
1. Input data:
Native strength of each promoter-RBS pair, written PiRi, where the RBS (Ri) is associated with the promoter (Pi) in the construct.
DeltaG value for the RBS part and the first 11 codons of the protein coding part.
Codon matrix for the first 11 codons of the protein coding parts.
2. Output data: the translational score for every construct.

3. Algorithm:
a. Compute the Cpref value for each construct. Each of the 64 codons that can occur in the first 11 codons of the protein coding parts is assigned a value that reflects the increase or decrease it causes in the translational score (a value above 1 is positive, 1 is neutral, a value below 1 is negative).
Cpref = C1 * C2 * ... * C10 * C11 * CsfGFP, where Ci is the preference of the codon present at the i-th position and CsfGFP is the combined score for the remaining codons of the protein variant, i.e. sfGFP.
b. Translational score = [(PiRi) * Cpref(i) / (1 + deltaG(i))] + α1, where α1 is a constant for all constructs.
c. Alternatively, to show the significance of rare codons, we set the Cpref value to 1 and optimized the system in the same fashion.
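
Steps a to c can be sketched in Python as follows. This is only an illustrative sketch, not the MATLAB code used for the model. It assumes that the codon matrix holds one preference value per codon (stored here as a dictionary) and that deltaG is supplied on a scale where 1 + deltaG stays positive. The function names and the example values are hypothetical.

```python
from math import prod


def codon_preference(codons, codon_matrix, c_sfgfp=1.0):
    """Cpref = C1 * C2 * ... * C11 * CsfGFP for one construct.

    codons       -- the first 11 codons of the coding part
    codon_matrix -- assumed: a dict with one preference value per codon
    c_sfgfp      -- combined score for the remaining sfGFP codons
    """
    if len(codons) != 11:
        raise ValueError("expected exactly 11 codons")
    return prod(codon_matrix[c] for c in codons) * c_sfgfp


def translational_score(p_r, c_pref, delta_g, alpha1=0.0):
    """Translational score = (PiRi * Cpref) / (1 + deltaG) + alpha1.

    Assumes delta_g is given on a scale where 1 + delta_g is positive.
    """
    denominator = 1.0 + delta_g
    if denominator <= 0:
        raise ValueError("1 + delta_g must be positive in this sketch")
    return p_r * c_pref / denominator + alpha1


# Illustrative values only, not fitted parameters from the model.
example_matrix = {"ATG": 1.0, "AAA": 1.1, "CGG": 0.9}
example_codons = ["ATG"] + ["AAA"] * 5 + ["CGG"] * 5

c_pref = codon_preference(example_codons, example_matrix)
score = translational_score(p_r=2.0, c_pref=c_pref, delta_g=0.5, alpha1=0.1)

# Step c: the same construct with the codon effect switched off (Cpref = 1).
score_without_codons = translational_score(p_r=2.0, c_pref=1.0, delta_g=0.5, alpha1=0.1)
```

Comparing score with score_without_codons for the same construct isolates the contribution of the codon term, which is the comparison described in step c.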

Optimization

The model above was optimized to estimate the unknown variables (the PiRi values and the codon matrix values) using the data from the paper mentioned above. In MATLAB, the fmincon function was used to minimize the sum of (model - experimental)^2 over all 14137 constructs. The system was then optimized again after removing 5% and 10% of the data as outliers, defined as the constructs with the largest abs(model - experimental).
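
The objective and the outlier-removal step can be sketched in Python as below. This is a hedged illustration, not the MATLAB fmincon setup that was actually run: the optimizer itself is left out, and the helper names and toy numbers are hypothetical. The sketch assumes that the number of outliers to drop is the given fraction of the data, rounded down.

```python
def sum_squared_error(model, experimental):
    """Objective to minimize: sum of (model - experimental)^2."""
    return sum((m - e) ** 2 for m, e in zip(model, experimental))


def keep_after_outlier_removal(model, experimental, fraction):
    """Indices kept after dropping the given fraction of points
    with the largest abs(model - experimental)."""
    n = len(model)
    n_drop = int(n * fraction)  # assumption: round down
    # Sort indices by absolute residual, smallest first.
    by_residual = sorted(range(n), key=lambda i: abs(model[i] - experimental[i]))
    return sorted(by_residual[: n - n_drop])


# Toy data: ten constructs, one of which is badly predicted.
model_scores = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
measured = [1.0, 2.0, 3.0, 14.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]

error_all = sum_squared_error(model_scores, measured)
kept = keep_after_outlier_removal(model_scores, measured, fraction=0.10)
error_trimmed = sum_squared_error(
    [model_scores[i] for i in kept], [measured[i] for i in kept]
)
```

In a full workflow, the parameters would be refitted on the kept constructs only, with the same objective.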

Results

After several iterations of optimization, we obtained the following results. The optimization was done in MATLAB on a supercomputer facility at IIT Madras.

Relative native strength (averaged over the J23100 and J23108 promoters):
B0034: 0.6163, B0032: 1.0, B0030: 0.5473

Global modularity of RBSs with respect to promoters (defined as std(score)/mean(score)):
B0034: 0.3372, B0032: 0.2974, B0030: 1.1370

Global modularity of RBSs with respect to protein coding parts (defined as mean(std(score)/mean(score)) over the different promoters):
B0034: 0.7815, B0032: 0.7127, B0030: 0.9958
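
The two modularity measures are coefficients of variation, which can be sketched in Python as below. This is an illustration under stated assumptions, not the code used to produce the numbers above: it uses the sample standard deviation (matching the default behaviour of MATLAB's std), and the function names and scores are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(scores):
    """std(score) / mean(score), using the sample standard deviation."""
    return stdev(scores) / mean(scores)


def modularity_wrt_coding_parts(scores_by_promoter):
    """mean(std(score) / mean(score)) over the different promoters.

    scores_by_promoter maps a promoter name to the scores of one RBS
    across protein coding parts under that promoter.
    """
    return mean(coefficient_of_variation(s) for s in scores_by_promoter.values())


# Toy scores for a single RBS, not values from the dataset.
toy_scores = {
    "J23100": [1.0, 2.0, 3.0],
    "J23108": [2.0, 4.0, 6.0],
}

cv_single = coefficient_of_variation(toy_scores["J23100"])
cv_averaged = modularity_wrt_coding_parts(toy_scores)
```

With this definition, a lower value means the scores of the RBS vary less across contexts.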

Conclusion

We achieved a heuristic solution with a correlation of 0.87 on 90% of the data points. The model gives the combined promoter-RBS strength for 280 (2 promoters x 140 RBSs) combinations. It also gives the codon preference matrix for 64 codons (shown below).

We can observe a significant decrease (on average xx%) in the strength of RBSs (xx out of 140) when they are used with high-strength promoters.