Difference between revisions of "Team:Valencia UPV/Loop"

Line 111: Line 111:
 
                         "img-responsive" src=
 
                         "img-responsive" src=
 
                         "https://static.igem.org/mediawiki/2016/7/7a/T--Valencia_UPV--overview_box.PNG"
 
                         "https://static.igem.org/mediawiki/2016/7/7a/T--Valencia_UPV--overview_box.PNG"
                         style="width:600px"></div>
+
                         style="width:500px"></div>
 
                         <p><br>
 
                         <p><br>
 
                         <br></p>
 
                         <br></p>
Line 124: Line 124:
 
                         <br>
 
                         <br>
 
                         <br>
 
                         <br>
                         <a href="https://www.codecogs.com/eqnedit.php?latex=k_r&space;=&space;\dfrac{6\cdot&space;D\cdot\lambda_{Cas9}\cdot[Cas9:gRNA]}{V}&space;=&space;0,0172[Cas9:gRNA]" target="_blank"><img src="https://latex.codecogs.com/svg.latex?k_r&space;=&space;\dfrac{6\cdot&space;D\cdot\lambda_{Cas9}\cdot[Cas9:gRNA]}{V}&space;=&space;0,0172[Cas9:gRNA]" title="k_r = \dfrac{6\cdot D\cdot\lambda_{Cas9}\cdot[Cas9:gRNA]}{V} = 0,0172[Cas9:gRNA]" /></a><br>
+
<div style="text-align:center;">
                        <br>
+
                         <a href="https://www.codecogs.com/eqnedit.php?latex=k_r&space;=&space;\dfrac{6\cdot&space;D\cdot\lambda_{Cas9}\cdot[Cas9:gRNA]}{V}&space;=&space;0,0172[Cas9:gRNA]" target="_blank"><img src="https://latex.codecogs.com/svg.latex?k_r&space;=&space;\dfrac{6\cdot&space;D\cdot\lambda_{Cas9}\cdot[Cas9:gRNA]}{V}&space;=&space;0,0172[Cas9:gRNA]" title="k_r = \dfrac{6\cdot D\cdot\lambda_{Cas9}\cdot[Cas9:gRNA]}{V} = 0,0172[Cas9:gRNA]" /></a></div><br>
 
                         <br>
 
                         <br>
 +
                        <br><p>
 
                         Where parameters with values indicated in Table 5,
 
                         Where parameters with values indicated in Table 5,
 
                         are:<br>
 
                         are:<br>
Line 148: Line 149:
 
                         "img-responsive" src=
 
                         "img-responsive" src=
 
                         "https://static.igem.org/mediawiki/2016/2/2e/T--Valencia_UPV--complex_in_the_nucleus.png"
 
                         "https://static.igem.org/mediawiki/2016/2/2e/T--Valencia_UPV--complex_in_the_nucleus.png"
                         style="width:400px"></div>
+
                         style="width:200px"></div>
 
                         <p><br>
 
                         <p><br>
 
                         Thus, assuming a spherical approach of the Cas9 spatial
 
                         Thus, assuming a spherical approach of the Cas9 spatial
Line 178: Line 179:
 
                         number for the Cas9 construction. Furthermore, the gain
 
                         number for the Cas9 construction. Furthermore, the gain
 
                         in the number of gene copies encoding Cas9, is the same
 
                         in the number of gene copies encoding Cas9, is the same
                         produced in the random contact ratio.<br>
+
                         produced in the random contact ratio.<br><br>
 
                         As it can be inferred from this analysis, achieving
 
                         As it can be inferred from this analysis, achieving
 
                         enough Cas9:gRNA concentration is critical to stablish
 
                         enough Cas9:gRNA concentration is critical to stablish
Line 209: Line 210:
 
                         "img-responsive" src=
 
                         "img-responsive" src=
 
                         "https://static.igem.org/mediawiki/2016/1/11/T--Valencia_UPV--R-loop.png"
 
                         "https://static.igem.org/mediawiki/2016/1/11/T--Valencia_UPV--R-loop.png"
                         style="width:600px"></div>
+
                         style="width:300px"></div>
 
                         <p><br>
 
                         <p><br>
 
                         Estimating the number of targets and off-targets, we
 
                         Estimating the number of targets and off-targets, we
Line 222: Line 223:
 
                         <br>
 
                         <br>
 
                         <br>
 
                         <br>
                         <a href="https://www.codecogs.com/eqnedit.php?latex=P_{complex,target}&space;=&space;\dfrac{\dfrac{N_{target,j}}{N}exp(\dfrac{-\Delta&space;G_{complex,target}}{k_B\cdot&space;T})}{1&plus;\Sigma_m^M\dfrac{N_{target,j}}{N}exp(\dfrac{-\Delta&space;G_{complex,target}}{k_B\cdot&space;T})}" target="_blank"><img src="https://latex.codecogs.com/svg.latex?P_{complex,target}&space;=&space;\dfrac{\dfrac{N_{target,j}}{N}exp(\dfrac{-\Delta&space;G_{complex,target}}{k_B\cdot&space;T})}{1&plus;\Sigma_m^M\dfrac{N_{target,j}}{N}exp(\dfrac{-\Delta&space;G_{complex,target}}{k_B\cdot&space;T})}" title="P_{complex,target} = \dfrac{\dfrac{N_{target,j}}{N}exp(\dfrac{-\Delta G_{complex,target}}{k_B\cdot T})}{1+\Sigma_m^M\dfrac{N_{target,j}}{N}exp(\dfrac{-\Delta G_{complex,target}}{k_B\cdot T})}" /></a><br>
+
<div style="text-align:center;">
                        <br>
+
                         <a href="https://www.codecogs.com/eqnedit.php?latex=P_{complex,target}&space;=&space;\dfrac{\dfrac{N_{target,j}}{N}exp(\dfrac{-\Delta&space;G_{complex,target}}{k_B\cdot&space;T})}{1&plus;\Sigma_m^M\dfrac{N_{target,j}}{N}exp(\dfrac{-\Delta&space;G_{complex,target}}{k_B\cdot&space;T})}" target="_blank"><img src="https://latex.codecogs.com/svg.latex?P_{complex,target}&space;=&space;\dfrac{\dfrac{N_{target,j}}{N}exp(\dfrac{-\Delta&space;G_{complex,target}}{k_B\cdot&space;T})}{1&plus;\Sigma_m^M\dfrac{N_{target,j}}{N}exp(\dfrac{-\Delta&space;G_{complex,target}}{k_B\cdot&space;T})}" title="P_{complex,target} = \dfrac{\dfrac{N_{target,j}}{N}exp(\dfrac{-\Delta G_{complex,target}}{k_B\cdot T})}{1+\Sigma_m^M\dfrac{N_{target,j}}{N}exp(\dfrac{-\Delta G_{complex,target}}{k_B\cdot T})}" /></a></div><br>
 
                         <br>
 
                         <br>
 +
                        <br><p>
 
                         This expression has information about the thermodynamic
 
                         This expression has information about the thermodynamic
 
                         balance after the R-loop formation. In order to obtain
 
                         balance after the R-loop formation. In order to obtain
Line 240: Line 242:
 
                     <div class="blog-post-item" id=
 
                     <div class="blog-post-item" id=
 
                     "Off-targetsearchalgorithm._id">
 
                     "Off-targetsearchalgorithm._id">
                         <h3>Off-target search algorithm.</h3>
+
                         <h4>Off-target search algorithm.</h4>
 
                         <p>Off-targets are DNA regions where the R-loop could
 
                         <p>Off-targets are DNA regions where the R-loop could
 
                         take place because of the high similarity between that
 
                         take place because of the high similarity between that
Line 276: Line 278:
 
                         (2), the process described above is a sequence of
 
                         (2), the process described above is a sequence of
 
                         reactions which in global, must accomplish the
 
                         reactions which in global, must accomplish the
                         thermodynamic law for this kind of processes:<br>
+
                         thermodynamic law for this kind of processes:</p>
 
                         <br>
 
                         <br>
 
                         <br>
 
                         <br>
                         <a href="https://www.codecogs.com/eqnedit.php?latex=\Delta&space;G_{complex,target}\leq&space;0" target="_blank"><img src="https://latex.codecogs.com/svg.latex?\Delta&space;G_{complex,target}\leq&space;0" title="\Delta G_{complex,target}\leq 0" /></a><br>
+
<div style="text-align:center;">
                         <br>
+
                         <a href="https://www.codecogs.com/eqnedit.php?latex=\Delta&space;G_{complex,target}\leq&space;0" target="_blank"><img src="https://latex.codecogs.com/svg.latex?\Delta&space;G_{complex,target}\leq&space;0" title="\Delta G_{complex,target}\leq 0" /></a></div><br>
 +
                         <p><br>
 
                         <br>
 
                         <br>
 
                         Where the free energy is decomposed in the main stages
 
                         Where the free energy is decomposed in the main stages
 
                         previously described (2,15):<br>
 
                         previously described (2,15):<br>
 
                         <br>
 
                         <br>
                         <br>
+
                         <br></p>
                         <a href="https://www.codecogs.com/eqnedit.php?latex=\Delta\Delta&space;G_{exchange,gRNA:target}&space;=&space;\Sigma_kd_k[\Delta&space;G^{RNA:DNA}_{k,k&plus;1}&space;-&space;\Delta&space;G^{DNA:DNA}_{k,k&plus;1}]" target="_blank"><img src="https://latex.codecogs.com/svg.latex?\Delta\Delta&space;G_{exchange,gRNA:target}&space;=&space;\Sigma_kd_k[\Delta&space;G^{RNA:DNA}_{k,k&plus;1}&space;-&space;\Delta&space;G^{DNA:DNA}_{k,k&plus;1}]" title="\Delta\Delta G_{exchange,gRNA:target} = \Sigma_kd_k[\Delta G^{RNA:DNA}_{k,k+1} - \Delta G^{DNA:DNA}_{k,k+1}]" /></a><br>
+
<div style="text-align:center;">
                         <br>
+
                         <a href="https://www.codecogs.com/eqnedit.php?latex=\Delta\Delta&space;G_{exchange,gRNA:target}&space;=&space;\Sigma_kd_k[\Delta&space;G^{RNA:DNA}_{k,k&plus;1}&space;-&space;\Delta&space;G^{DNA:DNA}_{k,k&plus;1}]" target="_blank"><img src="https://latex.codecogs.com/svg.latex?\Delta\Delta&space;G_{exchange,gRNA:target}&space;=&space;\Sigma_kd_k[\Delta&space;G^{RNA:DNA}_{k,k&plus;1}&space;-&space;\Delta&space;G^{DNA:DNA}_{k,k&plus;1}]" title="\Delta\Delta G_{exchange,gRNA:target} = \Sigma_kd_k[\Delta G^{RNA:DNA}_{k,k+1} - \Delta G^{DNA:DNA}_{k,k+1}]" /></a></div><br>
 +
                         <p><br>
 
                         <br>
 
                         <br>
 
                         Each one of the terms from the expression above, is
 
                         Each one of the terms from the expression above, is
Line 298: Line 302:
 
                         Values obtained for each of them were
 
                         Values obtained for each of them were
 
                         ΔG<sub>complex,Ga20Ox</sub>= -10,37 kcal/mol and
 
                         ΔG<sub>complex,Ga20Ox</sub>= -10,37 kcal/mol and
                         ΔG<sub>complex,TFL</sub> =-11,78 kcal/mol.<br>
+
                         ΔG<sub>complex,TFL</sub> =-11,78 kcal/mol.<br><br>
 
                         These values were used to obtain the probability that
 
                         These values were used to obtain the probability that
 
                         the Cas9:gRNA complex cleaves the target in the Testing
 
                         the Cas9:gRNA complex cleaves the target in the Testing
Line 305: Line 309:
 
                         off-targets, as they could obtain
 
                         off-targets, as they could obtain
 
                         ΔG<sub>complex,target</sub> &lt;0, letting the R-loop
 
                         ΔG<sub>complex,target</sub> &lt;0, letting the R-loop
                         be formed at those regions as well.<br>
+
                         be formed at those regions as well.<br><br>
 
                         After calculating the ΔG<sub>complex,offtarget</sub>
 
                         After calculating the ΔG<sub>complex,offtarget</sub>
 
                         for off-targets suggested by the algorithm, only the
 
                         for off-targets suggested by the algorithm, only the
Line 316: Line 320:
 
                     </div>
 
                     </div>
 
                     <div class="blog-post-item" id="PAMbindingenergy.._id">
 
                     <div class="blog-post-item" id="PAMbindingenergy.._id">
                         <h5>PAM binding energy..</h5>
+
                         <h5>PAM binding energy</h5>
 
                         <p>Once the complex has been formed, it must find the
 
                         <p>Once the complex has been formed, it must find the
 
                         PAM sequence of the target included in the Testing
 
                         PAM sequence of the target included in the Testing
Line 324: Line 328:
 
                         one type of PAM region. In our case, as we are working
 
                         one type of PAM region. In our case, as we are working
 
                         with the human type Cas9, the PAM sequence will be
 
                         with the human type Cas9, the PAM sequence will be
                         -NGGN-.<br>
+
                         -NGGN-.<br><br>
 
                         Bearing in mind that breaking one hydrogen bond
 
                         Bearing in mind that breaking one hydrogen bond
 
                         provides at least an energy supply of 1.2 kcal/mol, the
 
                         provides at least an energy supply of 1.2 kcal/mol, the
Line 357: Line 361:
 
                         “promiscuous” than others. We opted for including this
 
                         “promiscuous” than others. We opted for including this
 
                         potential off-target PAMs in our off-target search
 
                         potential off-target PAMs in our off-target search
                         algorithm.<br>
+
                         algorithm.<br><br>
 
                         This PAM-energy assignment is implemented in the Matlab
 
                         This PAM-energy assignment is implemented in the Matlab
 
                         function energy_PAM.mat. The input is the target
 
                         function energy_PAM.mat. The input is the target
Line 403: Line 407:
 
                         associated to different base pairs, we built a table
 
                         associated to different base pairs, we built a table
 
                         with the energy increment for all possible matching
 
                         with the energy increment for all possible matching
                         duplexes. This table provides all information to
+
                         duplexes. This table provides all necessary information to
 
                         calculate the ΔΔG<sub>exchange,gRNA:target</sub>
 
                         calculate the ΔΔG<sub>exchange,gRNA:target</sub>
 
                         (8,9,10,11), as there should not be mismatches between
 
                         (8,9,10,11), as there should not be mismatches between
Line 423: Line 427:
 
                         estimate the energetic cost of forming the R-loop, had
 
                         estimate the energetic cost of forming the R-loop, had
 
                         to consider the distance of base pairs to the PAM
 
                         to consider the distance of base pairs to the PAM
                         region.<br>
+
                         region.<br><br>
 
                         This kind of forward move which considers all the
 
                         This kind of forward move which considers all the
 
                         mentioned criteria, is known as nearest neighbor model.
 
                         mentioned criteria, is known as nearest neighbor model.
Line 432: Line 436:
 
                         energy from the k<sup>th</sup> hydrogen bond will be
 
                         energy from the k<sup>th</sup> hydrogen bond will be
 
                         used in the reaction of the nearest base pair, k +
 
                         used in the reaction of the nearest base pair, k +
                         1.<br>
+
                         1.<br><br>
 
                         Thus, the hybridization of several nucleotides can be
 
                         Thus, the hybridization of several nucleotides can be
 
                         represented as a sequence of binding reactions, leading
 
                         represented as a sequence of binding reactions, leading
Line 441: Line 445:
 
                         DNA-DNA hydrolysis and the pair RNA-DNA hybridized.<br>
 
                         DNA-DNA hydrolysis and the pair RNA-DNA hybridized.<br>
 
                         <br>
 
                         <br>
                         <br>
+
                         <br></p>
                         <a href="https://www.codecogs.com/eqnedit.php?latex=\Delta\Delta&space;G_{exchange,gRNA:target}&space;=&space;\Sigma_kd_k[\Delta&space;G^{RNA:DNA}_{k,k&plus;1}&space;-&space;\Delta&space;G^{DNA:DNA}_{k,k&plus;1}]" target="_blank"><img src="https://latex.codecogs.com/svg.latex?\Delta\Delta&space;G_{exchange,gRNA:target}&space;=&space;\Sigma_kd_k[\Delta&space;G^{RNA:DNA}_{k,k&plus;1}&space;-&space;\Delta&space;G^{DNA:DNA}_{k,k&plus;1}]" title="\Delta\Delta G_{exchange,gRNA:target} = \Sigma_kd_k[\Delta G^{RNA:DNA}_{k,k+1} - \Delta G^{DNA:DNA}_{k,k+1}]" /></a><br>
+
<div style="text-align:center;">
                         <br>
+
                         <a href="https://www.codecogs.com/eqnedit.php?latex=\Delta\Delta&space;G_{exchange,gRNA:target}&space;=&space;\Sigma_kd_k[\Delta&space;G^{RNA:DNA}_{k,k&plus;1}&space;-&space;\Delta&space;G^{DNA:DNA}_{k,k&plus;1}]" target="_blank"><img src="https://latex.codecogs.com/svg.latex?\Delta\Delta&space;G_{exchange,gRNA:target}&space;=&space;\Sigma_kd_k[\Delta&space;G^{RNA:DNA}_{k,k&plus;1}&space;-&space;\Delta&space;G^{DNA:DNA}_{k,k&plus;1}]" title="\Delta\Delta G_{exchange,gRNA:target} = \Sigma_kd_k[\Delta G^{RNA:DNA}_{k,k+1} - \Delta G^{DNA:DNA}_{k,k+1}]" /></a></div><br>
 +
                         <p><br>
 
                         <br>
 
                         <br>
 
                         However, the gRNA may have thermodynamically stable
 
                         However, the gRNA may have thermodynamically stable
Line 453: Line 458:
 
                         if mismatches are placed in the extreme opposite to the
 
                         if mismatches are placed in the extreme opposite to the
 
                         PAM, they may will not compromise the off-target
 
                         PAM, they may will not compromise the off-target
                         knockout.<br>
+
                         knockout.<br><br>
 
                         In a similar way as we did with matching duplexes, our
 
                         In a similar way as we did with matching duplexes, our
 
                         first try was to find more information about energy
 
                         first try was to find more information about energy
Line 664: Line 669:
 
                         agrees with criteria found in bibliography (2), letting
 
                         agrees with criteria found in bibliography (2), letting
 
                         us assume that the penalty system worked well as
 
                         us assume that the penalty system worked well as
                         representation of mismatch effects.<br>
+
                         representation of mismatch effects.<br><br>
 
                         The variability observed in each position is due to the
 
                         The variability observed in each position is due to the
 
                         differences between three possible nucleotides.
 
                         differences between three possible nucleotides.
Line 690: Line 695:
 
                         (σ<sub>NS</sub>) will affect to the difference in the
 
                         (σ<sub>NS</sub>) will affect to the difference in the
 
                         free energy needed to untwist the on-target DNA
 
                         free energy needed to untwist the on-target DNA
                         region:<br><br><br>
+
                         region:<br><br><br></p>
                         <a href="https://www.codecogs.com/eqnedit.php?latex=\Delta\Delta&space;G_{supercoiling,target}&space;=&space;-10nk_BT(\sigma_F^2&space;-&space;\sigma_I^2)" target="_blank"><img src="https://latex.codecogs.com/svg.latex?\Delta\Delta&space;G_{supercoiling,target}&space;=&space;-10nk_BT(\sigma_F^2&space;-&space;\sigma_I^2)" title="\Delta\Delta G_{supercoiling,target} = -10nk_BT(\sigma_F^2 - \sigma_I^2)" /></a><br><br><br>
+
<div style="text-align:center;">
                         As we could not determine the chromatin state and its
+
                         <a href="https://www.codecogs.com/eqnedit.php?latex=\Delta\Delta&space;G_{supercoiling,target}&space;=&space;-10nk_BT(\sigma_F^2&space;-&space;\sigma_I^2)" target="_blank"><img src="https://latex.codecogs.com/svg.latex?\Delta\Delta&space;G_{supercoiling,target}&space;=&space;-10nk_BT(\sigma_F^2&space;-&space;\sigma_I^2)" title="\Delta\Delta G_{supercoiling,target} = -10nk_BT(\sigma_F^2 - \sigma_I^2)" /></a></div><br><br><br>
 +
                         <p>As we could not determine the chromatin state and its
 
                         evolution during the time that CRISPR/Cas9 was working
 
                         evolution during the time that CRISPR/Cas9 was working
 
                         on the plant, we could not study this parameter.<br>
 
                         on the plant, we could not study this parameter.<br>

Revision as of 02:23, 7 November 2016

Overview


Once the Cas9:gRNA complex has formed, it must find the target and knock it out. As soon as the complex is formed, it wanders around the nucleus along a random path. During this erratic trajectory, it collides with many regions of DNA. If binding to one of those regions is thermodynamically feasible, i.e. the region is complementary enough to the gRNA sequence, the R-loop forms. This structure results from the hybridization of the Cas9:gRNA complex to a DNA sequence. The complex therefore binds not only the target but also undesired, similar regions, named off-targets.


In this step of the modeling, we used the Boltzmann probability distribution and thermodynamics to estimate the probability that the Cas9:gRNA complex finds the target. We also developed an off-target search algorithm based on transcriptional activity and on the target similarity provided by local alignment algorithms.



Complex diffusion


The process of searching for the target across the whole genome is named scanning. We can express the contact rate between the Cas9:gRNA complex and any DNA region as:

k_r = \dfrac{6\cdot D\cdot\lambda_{Cas9}\cdot[Cas9:gRNA]}{V} = 0.0172\,[Cas9:gRNA]





Where the parameters, whose values are given in Table 5, are:
D is the compound diffusivity.
[Cas9:gRNA] is the concentration of the complex.
V is the compartment volume, i.e. the plant nuclear volume.
λCas9 is the characteristic length between the place of production and binding.
Several assumptions were made to consider random three-dimensional diffusion of the complex. Those assumptions were:
The complex can be considered as a macromolecule with three-dimensional random diffusion around the nucleus.
The net molar flow is presumably equal to zero, as the compartment composition is well-mixed.
As DNA is dispersed throughout the nucleus, the complex will be in almost permanent contact with it.


Thus, approximating the shape of Cas9 as a sphere, λ = (V_Cas9)^(1/3), because at the edge of Cas9 there will be DNA ready to hybridize with the gRNA if possible (2).
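The contact rate and the characteristic length described above can be sketched in a few lines of Python. This is only an illustration of the formula k_r = 6·D·λCas9·[Cas9:gRNA]/V, not the team's Matlab code: the function names are hypothetical, the units are left to the caller, and the lumped constant 0.0172 is the one reported in the text.

```python
def characteristic_length(cas9_volume):
    """Characteristic length, taken as the cube root of the Cas9 volume."""
    return cas9_volume ** (1.0 / 3.0)


def contact_rate(diffusivity, length, complex_conc, nuclear_volume):
    """Contact rate k_r = 6 * D * lambda_Cas9 * [Cas9:gRNA] / V."""
    return 6.0 * diffusivity * length * complex_conc / nuclear_volume


def contact_rate_lumped(complex_conc, lumped_constant=0.0172):
    """Same rate, using the lumped constant 6 * D * lambda_Cas9 / V from the text."""
    return lumped_constant * complex_conc
```

Because the rate is linear in [Cas9:gRNA], doubling the complex concentration doubles k_r in this sketch.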
Varying the time of measurement and the number of Cas9 and gRNA copies introduced, we can expect different results for this rate:

Figure 9. Comparison of the values obtained for the contact rate of the Cas9:gRNA complex at time t = 4320 minutes (3 days). All curves are relative to the results obtained with 1 copy of Cas9.


The simulations in the graphic above show that the contact rate of the Cas9:gRNA complex increases approximately in proportion to the concentration of the complex. A gene copy number of at least 10-15 for the gRNA construct is necessary to reach the plateau of the kr rate, regardless of the gene copy number for the Cas9 construct. Furthermore, the gain in the number of gene copies encoding Cas9 equals the gain produced in the random contact rate.

As can be inferred from this analysis, achieving a sufficient Cas9:gRNA concentration is critical to establish contact between the complex and the target. One possibility to increase the repeatability of the test, minimizing the randomness of the complex diffusion, is to infiltrate the Testing System construct together with the gRNA sequence. The expectation was that if the gRNA is transcribed near the target, the randomness of the three-dimensional diffusion would be minimized: with both pieces close to each other, it is “easier” for the complex to find the target. This suggestion was implemented in wet-lab experiments, which showed an increase of the light signal.

Probability of R-loop formation


In order to knock out our genetic target, it must be hybridized by the gRNA, forming the R-loop. This structure provides Cas9 with the necessary stability to cut the DNA strand (2, 15). For the structure to form, potential targets must be complementary enough to the gRNA. Sufficient gRNA-DNA complementarity means meeting the thermodynamic requirements that let the knockout happen.


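By estimating the number of targets and off-targets, we can obtain a distribution of the cleavage probability as a function of the energy needed to cleave each DNA location. Thus, M different energetic states are considered as places where the R-loop could form, our Testing System being one of those states. This scenario can be fitted to a Boltzmann distribution (2,5), where the binding probability of the Cas9:gRNA complex with the m-th DNA region is:

P_{complex,target} = \dfrac{\dfrac{N_{target,j}}{N}\exp\left(\dfrac{-\Delta G_{complex,target}}{k_B\cdot T}\right)}{1+\sum_m^M\dfrac{N_{target,j}}{N}\exp\left(\dfrac{-\Delta G_{complex,target}}{k_B\cdot T}\right)}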





This expression describes the thermodynamic balance after the R-loop formation. To evaluate it, we first had to obtain the free energy increment for each DNA candidate (-ΔGcomplex,target) and the expected number of those regions (Ntarget).
Off-target regions were estimated using the off-target search algorithm, which found 1 off-target for Ga20Ox and 5 for TFL. The next section explains the free energy increment, which ensures the thermodynamic stability of the R-loop.
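The Boltzmann expression above can be evaluated with a short Python sketch. Several things here are assumptions and not taken from the original model: the function name, the default temperature of 298.15 K, the use of the molar gas constant in place of k_B (because the free energies are reported in kcal/mol), and the choice of the normalising count N, which is left to the caller.

```python
import math

# Molar gas constant in kcal/(mol*K). Assumption: used instead of k_B
# because the free energies in the text are given per mole.
R_KCAL = 1.987204e-3


def binding_probabilities(delta_g, counts, n_total, temperature=298.15):
    """Boltzmann binding probability of each candidate region.

    delta_g: free energy increments in kcal/mol, one per candidate region.
    counts: expected number of copies of each region.
    n_total: normalising count N (hypothetical; chosen by the caller).
    """
    if len(delta_g) != len(counts):
        raise ValueError("delta_g and counts must have the same length")
    rt = R_KCAL * temperature
    # Statistical weight of each region: (N_m / N) * exp(-dG_m / RT).
    weights = [n / n_total * math.exp(-dg / rt) for dg, n in zip(delta_g, counts)]
    # The leading 1 in the denominator is the unbound state.
    denominator = 1.0 + sum(weights)
    return [w / denominator for w in weights]
```

For instance, `binding_probabilities([-10.37, -2.0], [1, 1], n_total=2)` compares the reported Ga20Ox target energy with a made-up off-target energy of -2.0 kcal/mol; the second value is purely illustrative.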

Off-target search algorithm.

Off-targets are DNA regions where the R-loop could form because of the high similarity between that region and the target. This means that off-targets sequester Cas9 and gRNA that were meant to knock out the on-target. The most reliable off-target predictions are obtained from experimental results, but our model must be able to find off-targets quickly for all possible targets (1,2), so we could not wait for experimental results.
Alternatively, we developed an off-target search based on transcriptional activity and local alignments between the target and off-target candidates. Our proposed strategy was the following:


The first step was to create a gene library with the sequences of the most transcribed genes in Nicotiana benthamiana. There is a clear relation between transcriptional activity and the relaxed state of chromatin (13), which lets us assume that highly transcribed genes will be more accessible to the gRNA.
This algorithm is implemented in the Matlab function Nboffsearch.m. The search for potential off-targets for each of our two targets found 1 off-target for Ga20Ox and 5 off-targets for TFL.
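To make the idea of scanning a gene library concrete, here is a minimal Python sketch. It is not the Nboffsearch.m implementation: it swaps the local alignment for a simpler ungapped sliding-window comparison, and the function names, the mismatch threshold and the requirement of an -NGG- PAM directly after the site are all assumptions made for illustration.

```python
def count_mismatches(a, b):
    """Number of positions where two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def find_off_targets(protospacer, library, max_mismatches=3):
    """Scan every library sequence for sites similar to the protospacer.

    library: dict mapping a gene name to its DNA sequence (hypothetical input).
    Returns (gene, position, mismatches) for each site that is followed
    by an NGG PAM and has at most max_mismatches mismatches.
    """
    protospacer = protospacer.upper()
    n = len(protospacer)
    hits = []
    for gene, seq in library.items():
        seq = seq.upper()
        # The last usable start leaves room for the site plus a 3-nt PAM.
        for i in range(len(seq) - n - 2):
            site = seq[i:i + n]
            pam = seq[i + n:i + n + 3]
            if pam[1:] != "GG":
                continue
            mismatches = count_mismatches(protospacer, site)
            if mismatches <= max_mismatches:
                hits.append((gene, i, mismatches))
    return hits
```

A full local alignment would also tolerate insertions and deletions, which this ungapped scan does not.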

Free energy increment ΔGcomplex,target.

As there is no energy supply catalyzing the R-loop (2), the process described above is a sequence of reactions which, overall, must satisfy the thermodynamic condition for this kind of process:

\Delta G_{complex,target}\leq 0






Where the free energy is decomposed into the main stages previously described (2,15):





Each of the terms in the expression above is explained in the following sections of the Modeling. We determined these parameters for two of the targets contained in our Database: ORYZA SATIVA JAPONICA GROUP GIBBERELLIN 20 OXIDASE 2 (LOC4325003) and CITRUS SINENSIS TERMINAL FLOWER (TFL). These targets are associated with higher grain yield in rice and induced flowering in orange, respectively. The values obtained were ΔGcomplex,Ga20Ox = -10.37 kcal/mol and ΔGcomplex,TFL = -11.78 kcal/mol.

These values were used to obtain the probability that the Cas9:gRNA complex cleaves the target in the Testing System, performing the desired knockout. To calculate this probability, we had to account for possible off-targets, as they could also have ΔGcomplex,target < 0, letting the R-loop form at those regions as well.

After calculating ΔGcomplex,offtarget for the off-targets suggested by the algorithm, only the off-target for Ga20Ox could be considered a final off-target, as the TFL off-targets had free energy increments greater than zero. Therefore, we calculated Pcomplex,Ga20Ox, obtaining a value of 0.9964.

PAM binding energy

Once the complex has been formed, it must find the PAM sequence of the target included in the Testing System. The PAM region has between 3 and 5 nucleotides which are recognized by Cas9, enabling the binding of the complex to the DNA. Each Cas9 species binds better to one type of PAM region. In our case, as we are working with the human type Cas9, the PAM sequence will be -NGGN-.

Bearing in mind that breaking one hydrogen bond supplies at least 1.2 kcal/mol, the binding of the PAM sequence must release a free energy ΔGPAM of at least 9 kcal/mol:


Depending on the nucleotide arrangement, the ΔGPAM resulting from PAM recognition will vary. However, Cas9 may interact with a PAM region even when there are mismatches between them (2). To find out whether this affinity for regions with mismatches was significant, we studied the ΔGPAM obtained for all possible PAM combinations.


White spaces in the picture above represent alternative PAMs that show no significant binding to a Cas9 that recognizes -NGG-. The lowest binding energy is clearly for PAMs with the structure -NGG-, reaching less than -9 kcal/mol. However, there are other alternatives with enough affinity to let Cas9 bind to them. The potential of these regions as possible off-targets also depends on the Cas9 species being used, as some are more “promiscuous” than others. We opted to include these potential off-target PAMs in our off-target search algorithm.

This PAM-energy assignment is implemented in the Matlab function energy_PAM.mat. The input is the target sequence. The function compares the nucleotides at the PAM end of the input sequence with a table of Cas9-PAM binding energies and returns the corresponding energy ΔGPAM. For the targets implemented in our Testing System, the values of this parameter were:
Table 2: Results of ΔGPAM for the targets implemented in the Testing System

Target | PAM | ΔGPAM (kcal/mol)
Rice - Oryza sativa, semi-dwarf; higher grain yield | CGG | -9.600
Orange - Citrus sinensis, induced flowering | TGG | -9.700
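The table lookup described for energy_PAM can be sketched in Python as follows. This is a hypothetical illustration, not the original Matlab function: it assumes the PAM is the last three nucleotides of the input sequence, it assumes the energies are in kcal/mol, and its table contains only the two values reported in Table 2.

```python
# PAM binding energies from Table 2 (assumed kcal/mol). A complete table
# would cover every PAM combination; only the two reported values appear here.
PAM_ENERGY = {"CGG": -9.6, "TGG": -9.7}


def pam_energy(target_with_pam, table=PAM_ENERGY):
    """Return the binding energy for the PAM at the end of the sequence."""
    pam = target_with_pam[-3:].upper()
    if pam not in table:
        raise KeyError(f"no binding energy tabulated for PAM {pam!r}")
    return table[pam]
```

Raising an error for an untabulated PAM keeps the sketch from silently inventing an energy for combinations the source does not report.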

Cas9:gRNA:DNA hybridization

Secondly, the energy released by PAM binding is used to hybridize the gRNA with nucleotides of the DNA sequence. To determine the energy associated with different base pairs, we built a table with the energy increment for all possible matching duplexes. This table provides all the information necessary to calculate ΔΔGexchange,gRNA:target (8,9,10,11), as there should be no mismatches between gRNA and target. The graphic below shows two sources of variability affecting the energetic balance of duplex hybridization: on the one hand, there is a clear dependence on the nucleotide; on the other hand, the type of nucleic acid (RNA or DNA) also affects the free energy increment.


Thus, we had to choose an approach that considers the energy differences between nucleotides and between nucleic acids. Moreover, the model used to estimate the energetic cost of forming the R-loop had to consider the distance of each base pair from the PAM region.

An approach that meets all of these criteria is known as the nearest-neighbor model. In this model, RNA:DNA binding depends on sequence context: the energy used to form one base pair draws on the energy released when the previous one formed. In other words, it is assumed that the energy from the k-th hydrogen bond is used in the reaction of the nearest base pair, k + 1.

Thus, the hybridization of several nucleotides can be represented as a sequence of binding reactions, leading to a global difference between the dissociation of DNA:DNA bonds and the formation of the gRNA:DNA duplex. The term ΔΔGexchange gRNA:target reflects the difference between the free energy used to separate the DNA-DNA pairs and the free energy of the hybridized RNA-DNA pairs.
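
The sum over neighboring base pairs can be illustrated with a toy Python sketch. Everything in it is an assumption made for illustration: the function name exchange_energy is hypothetical, the sign convention (RNA:DNA stacking energy minus DNA:DNA stacking energy at each step) is one plausible reading of the text, and the two small tables hold placeholder numbers, not the values taken from references (8-11).

```python
# Toy nearest-neighbor sketch (not the team's energy_exchange.m).
# The stacking energies are PLACEHOLDERS chosen only so the example runs;
# they are not the tabulated values from references (8-11).
TOY_RNA_DNA = {"GC": -2.0, "CG": -1.5}  # hypothetical RNA:DNA stacks
TOY_DNA_DNA = {"GC": -2.5, "CG": -2.0}  # hypothetical DNA:DNA stacks


def exchange_energy(guide, rna_dna_nn, dna_dna_nn):
    """Sum the energy difference over consecutive dinucleotide steps.

    Assumed convention: each step contributes the RNA:DNA stacking energy
    minus the DNA:DNA stacking energy it replaces.
    Raises KeyError if a step is missing from either table.
    """
    total = 0.0
    for k in range(len(guide) - 1):
        step = guide[k:k + 2]  # base pair k together with its neighbor k + 1
        total += rna_dna_nn[step] - dna_dna_nn[step]
    return total
```

With the placeholder tables, exchange_energy("GCG", TOY_RNA_DNA, TOY_DNA_DNA) adds the two steps GC and CG. A real calculation would need the full set of sixteen dinucleotide steps for both duplex types.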





However, the gRNA may form thermodynamically stable duplexes with regions other than the target. Those DNA regions, named off-targets, may carry only a few mismatches that slightly affect their affinity towards the gRNA. The half of the gRNA closest to the PAM is the most decisive for R-loop formation. Therefore, mismatches located at the end opposite to the PAM may not compromise the off-target knockout.

As we did with matching duplexes, we first tried to find information about the energy involved when mismatches occur. However, there is little consensus in the literature, not all possible duplexes have been studied, and we could not determine these energetic values empirically either.
To solve this, we created a penalty vector which adds a penalty to the match binding energy for each mismatch. The term dk refers to a weight that decreases as k increases, with k = 1,2,3…length gRNA (typically 20-23). We estimated the values of those weights using criteria from our Scoring System, as it had been validated by comparison with other target searchers available online. Those coefficients are multiplied by the average single-mismatch penalty of 0.78 kcal/mol, taken from the literature (2). The penalties are implemented in the Matlab function weights_exchange.m, and the vector of penalties is represented below.
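
The penalty scheme described above can be sketched in Python as follows. This is not weights_exchange.m: the linearly decreasing weights are a hypothetical stand-in for the weights derived from the Scoring System, k = 1 is assumed to be the first character of the strings, and both sequences are written with the same DNA alphabet so that they can be compared character by character. Only the 0.78 kcal/mol figure comes from the text.

```python
# Sketch of a position-weighted mismatch penalty (not weights_exchange.m).
MISMATCH_PENALTY = 0.78  # kcal/mol, average single-mismatch penalty from the text


def linear_weights(n):
    """Hypothetical weights d_k = (n - k + 1) / n for k = 1..n.

    They only reproduce the stated property that d_k decreases as k
    increases; the team's weights come from their Scoring System.
    """
    return [(n - k + 1) / n for k in range(1, n + 1)]


def mismatch_penalty(guide, site, weights, penalty=MISMATCH_PENALTY):
    """Add d_k * penalty for every position k where guide and site differ."""
    if not (len(guide) == len(site) == len(weights)):
        raise ValueError("guide, site and weights must have the same length")
    return sum(w * penalty
               for g, s, w in zip(guide, site, weights)
               if g != s)
```

The returned value would be added to the matching ΔΔGexchange gRNA:target, so a mismatch at a low k raises the energy more than the same mismatch at a high k.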



Using this strategy, we studied how the ΔΔGexchange gRNA:target estimated for a target varies with the number and positions of mismatches. We implemented the calculation of ΔΔGexchange gRNA:target in the Matlab function energy_exchange.m. The results obtained for our particular targets were:
Table 3: Results of ΔΔGexchange gRNA:target for targets implemented in Testing System.

Target                                                  ΔΔGexchange gRNA:target
Rice - Oryza sativa, Semi-dwarf; higher grain yield.    -0.7710
Orange - Citrus sinensis, Induced flowering.            -2.0785

To learn more about how mismatches affect the thermodynamic balance of the hybridization between the gRNA and the DNA target, we calculated values of ΔΔGexchange gRNA:target for rice and orange, varying the position of a single mismatch. The conditions and results of the simulations are in Table 4.
Table 4: Results of ΔΔGexchange gRNA:target for targets implemented in Testing System, with a single mismatch at different positions.

Original nucleotide    New nucleotide    Position    ΔΔGexchange gRNA:target
Rice - Oryza sativa, Semi-dwarf; higher grain yield.
G                      A                 23          -0.563
G                      T                 23          -0.603
G                      C                 23          -0.703
G                      A                 10          0.369
G                      T                 10          0.169
G                      C                 10          -0.381
C                      A                 7           4.989
C                      G                 7           2.389
C                      T                 7           4.589
Orange - Citrus sinensis, Induced flowering.
G                      A                 23          -1.8805
G                      T                 23          -1.8905
G                      C                 23          -1.9705
T                      A                 12          -0.8885
T                      G                 12          -0.3885
T                      C                 12          -1.3385
C                      A                 4           2.6815
C                      G                 4           -0.3185
C                      T                 4           1.6815


The results illustrated in the graphic above show that, as expected, R-loop formation becomes less likely as the distance between a mismatch and the PAM is reduced. In general, our thermodynamic approach seems to emulate the mechanism of R-loop formation well. For both targets, the average effect of changing one nucleotide decreases as the distance to the PAM increases. Mismatches placed downstream of the 20th nucleotide typically result in positive free energy increments, preventing RNA:DNA hybridization. This agrees with criteria found in the literature (2), which lets us assume that the penalty system works well as a representation of mismatch effects.

The variability observed at each position is due to the differences between the three possible nucleotides. However, there are also some atypical results, which may be caused by unknown sources of variability. For instance, the ΔΔGexchange gRNA:target values overlap for positions 23 and 10 in rice, whereas the energy difference for position 23 was expected to be smaller. This may reflect the need to train and improve our function, using parameters in the penalty function that are based on empirical evidence.

DNA supercoiling

Finally, the chromatin state is critical for the gRNA to hybridize with the DNA, and some extra energy may be needed if the DNA is positively supercoiled. Consequently, highly similar regions may not be able to bind the gRNA because the chromatin may be compacted. The existence of off-targets means that binding will also take place at some non-specific regions. The difference between the initial supercoiling density of a target (σI) and that of a non-specific region (σNS) affects the difference in the free energy needed to untwist the on-target DNA region:





As we could not determine the chromatin state and its evolution while CRISPR/Cas9 was working in the plant, we could not study this parameter.
Nevertheless, information about chromatin supercoiling has been indirectly introduced in our model. High transcription activity can be synonymous with relaxed chromatin (12, 13). Thus, we can assume that supercoiling does not affect our Testing System, since it carries the 35S promoter, which is a constitutive promoter with high activity (4). Moreover, the difference in DNA supercoiling between the target and the off-targets can be considered nearly zero, because the potential off-targets will be those genes of Nicotiana benthamiana which are highly transcribed.

Parameters


Table 5: Parameters used in the calculation of the kcleavage TS parameter.

Parameter                      Value                         Source
D                              2700 µm²/min                  Reference (6)
λ                              0.015 µm                      Reference (7)
V                              14140 µm³                     Waterloo iGEM team 2015
[Cas9:gRNA](t)                 t = 3 days                    Model estimated
ΔΔG single mismatch penalty    0.078 kcal/mol                Reference (2)
kr                             0.0172⋅[Cas9:gRNA]            Model estimated
kc                             0.48 min⁻¹                    Reference (2)
kunbind                        300 min⁻¹                     Reference (2)
kB                             0.0019872041 kcal/(mol⋅K)     Reference (5)
N(OFF-TARGETS)                 1 for Ga20Ox                  Model estimated
T                              297 K                         Experiment conditions
P(complex,Ga20Ox)              0.9964                        Model estimated
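
As a quick check of the kr entry, the coefficient can be recomputed from D, λ and V with the expression given earlier on this page, kr = 6⋅D⋅λCas9⋅[Cas9:gRNA]/V. The Python sketch below uses hypothetical names (kr_coefficient, k_r) and assumes that the concentration is expressed in units consistent with the volume.

```python
# Recompute the k_r coefficient from the values in Table 5.
D = 2700.0           # diffusion coefficient, µm^2/min (Table 5)
LAMBDA_CAS9 = 0.015  # characteristic length, µm (Table 5)
V = 14140.0          # volume, µm^3 (Table 5)


def kr_coefficient(d=D, lam=LAMBDA_CAS9, v=V):
    """Return 6 * D * lambda / V, the factor multiplying [Cas9:gRNA]."""
    return 6.0 * d * lam / v


def k_r(complex_concentration):
    """k_r for a given Cas9:gRNA concentration (units assumed consistent)."""
    return kr_coefficient() * complex_concentration
```

Rounded to four decimals, kr_coefficient() gives the 0.0172 listed in the table.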

Main remarks

To reduce the random influence of the three-dimensional diffusion of the complex, we suggested introducing the gRNA and the Testing System constructs next to each other. Single mismatches positioned downstream of the 10th-11th nucleotide of the gRNA lead to a positive free energy increment, i.e. they prevent R-loop formation between that gRNA and the DNA target.

Sponsors