Line 81: | Line 81: | ||
<div class="blog-post-item" id="Overviewsec_id"> | <div class="blog-post-item" id="Overviewsec_id"> | ||
<h3>Overview</h3> | <h3>Overview</h3> | ||
− | <div style="text-align:center;"><img class="img-responsive" src="https://static.igem.org/mediawiki/2016/5/5f/T--Valencia_UPV--overview_second_section.PNG"style="width:400px" align=" | + | <div style="text-align:center;"><img class="img-responsive" src="https://static.igem.org/mediawiki/2016/5/5f/T--Valencia_UPV--overview_second_section.PNG"style="width:400px" align="left"></div><br><p> |
After the formation of the Cas9:gRNA complex, it must <b>find</b> the <b>target</b> and knock it out. As soon as the complex is formed, it will wander around the nucleus describing a <b>random</b> pathway. <br><br>During this erratic trajectory, collisions with several regions of DNA will take place. If the union to those regions is <b>thermodynamically balanced</b> and <b>feasible</b> the R-loop will take place.<br><br>This structure results on the <b>hybridization</b> of the <b>Cas9:gRNA</b> complex to a <b>DNA</b> sequence. This implies not only joining the target, but also to undesired regions similar to the target, named <b>off-targets.</b></p><br> | After the formation of the Cas9:gRNA complex, it must <b>find</b> the <b>target</b> and knock it out. As soon as the complex is formed, it will wander around the nucleus describing a <b>random</b> pathway. <br><br>During this erratic trajectory, collisions with several regions of DNA will take place. If the union to those regions is <b>thermodynamically balanced</b> and <b>feasible</b> the R-loop will take place.<br><br>This structure results on the <b>hybridization</b> of the <b>Cas9:gRNA</b> complex to a <b>DNA</b> sequence. This implies not only joining the target, but also to undesired regions similar to the target, named <b>off-targets.</b></p><br> | ||
<p>In this step of the modeling, we used <b>Boltzmann</b> probability distribution and <b>Thermodynamics</b> in order to estimate the probability that the complex binds to the target, using a <b>search algorithm</b> based on the <b>transcriptional activity</b> and <b>target-similarity</b> provided by <b>local alignment</b> algorithms.</p> | <p>In this step of the modeling, we used <b>Boltzmann</b> probability distribution and <b>Thermodynamics</b> in order to estimate the probability that the complex binds to the target, using a <b>search algorithm</b> based on the <b>transcriptional activity</b> and <b>target-similarity</b> provided by <b>local alignment</b> algorithms.</p> | ||
Line 135: | Line 135: | ||
style="width:600px"> | style="width:600px"> | ||
<p class="imgFooterP" style= | <p class="imgFooterP" style= | ||
− | "text-align: center;font-style: italic;">Figure | + | "text-align: center;font-style: italic;">Figure 10. |
Comparison between values obtained for the | Comparison between values obtained for the | ||
diffusion ratio of the Cas9:gRNA complex al time t | diffusion ratio of the Cas9:gRNA complex al time t | ||
Line 146: | Line 146: | ||
Cas9:gRNA complex, increases approximately in the same | Cas9:gRNA complex, increases approximately in the same | ||
way when so does the concentration of the complex. A | way when so does the concentration of the complex. A | ||
− | minimum of 10-15 gene copy number for the gRNA | + | <b>minimum</b> of <b>10-15 gene copy number</b> for the gRNA |
construction is necessary to achieve the plateau of the | construction is necessary to achieve the plateau of the | ||
k<sub>r</sub> ratio, independently of the gene copy | k<sub>r</sub> ratio, independently of the gene copy | ||
Line 155: | Line 155: | ||
enough Cas9:gRNA concentration is critical to stablish | enough Cas9:gRNA concentration is critical to stablish | ||
contact between the complex and the target. One | contact between the complex and the target. One | ||
− | possibility increase repeatability of the test, | + | possibility to <b>increase repeatability</b> of the test, |
− | minimizing the randomness of the complex diffusion, is | + | <b>minimizing the randomness</b> of the complex diffusion, is |
− | to infiltrate the Testing System construction with the | + | to <b>infiltrate the Testing System construction with the |
− | gRNA sequence. <br><br>The expected result was that if the gRNA | + | gRNA sequence</b>. <br><br>The expected result was that if the gRNA |
is transcribed near the target, the aleatory of the | is transcribed near the target, the aleatory of the | ||
three-dimensional diffusion would be minimized. Joining | three-dimensional diffusion would be minimized. Joining | ||
both pieces near to each other, it will be “easier” for | both pieces near to each other, it will be “easier” for | ||
the complex to find the target. This suggestion was | the complex to find the target. This suggestion was | ||
− | implemented in wet-lab experiments, showing an increase | + | implemented in <b>wet-lab experiments</b>, showing an <b>increase |
− | of the light signal.<br> | + | of the light signal</b>.<br> |
<br></p> | <br></p> | ||
</div> | </div> | ||
Line 175: | Line 175: | ||
structure provides Cas9 with the necessary stability to | structure provides Cas9 with the necessary stability to | ||
cut the DNA strand (2, 15). In order to get the | cut the DNA strand (2, 15). In order to get the | ||
− | structure, it is necessary that potential targets are | + | structure, it is necessary that <b>potential targets are |
− | complementary enough to the gRNA. Providing gRNA-DNA | + | complementary enough to the gRNA</b>. Providing gRNA-DNA |
− | complementarity means accomplishing the thermodynamic | + | complementarity means accomplishing the <b>thermodynamic |
− | requirements to let the knockout happen.<br></p> | + | requirements</b> to let the knockout happen.<br></p> |
<div style="text-align:center;"><img class= | <div style="text-align:center;"><img class= | ||
"img-responsive" src= | "img-responsive" src= | ||
Line 190: | Line 190: | ||
considered as places where the R-loop could take place, | considered as places where the R-loop could take place, | ||
being our Testing System one of that states. This | being our Testing System one of that states. This | ||
− | scenario can fit to a Boltzmann distribution (2,5), | + | scenario can fit to a <b>Boltzmann distribution</b> (2,5), |
being the binding probability of the Cas9:gRNA complex | being the binding probability of the Cas9:gRNA complex | ||
with the m-DNA region:<br> | with the m-DNA region:<br> | ||
Line 200: | Line 200: | ||
This expression has information about the thermodynamic | This expression has information about the thermodynamic | ||
balance after the R-loop formation. In order to obtain | balance after the R-loop formation. In order to obtain | ||
− | it, we had to obtain previously the free energy | + | it, we had to obtain previously the <b>free energy |
− | increment for each DNA candidate | + | increment for each DNA candidate</b> |
(-ΔG<sub>complex,target</sub>), and the expected number | (-ΔG<sub>complex,target</sub>), and the expected number | ||
of those regions (N<sub>target</sub>).<br><br> | of those regions (N<sub>target</sub>).<br><br> | ||
Off-target regions were estimated using the off-target | Off-target regions were estimated using the off-target | ||
− | search algorithm, getting 1 off-target for Ga20Ox and 5 | + | search algorithm, getting <b>1 off-target for Ga20Ox and 5 |
− | for TFL. The next section has the explanation of the | + | for TFL</b>. The next section has the explanation of the |
free energy increment, which ensures the thermodynamic | free energy increment, which ensures the thermodynamic | ||
stability of the R-loop.<br> | stability of the R-loop.<br> | ||
Line 215: | Line 215: | ||
<h4>Off-target search algorithm.</h4> | <h4>Off-target search algorithm.</h4> | ||
<p>Off-targets are DNA regions where the R-loop could | <p>Off-targets are DNA regions where the R-loop could | ||
− | take place because of the high similarity between that | + | take place because of the <b>high similarity</b> between that |
region and the target. This means that off targets | region and the target. This means that off targets | ||
steal Cas9 and gRNA supposed to knock out on-targets. | steal Cas9 and gRNA supposed to knock out on-targets. | ||
Line 224: | Line 224: | ||
results.<br><br> | results.<br><br> | ||
Alternatively, we have developed an off-target search | Alternatively, we have developed an off-target search | ||
− | based in transcriptional activities and local | + | <b>based in transcriptional activities and local |
− | alignments between target and off-target candidates. | + | alignments</b> between target and off-target candidates. |
Our proposed strategy was the following one:<br></p> | Our proposed strategy was the following one:<br></p> | ||
<div style="text-align:center;"><img class= | <div style="text-align:center;"><img class= | ||
Line 232: | Line 232: | ||
style="width:600px"></div> | style="width:600px"></div> | ||
<p><br> | <p><br> | ||
− | The first step was to create a gene library with | + | The first step was to create a <b>gene library</b> with |
− | sequences of the most transcribed genes in <i>Nicotiana | + | sequences of the <b>most transcribed genes in <i>Nicotiana |
− | benthamiana</i>. There is a clear relation between | + | benthamiana</i></b>. There is a clear relation between |
transcriptional activity and relaxed state of chromatin | transcriptional activity and relaxed state of chromatin | ||
(13), letting us assume that those genes highly | (13), letting us assume that those genes highly | ||
Line 247: | Line 247: | ||
"FreeenergyincrementΔGSUBcomplex,targetSB1._id"> | "FreeenergyincrementΔGSUBcomplex,targetSB1._id"> | ||
<h4>Free energy increment ΔG<sub>complex,target</sub>.</h4> | <h4>Free energy increment ΔG<sub>complex,target</sub>.</h4> | ||
− | <p>As there is no energy supply catalyzing the R-loop | + | <p>As there is <b>no energy supply catalyzing the R-loop</b> |
(2), the process described above is a sequence of | (2), the process described above is a sequence of | ||
reactions which in global, must accomplish the | reactions which in global, must accomplish the | ||
Line 267: | Line 267: | ||
Each one of the terms from the expression above, is | Each one of the terms from the expression above, is | ||
explained in following sections of the Modeling. We | explained in following sections of the Modeling. We | ||
− | determined these parameters for two of the targets | + | determined these parameters for <b>two of the targets |
− | contained in our Database: ORYZA SATIVA JAPONICA GROUP | + | contained in our Database</b>: ORYZA SATIVA JAPONICA GROUP |
GIBBERELLIN 20 OXIDASE 2 (LOC4325003) and CITRUS | GIBBERELLIN 20 OXIDASE 2 (LOC4325003) and CITRUS | ||
SINENSIS TERMINAL FLOWER (TFL). Those induce higher | SINENSIS TERMINAL FLOWER (TFL). Those induce higher | ||
Line 278: | Line 278: | ||
the Cas9:gRNA complex cleaves the target in the Testing | the Cas9:gRNA complex cleaves the target in the Testing | ||
System, performing the desired knockout. To calculate | System, performing the desired knockout. To calculate | ||
− | this probability, we had to account for possible | + | this probability, we had to account for <b>possible |
− | off-targets, as they could obtain | + | off-targets</b>, as they could obtain |
ΔG<sub>complex,target</sub> <0, letting the R-loop | ΔG<sub>complex,target</sub> <0, letting the R-loop | ||
be formed at those regions as well.<br><br> | be formed at those regions as well.<br><br> | ||
Line 287: | Line 287: | ||
final off-target, as TFL off-targets got free energy | final off-target, as TFL off-targets got free energy | ||
increments higher than zero. Therefore, we calculated | increments higher than zero. Therefore, we calculated | ||
− | the P<sub>complex,Ga20Ox</sub>, obtaining a value of | + | the <b>P<sub>complex,Ga20Ox</sub></b>, obtaining a value of |
0,9964.<br> | 0,9964.<br> | ||
<br></p> | <br></p> | ||
Line 295: | Line 295: | ||
<p>Once the complex has been formed, it must find the | <p>Once the complex has been formed, it must find the | ||
PAM sequence of the target included in the Testing | PAM sequence of the target included in the Testing | ||
− | System. The PAM region has between 3 and 5 nucleotides | + | System. The PAM region has between <b>3 and 5 nucleotides</b> |
which are recognized by Cas9, enabling the union of the | which are recognized by Cas9, enabling the union of the | ||
complex to the DNA. Each Cas9 specie binds better to | complex to the DNA. Each Cas9 specie binds better to | ||
Line 306: | Line 306: | ||
style="width:300px" align="right"> | style="width:300px" align="right"> | ||
<p><br> | <p><br> | ||
− | Bearing in mind that breaking one hydrogen bond | + | Bearing in mind that breaking <b>one hydrogen bond</b> |
− | provides at least an energy supply of 1.2 kcal/mol, the | + | provides at least an energy supply of <b>1.2 kcal/mol</b>, the |
binding of the PAM sequence must give a free energy | binding of the PAM sequence must give a free energy | ||
ΔG<sub>PAM</sub> at least of 9 kcal/mol.<br><br> | ΔG<sub>PAM</sub> at least of 9 kcal/mol.<br><br> | ||
Relying on the nucleotides arrangement, | Relying on the nucleotides arrangement, | ||
ΔG<sub>PAM</sub> resulting from the PAM recognition | ΔG<sub>PAM</sub> resulting from the PAM recognition | ||
− | will vary. However, it is possible that Cas9 interacts | + | will vary. However, <b>it is possible that Cas9 interacts |
− | with a PAM region even though there are mismatches | + | with a PAM region even though there are mismatches</b> |
between them (2). In order to know if this affinity for | between them (2). In order to know if this affinity for | ||
regions with mismatches was significant, we studied the | regions with mismatches was significant, we studied the | ||
Line 321: | Line 321: | ||
"img-responsive" src= | "img-responsive" src= | ||
"https://static.igem.org/mediawiki/2016/3/31/T--Valencia_UPV--PAMenergies.png" | "https://static.igem.org/mediawiki/2016/3/31/T--Valencia_UPV--PAMenergies.png" | ||
− | style="width:600px"></div> | + | style="width:600px"><p class="imgFooterP" style= |
+ | "text-align: center;font-style: italic;">Figure 11. | ||
+ | Energies obtained by combining all possible RNA and DNA pairs with different PAM sequences. Units are kcal/mol.</p></div> | ||
<br> | <br> | ||
<p>White spaces in picture above represent PAM | <p>White spaces in picture above represent PAM | ||
alternatives which do not bind significantly to a -NGG- | alternatives which do not bind significantly to a -NGG- | ||
PAM. The lower binding energy is clearly for PAMs with | PAM. The lower binding energy is clearly for PAMs with | ||
− | the structure -NGG-, achieving less than -9kcal/mol. | + | the structure <b>-NGG-, achieving less than -9kcal/mol</b>. |
However, there are other alternatives with affinity | However, there are other alternatives with affinity | ||
enough to let the Cas9 bind to them. The potential of | enough to let the Cas9 bind to them. The potential of | ||
these regions as possible off-targets relies also on | these regions as possible off-targets relies also on | ||
the Cas9 specie being used, as there are some more | the Cas9 specie being used, as there are some more | ||
− | “promiscuous” than others. We opted for including this | + | “promiscuous” than others. We opted for <b>including this |
potential off-target PAMs in our off-target search | potential off-target PAMs in our off-target search | ||
− | algorithm.<br><br></p> | + | algorithm</b>.<br><br></p> |
<div class="table-responsive" style= | <div class="table-responsive" style= | ||
"width:55%;overflow:inherit" align="right"></div> | "width:55%;overflow:inherit" align="right"></div> | ||
Line 377: | Line 379: | ||
"Cas9:gRNA:DNAhybridization._id"> | "Cas9:gRNA:DNAhybridization._id"> | ||
<h5>Cas9:gRNA:DNA hybridization</h5> | <h5>Cas9:gRNA:DNA hybridization</h5> | ||
− | <p>Secondly, the release of the energy from PAM binding | + | <p>Secondly, the release of the <b>energy from PAM binding |
− | will be used to hybridize the gRNA and nucleotides from | + | will be used to hybridize </b>the <b>gRNA</b> and nucleotides from |
− | the DNA sequence. In order to know the energy | + | the <b>DNA</b> sequence. In order to know the energy |
associated to different base pairs, we built a table | associated to different base pairs, we built a table | ||
with the energy increment for all possible matching | with the energy increment for all possible matching | ||
duplexes. This table provides all necessary information to | duplexes. This table provides all necessary information to | ||
calculate the ΔΔG<sub>exchange,gRNA:target</sub> | calculate the ΔΔG<sub>exchange,gRNA:target</sub> | ||
− | (8,9,10,11), as there should not be mismatches between | + | (8,9,10,11), as <b>there should not be mismatches between |
− | both of them. In the graphic below it can be | + | both of them</b>. In the graphic below it can be |
appreciated that there are two sources of variability | appreciated that there are two sources of variability | ||
affecting the energetic balance of duplex | affecting the energetic balance of duplex | ||
Line 395: | Line 397: | ||
"img-responsive" src= | "img-responsive" src= | ||
"https://static.igem.org/mediawiki/2016/4/4a/T--Valencia_UPV--duplexesenergies.png" | "https://static.igem.org/mediawiki/2016/4/4a/T--Valencia_UPV--duplexesenergies.png" | ||
− | style="width:500px"></div> | + | style="width:500px"><p class="imgFooterP" style= |
+ | "text-align: center;font-style: italic;">Figure 12. | ||
+ | Comparison between the energy exchange resulting from hybridizations with all possible nucleotides of RNA or DNA</p></div> | ||
<p><br> | <p><br> | ||
Thus, we had to choose an approach which considered the | Thus, we had to choose an approach which considered the | ||
− | energy differences between different nucleotides and | + | energy <b>differences between different nucleotides</b> and |
− | different nucleic acids. Moreover, the model used to | + | <b>different nucleic acids</b>. Moreover, the model used to |
estimate the energetic cost of forming the R-loop, had | estimate the energetic cost of forming the R-loop, had | ||
− | to consider the distance of base pairs to the PAM | + | to consider the <b>distance of base pairs to the PAM</b> |
region.<br><br> | region.<br><br> | ||
− | This kind of forward move which considers all the | + | This kind of <b>forward move</b> which considers all the |
− | mentioned criteria, is known as nearest neighbor model. | + | mentioned criteria, is known as <b>nearest neighbor model</b>. |
The meaning of this model is that RNA:DNA union will | The meaning of this model is that RNA:DNA union will | ||
rely on the context. The energy used to bind a pair of | rely on the context. The energy used to bind a pair of | ||
Line 430: | Line 434: | ||
the gRNA. The half part of the gRNA close to the PAM, | the gRNA. The half part of the gRNA close to the PAM, | ||
is the most determining to form the R-loop. Therefore, | is the most determining to form the R-loop. Therefore, | ||
− | if mismatches are placed in the extreme opposite to the | + | <b>if mismatches are</b> placed in the extreme <b>opposite</b> to the |
− | PAM, they may will not compromise the off-target | + | PAM, they may <b>will not compromise</b> the off-target |
knockout.<br><br></div> | knockout.<br><br></div> | ||
<p> | <p> | ||
Line 440: | Line 444: | ||
possible duplexes have been studied and we neither | possible duplexes have been studied and we neither | ||
could determine these energetic values empirically.<br><br> | could determine these energetic values empirically.<br><br> | ||
− | In order to solve this, we created a penalty vector | + | In order to solve this, we created a <b>penalty vector</b> |
which adds a penalty to the match binding energy for | which adds a penalty to the match binding energy for | ||
each mismatch: </p> | each mismatch: </p> | ||
Line 453: | Line 457: | ||
System, as it had been validated comparing to other | System, as it had been validated comparing to other | ||
target searchers available online. <br><br>Those coefficients, collected in a vector which is represented on the right, | target searchers available online. <br><br>Those coefficients, collected in a vector which is represented on the right, | ||
− | are multiplied by the single mismatch average penalty | + | are multiplied by the <b>single mismatch average penalty</b> |
of 0.78 kcal/mol, extracted from bibliography (2). The | of 0.78 kcal/mol, extracted from bibliography (2). The | ||
implementation of the penalties is in the Matlab | implementation of the penalties is in the Matlab | ||
Line 492: | Line 496: | ||
the thermodynamic balance of the gRNA and DNA target | the thermodynamic balance of the gRNA and DNA target | ||
hybridization, we calculated values of ΔΔG<sub>exchange | hybridization, we calculated values of ΔΔG<sub>exchange | ||
− | gRNA:target</sub> for rice and orange, varying the | + | gRNA:target</sub> for rice and orange, <b>varying the |
− | position of a single mismatch. Conditions of the | + | position of a single mismatch</b>. Conditions of the |
simulations are in Table 4.<br> | simulations are in Table 4.<br> | ||
<br></p> | <br></p> | ||
Line 632: | Line 636: | ||
emulates well the mechanism of a R-loop formation. With | emulates well the mechanism of a R-loop formation. With | ||
both targets, the average result of changing one | both targets, the average result of changing one | ||
− | nucleotide, is decreased with higher PAM-distance. | + | nucleotide, is <b>decreased with higher PAM-distance</b>. |
− | Mismatches placed downstream the 20 <sup>th</sup> | + | Mismatches placed <b>downstream the 20 <sup>th</sup> |
− | nucleotide, typically result in positive free energy | + | nucleotide</b>, typically result in <b>positive free energy</b> |
increments, avoiding the RNA:DNA hybridization. This | increments, avoiding the RNA:DNA hybridization. This | ||
agrees with criteria found in bibliography (2), letting | agrees with criteria found in bibliography (2), letting | ||
Line 642: | Line 646: | ||
"img-responsive" src= | "img-responsive" src= | ||
"https://static.igem.org/mediawiki/2016/d/d5/T--Valencia_UPV--energycsmismatch.png" | "https://static.igem.org/mediawiki/2016/d/d5/T--Valencia_UPV--energycsmismatch.png" | ||
− | style="width:600px"></div> | + | style="width:600px"><p class="imgFooterP" style= |
+ | "text-align: center;font-style: italic;">Figure 13. | ||
+ | Estimation of total energy exchange produced between the gRNA and the DNA target, using examples of TFL and Ga20ox.</p></div> | ||
<br><br></p> | <br><br></p> | ||
<p> The variability observed in each position is due to the | <p> The variability observed in each position is due to the | ||
Line 652: | Line 658: | ||
energy difference for the position 23 was supposed to | energy difference for the position 23 was supposed to | ||
be minor. This could be due to the necessity of | be minor. This could be due to the necessity of | ||
− | training and improving our function, using parameters | + | <b>training and improving</b> our function, using parameters |
− | in the penalty function which are based on empirical | + | in the penalty function which are based on <b>empirical |
− | evidence.<br><br></p> | + | evidence</b>.<br><br></p> |
</div> | </div> | ||
<div class="blog-post-item" id="DNAsupercoiling._id"> | <div class="blog-post-item" id="DNAsupercoiling._id"> | ||
<h5>DNA supercoiling</h5> | <h5>DNA supercoiling</h5> | ||
<p>Finally, the chromatin state is critical to let the | <p>Finally, the chromatin state is critical to let the | ||
− | gRNA hybridize the DNA, and some energy can be | + | gRNA hybridize the DNA, and <b>some energy can be |
extra-needed if the DNA is "relaxed", i.e. positively | extra-needed if the DNA is "relaxed", i.e. positively | ||
− | supercoiled. Consequently, regions highly similar may | + | supercoiled</b>. Consequently, regions highly similar may |
will not be able to join the gRNA because the chromatin | will not be able to join the gRNA because the chromatin | ||
could be compressed. Having off-targets means that the | could be compressed. Having off-targets means that the | ||
Line 675: | Line 681: | ||
evolution during the time that CRISPR/Cas9 was working | evolution during the time that CRISPR/Cas9 was working | ||
on the plant, we could not study this parameter. Nevertheless, information about the chromatin | on the plant, we could not study this parameter. Nevertheless, information about the chromatin | ||
− | supercoiling has been indirectly introduced in our | + | supercoiling has been <b>indirectly introduced</b> in our |
model. <br><br> | model. <br><br> | ||
− | Since high transcription activities can be synonym of | + | Since <b>high transcription activities</b> can be synonym of |
− | + | <b> relaxed chromatin</b> (12, 13), we can assume that | |
− | supercoiling will not be affecting to our Testing | + | <b>supercoiling will not be affecting to our Testing |
− | System, as it has the 35S promotor (a | + | System</b>, as it has the 35S promotor (a |
constitutive promotor with high activity (4)). Moreover, | constitutive promotor with high activity (4)). Moreover, | ||
the difference in DNA supercoiling between the target | the difference in DNA supercoiling between the target |
Revision as of 11:03, 2 December 2016
R-Loop Formation
Overview
Once the Cas9:gRNA complex has formed, it must find the target and knock it out. As soon as the complex is formed, it wanders around the nucleus along a random path.
During this erratic trajectory, it collides with several regions of DNA. If binding to one of those regions is thermodynamically balanced and feasible, the R-loop forms.
This structure results from the hybridization of the Cas9:gRNA complex to a DNA sequence. The complex can therefore bind not only the target but also undesired regions similar to the target, named off-targets.
In this step of the modeling, we used the Boltzmann probability distribution and thermodynamics to estimate the probability that the complex binds to the target, using a search algorithm based on transcriptional activity and on the target similarity provided by local alignment algorithms.
Complex diffusion
Model and assumptions
The process of searching for the target across the whole genome is named scanning. We can express the contact rate between the Cas9:gRNA complex and any DNA region as:
Where the parameters, with values indicated in Table 5, are:
- D is the compound diffusivity.
- [Cas9:gRNA] is the concentration of the complex.
- V is the compartment volume, i.e. the plant nuclear volume.
- λCas9 is the characteristic length between the place of production and binding.
The picture below represents a spherical approximation of the Cas9 shape, with λ = (VCas9)^(1/3). We can consider that next to the edge of Cas9 there will be DNA ready to hybridize with the gRNA if possible (2). The assumptions made for this situation are:
- The complex can be considered as a macromolecule with three-dimensional random diffusion around the nucleus.
- The net molar flow is presumably equal to zero, as the compartment composition is well-mixed.
- As DNA is dispersed among the nucleus, the complex will be almost in permanent contact with it.
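As a small illustration of the spherical approximation, the characteristic length can be computed directly from the volume assigned to Cas9. The sketch below is written in Python rather than the Matlab used for the model, and the volume in the example is a made-up placeholder, not the Table 5 value.

```python
def characteristic_length(volume):
    """Return lambda = V**(1/3) for a volume given in cubic length units."""
    if volume <= 0:
        raise ValueError("volume must be positive")
    return volume ** (1.0 / 3.0)


# Hypothetical volume in nm^3, chosen only to make the arithmetic easy to follow.
example_volume_nm3 = 1000.0
example_lambda_nm = characteristic_length(example_volume_nm3)  # close to 10 nm
```

The result is in the length unit that corresponds to the volume unit passed in, so the units of the diffusivity and the nuclear volume have to be chosen consistently with it.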
Simulations
Varying the time of measurement and the number of Cas9
and gRNA copies introduced, we can expect different
results for this ratio:
The simulations plotted in the graphic above show that the rate of contact of the Cas9:gRNA complex increases approximately in step with the concentration of the complex. A minimum gene copy number of 10-15 for the gRNA construction is necessary to reach the plateau of the kr ratio, independently of the gene copy number of the Cas9 construction. Furthermore, a gain in the number of gene copies encoding Cas9 produces the same gain in the random contact ratio.
As can be inferred from this analysis, achieving a sufficient Cas9:gRNA concentration is critical to establish contact between the complex and the target. One way to increase the repeatability of the test, by minimizing the randomness of the complex diffusion, is to infiltrate the Testing System construction together with the gRNA sequence.
The expected result was that, if the gRNA is transcribed near the target, the randomness of the three-dimensional diffusion would be minimized. With both pieces placed close to each other, it is “easier” for the complex to find the target. This suggestion was implemented in wet-lab experiments, which showed an increase in the light signal.
Probability of R-loop formation
In order to knock out our genetic target, the gRNA must hybridize to it, forming the R-loop. This structure provides Cas9 with the stability it needs to cut the DNA strand (2, 15). For the structure to form, potential targets must be complementary enough to the gRNA. Sufficient gRNA-DNA complementarity means that the thermodynamic requirements for the knockout are fulfilled.
By estimating the number of targets and off-targets, we can obtain a distribution of the cleavage probability as a function of the energy needed to cleave each DNA location. Thus, M different energetic states are considered as places where the R-loop could form, our Testing System being one of those states. This scenario can be described by a Boltzmann distribution (2,5), in which the binding probability of the Cas9:gRNA complex with the m-th DNA region is:
This expression carries information about the thermodynamic balance after R-loop formation. To evaluate it, we first had to obtain the free energy increment for each DNA candidate (-ΔGcomplex,target) and the expected number of those regions (Ntarget).
Off-target regions were estimated using the off-target search algorithm, which found 1 off-target for Ga20Ox and 5 for TFL. The section on the free energy increment below explains this quantity, which determines the thermodynamic stability of the R-loop.
Off-target search algorithm.
Off-targets are DNA regions where the R-loop could form because of the high similarity between that region and the target. This means that off-targets sequester Cas9 and gRNA that were meant to knock out on-targets. The most reliable off-target predictions come from experimental results, but our model must be able to find off-targets quickly for all possible targets (1,2), so we could not wait for experimental results.
As an alternative, we have developed an off-target search based on transcriptional activities and local alignments between target and off-target candidates. Our proposed strategy was the following:
The first step was to create a gene library with the sequences of the most transcribed genes in Nicotiana benthamiana. There is a clear relation between transcriptional activity and the relaxed state of chromatin (13), which lets us assume that highly transcribed genes will be more accessible to the gRNA.
This algorithm is implemented in the Matlab function Nboffsearch.m. The search for potential off-targets for each of our two targets gave 1 off-target for Ga20Ox and 5 off-targets for TFL.
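The local-alignment step of such a search can be sketched as follows. This is a Python illustration, not the code of Nboffsearch.m: the Smith-Waterman scoring values, the function names, and the `min_score` threshold are all assumptions chosen for the example, and the library is expected as a plain dictionary of gene name to sequence.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score between sequences a and b."""
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for ca in a:
        curr = [0]  # first column is always zero in a local alignment
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def rank_candidates(target, library, min_score):
    """Return (gene, score) pairs scoring at least min_score, best first."""
    hits = [(name, local_alignment_score(target, seq))
            for name, seq in library.items()]
    return sorted((h for h in hits if h[1] >= min_score),
                  key=lambda h: h[1], reverse=True)
```

Only the similarity score is modelled here. Whether a ranked candidate is a real off-target still depends on its free energy increment, which is evaluated separately.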
Free energy increment ΔGcomplex,target.
As there is no energy supply catalyzing the R-loop (2), the process described above is a sequence of reactions which, overall, must satisfy the thermodynamic law for this kind of process:
Where the free energy is decomposed into the main stages previously described (2,15):
Each of the terms in the expression above is explained in the following sections of the Modeling. We determined these parameters for two of the targets contained in our Database: ORYZA SATIVA JAPONICA GROUP GIBBERELLIN 20 OXIDASE 2 (LOC4325003) and CITRUS SINENSIS TERMINAL FLOWER (TFL). Those induce higher grain yield in rice and flowering in orange, respectively. The values obtained for each of them were ΔGcomplex,Ga20Ox = -10.37 kcal/mol and ΔGcomplex,TFL = -11.78 kcal/mol.
These values were used to obtain the probability that
the Cas9:gRNA complex cleaves the target in the Testing
System, performing the desired knockout. To calculate
this probability, we had to account for possible
off-targets, as they could also reach
ΔGcomplex,offtarget < 0, allowing the R-loop
to form at those regions as well.
After calculating ΔGcomplex,offtarget
for the off-targets suggested by the algorithm, only the
off-target for Ga20Ox could be considered a
true off-target, as the TFL off-targets had free energy
increments higher than zero. Therefore, we calculated
Pcomplex,Ga20Ox, obtaining a value of
0.9964.
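
As a rough illustration of how such a probability can follow from free energies, the Python sketch below applies a Boltzmann weighting over the target and its off-targets. The exact expression used in our model is not reproduced here, so the formula is an assumption, and the off-target energy in the example is a hypothetical value, not a result of the model. kB and T are taken from the Parameters table.

```python
import math

K_B = 0.0019872041  # kcal/(mol*K), from the Parameters table
T = 297.0           # K, from the Parameters table


def binding_probability(dg_target, dg_off_targets, temperature=T):
    """Boltzmann probability that the complex binds the target rather than
    any of the listed off-targets (all energies in kcal/mol).

    Assumes the only competing states are the target and the off-targets,
    each weighted by exp(-dG / (kB * T)).
    """
    kt = K_B * temperature
    w_target = math.exp(-dg_target / kt)
    w_off = sum(math.exp(-dg / kt) for dg in dg_off_targets)
    return w_target / (w_target + w_off)


# -10.37 kcal/mol is the value reported for Ga20Ox;
# -7.0 kcal/mol is a HYPOTHETICAL off-target energy for illustration.
print(binding_probability(-10.37, [-7.0]))
```

With no off-targets the function returns 1, and an off-target with the same free energy as the target halves the probability, which is why only off-targets with negative ΔGcomplex,offtarget matter in practice.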
PAM binding energy
Once the complex has been formed, it must find the
PAM sequence of the target included in the Testing
System. The PAM region has between 3 and 5 nucleotides,
which are recognized by Cas9 and enable the complex
to bind the DNA. Each Cas9 species binds best to
one type of PAM region. In our case, as we are working
with the human type Cas9, the PAM sequence is
-NGGN-.
Bearing in mind that one hydrogen bond accounts for
at least 1.2 kcal/mol, binding of the PAM sequence must
release a free energy of at least 9 kcal/mol
(ΔGPAM of -9 kcal/mol or lower).
Depending on the nucleotide arrangement, the
ΔGPAM resulting from PAM recognition
varies. However, Cas9 may interact
with a PAM region even when there are mismatches
between them (2). To find out whether this affinity for
regions with mismatches was significant, we studied the
ΔGPAM obtained for all possible PAM
combinations.
White spaces in the picture above represent PAM
alternatives that do not bind significantly to a Cas9
recognizing the -NGG- PAM. The lowest binding energy
clearly corresponds to PAMs with the structure -NGG-,
which reach less than -9 kcal/mol.
However, other alternatives have enough affinity
for Cas9 to bind to them. The potential of
these regions as off-targets also depends on
the Cas9 species being used, as some are more
“promiscuous” than others. We chose to include these
potential off-target PAMs in our off-target search
algorithm.
This PAM-energy assignment is implemented in the Matlab
function energy_PAM.mat. The input is the target
sequence. The function compares the nucleotides at the
PAM end of the input sequence with a table of Cas9-PAM
binding energies and returns the corresponding
energy ΔGPAM.
For the targets implemented in our
Testing System, the values of this parameter are given in Table 2.
Target | PAM | ΔGPAM (kcal/mol) |
---|---|---|
Rice - Oryza sativa, Semi-dwarf; higher grain yield. | CGG | -9.600 |
Orange - Citrus sinensis, Induced flowering. | TGG | -9.700 |
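
The table lookup described for the PAM energy can be sketched in a few lines of Python. This is not the Matlab implementation: the function name is hypothetical, the input is assumed to end with the three PAM nucleotides, and the table holds only the two energies reported in Table 2 instead of the full set of PAM combinations.

```python
# Only the two energies reported in Table 2 (kcal/mol); a complete table
# would contain an entry for every PAM combination.
PAM_ENERGY = {"CGG": -9.6, "TGG": -9.7}


def pam_energy(target_with_pam, table=PAM_ENERGY):
    """Return the PAM binding energy for a target sequence.

    Assumes the last three nucleotides of the input are the PAM.
    Returns None when the PAM is not present in the table.
    """
    pam = target_with_pam[-3:].upper()
    return table.get(pam)


# Hypothetical 20-nt protospacer followed by a CGG PAM.
print(pam_energy("ACGTACGTACGTACGTACGT" + "CGG"))
```

Returning None for an unknown PAM keeps "no significant binding" distinct from a binding energy of zero.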
Cas9:gRNA:DNA hybridization
Secondly, the energy released by PAM binding
is used to hybridize the gRNA with the nucleotides of
the DNA sequence. To know the energy
associated with the different base pairs, we built a table
with the energy increment for all possible matching
duplexes. This table provides all the information needed to
calculate ΔΔGexchange,gRNA:target
(8,9,10,11), as there should be no mismatches between
the gRNA and the target. The graphic below shows
two sources of variability
affecting the energetic balance of duplex
hybridization. On the one hand, there is a clear dependence
on the nucleotide, and on the other hand, the type of
nucleic acid (RNA or DNA) also affects the free energy
increment.
Thus, we had to choose an approach that accounted for the
energy differences between different nucleotides and
different nucleic acids. Moreover, the model used to
estimate the energetic cost of forming the R-loop had
to consider the distance of each base pair to the PAM
region.
This stepwise approach, which considers all the
mentioned criteria, is known as the nearest neighbor model.
In this model, the RNA:DNA union
depends on the context: the energy used to bind one
duplex comes from the energy released by the previous
duplex union. In other words, it is assumed that the
energy from the kth hydrogen bond is
used in the reaction of the nearest base pair, k +
1.
Thus, the hybridization of several nucleotides can be
represented as a sequence of binding reactions, leading
to a global difference between the breaking of
DNA:DNA bonds and the union of gRNA:DNA strands. The
term ΔΔGexchange gRNA:target reflects the
difference between the free energy used to separate each
DNA:DNA pair and that of the RNA:DNA pair hybridized.
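
A minimal Python sketch of a nearest-neighbor sum is given below to make the idea concrete. It is not the code of energy_exchange.m: the function names are hypothetical, the sign convention (RNA:DNA duplex energy minus DNA:DNA duplex energy) is an assumption, and the two tables are toy placeholders with one made-up value per stack, not the measured parameters from references (8,9,10,11).

```python
from itertools import product


def nearest_neighbor_energy(seq, stack_table):
    """Sum the stacking energy of every pair of adjacent bases in seq."""
    return sum(stack_table[seq[i:i + 2]] for i in range(len(seq) - 1))


def exchange_energy(seq, rna_dna_table, dna_dna_table):
    """Energy of the RNA:DNA duplex minus that of the DNA:DNA duplex it
    replaces (assumed sign convention: negative favors the R-loop)."""
    return (nearest_neighbor_energy(seq, rna_dna_table)
            - nearest_neighbor_energy(seq, dna_dna_table))


# Toy placeholder tables (NOT measured parameters): every one of the 16
# dinucleotide stacks gets the same made-up value, in kcal/mol.
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]
TOY_RNA_DNA = {d: -1.0 for d in DINUCLEOTIDES}
TOY_DNA_DNA = {d: -0.9 for d in DINUCLEOTIDES}

print(exchange_energy("ACGT", TOY_RNA_DNA, TOY_DNA_DNA))
```

With real tables each dinucleotide stack would carry its own value, which is how the dependence on both the nucleotide and the type of nucleic acid enters the sum.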
However, the gRNA may form thermodynamically stable
unions with other regions which are not the target.
Those DNA regions, named off-targets, may have only a few
mismatches that slightly affect their affinity towards
the gRNA. The half of the gRNA closest to the PAM
is the most decisive for forming the R-loop. Therefore,
if mismatches are located at the end opposite to the
PAM, they may not prevent the off-target
knockout.
As we did with matching duplexes, we
first tried to find information about the energy
involved when mismatches occur. However,
there is little consensus in the literature, not all
possible duplexes have been studied, and we could not
determine these energetic values empirically either.
To solve this, we created a penalty vector
which adds a penalty to the match binding energy for
each mismatch:
The term dk in the equation above refers to a
weight that decreases as k increases, with k =
1, 2, 3, …, length of the gRNA (typically 20-23). We
estimated the values of those weights using criteria from
our Scoring System, as it had been validated against other
target searchers available online.
Those coefficients, collected in a vector which is represented on the right,
are multiplied by the average single-mismatch penalty
of 0.78 kcal/mol, taken from the literature (2). The
penalties are implemented in the Matlab
function weights_exchange.m.
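
The penalty scheme can be sketched in Python as follows. This is only an illustration under stated assumptions: the linear weights are hypothetical (the actual dk values come from our Scoring System and are not listed here), positions are assumed to be counted from the PAM, and the penalty is assumed to be added to the energy of the fully matched duplex. Only the 0.78 kcal/mol average penalty is taken from the text.

```python
MISMATCH_PENALTY = 0.78  # kcal/mol, average single-mismatch penalty (2)


def linear_weights(length):
    """HYPOTHETICAL weights d_k: 1.0 at k = 1 (assumed to be next to the
    PAM), falling linearly to 1/length at k = length."""
    return [(length - k + 1) / length for k in range(1, length + 1)]


def penalized_exchange(match_energy, mismatch_positions, weights):
    """Add d_k * MISMATCH_PENALTY to the matched energy for each mismatch
    at position k (1-based)."""
    penalty = sum(weights[k - 1] * MISMATCH_PENALTY
                  for k in mismatch_positions)
    return match_energy + penalty


weights = linear_weights(20)
# Hypothetical matched energy of -2.0 kcal/mol with a single mismatch.
print(penalized_exchange(-2.0, [1], weights))   # mismatch next to the PAM
print(penalized_exchange(-2.0, [20], weights))  # PAM-distal mismatch
```

Any decreasing weight vector gives the same qualitative behavior: the same mismatch costs more the closer it sits to the PAM.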
Using this strategy, we studied how
the ΔΔGexchange gRNA:target estimated for a
target varies with the number and positions of
mismatches. We implemented the calculation of
ΔΔGexchange gRNA:target in the Matlab
function energy_exchange.m. The results obtained for our
particular targets were:
Target | ΔΔGexchange gRNA:target (kcal/mol) |
---|---|
Rice - Oryza sativa, Semi-dwarf; higher grain yield. | -0.7710 |
Orange - Citrus sinensis, Induced flowering. | -2.0785 |
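
The reported totals can be cross-checked from the tabulated terms. The short Python sketch below assumes that the overall increment is the sum of the PAM term, the exchange term and a supercoiling term, with the supercoiling term set to zero as argued in the DNA supercoiling section. The variable and function names are ours.

```python
# Values reported in Table 2 and in the table above (kcal/mol).
DG_PAM = {"Ga20Ox": -9.600, "TFL": -9.700}
DDG_EXCHANGE = {"Ga20Ox": -0.7710, "TFL": -2.0785}
DDG_SUPERCOILING = 0.0  # assumed negligible for the Testing System


def complex_free_energy(target):
    """Total free energy increment for R-loop formation on a target,
    assuming the three contributions simply add up."""
    return DG_PAM[target] + DDG_EXCHANGE[target] + DDG_SUPERCOILING


for name in DG_PAM:
    print(name, round(complex_free_energy(name), 2))
```

Under this assumption the sums reproduce the values given earlier for both targets, -10.37 kcal/mol for Ga20Ox and -11.78 kcal/mol for TFL.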
To learn more about how mismatches affect
the thermodynamic balance of gRNA and DNA target
hybridization, we calculated ΔΔGexchange
gRNA:target for rice and orange while varying the
position of a single mismatch. The conditions and
results of the simulations are in Table 4.
Original nucleotide | New nucleotide | Position | ΔΔGexchange gRNA:target (kcal/mol) |
---|---|---|---|
Rice - Oryza sativa, Semi-dwarf; higher grain yield. | |||
G | A | 23 | -0.563 |
G | T | 23 | -0.603 |
G | C | 23 | -0.703 |
G | A | 10 | 0.369 |
G | T | 10 | 0.169 |
G | C | 10 | -0.381 |
C | A | 7 | 4.989 |
C | G | 7 | 2.389 |
C | T | 7 | 4.589 |
Orange - Citrus sinensis, Induced flowering. | |||
G | A | 23 | -1.8805 |
G | T | 23 | -1.8905 |
G | C | 23 | -1.9705 |
T | A | 12 | -0.8885 |
T | G | 12 | -0.3885 |
T | C | 12 | -1.3385 |
C | A | 4 | 2.6815 |
C | G | 4 | -0.3185 |
C | T | 4 | 1.6815 |
The results illustrated in the graphic below show that,
as expected, R-loop formation becomes less likely as
the distance between a mismatch and the PAM decreases.
In general, our thermodynamic approach seems to
emulate well the mechanism of R-loop formation. For
both targets, the average effect of changing one
nucleotide decreases with increasing distance from the PAM.
Mismatches placed downstream of the 20th
nucleotide typically result in positive free energy
increments, preventing RNA:DNA hybridization. This
agrees with criteria found in the literature (2), so
we assume that the penalty system works well as a
representation of mismatch effects.
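
The trend can be checked directly on the values of Table 4. The Python sketch below averages the three substitutions simulated at each position. The data are copied from the table, while the grouping and the function name are ours.

```python
from statistics import mean

# (target, mismatch position) -> the three values from Table 4 (kcal/mol)
TABLE_4 = {
    ("rice", 23): [-0.563, -0.603, -0.703],
    ("rice", 10): [0.369, 0.169, -0.381],
    ("rice", 7): [4.989, 2.389, 4.589],
    ("orange", 23): [-1.8805, -1.8905, -1.9705],
    ("orange", 12): [-0.8885, -0.3885, -1.3385],
    ("orange", 4): [2.6815, -0.3185, 1.6815],
}


def mean_by_position(table):
    """Average the simulated substitutions for each target and position."""
    return {key: mean(values) for key, values in table.items()}


for (target, position), value in mean_by_position(TABLE_4).items():
    print(f"{target:6s} position {position:2d}: {value:+.3f} kcal/mol")
```

For both targets the mean increases as the position number decreases, changing sign between position 23 and position 10 for rice and between position 12 and position 4 for orange.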
The variability observed at each position is due to the
differences between the three possible nucleotides.
However, there are also some atypical results, which
may be caused by unknown sources of variability. For
instance, the ΔΔGexchange gRNA:target values
nearly overlap for positions 23 and 10 in rice, although
the energy difference at position 23 was expected to
be smaller. This suggests that our function needs to be
trained and improved, using parameters
in the penalty function which are based on empirical
evidence.
DNA supercoiling
Finally, the chromatin state is critical for the
gRNA to hybridize with the DNA, and extra energy may be
needed if the DNA is not "relaxed", i.e. if it is positively
supercoiled. Consequently, highly similar regions may
be unable to bind the gRNA because the chromatin
is compressed. Having off-targets means that
binding will take place in some non-specific
regions. The difference between the initial supercoiling
density of the target (σI) and that of a non-specific region
(σNS) affects the
free energy needed to untwist the on-target DNA
region:
As we could not determine the chromatin state and its
evolution while CRISPR/Cas9 was acting
on the plant, we could not study this parameter. Nevertheless, information about chromatin
supercoiling has been indirectly introduced into our
model.
Since high transcriptional activity can be synonymous with
relaxed chromatin (12, 13), we can assume that
supercoiling does not affect our Testing
System, as it carries the 35S promoter (a
constitutive promoter with high activity (4)). Moreover,
the difference in DNA supercoiling between the target
and the off-targets can be considered nearly zero, because
the potential off-targets are those genes of
Nicotiana benthamiana which are highly
transcribed.
Parameters
Parameter | Value | Source |
---|---|---|
D | 2700 µm²/min | Reference (6) |
λ | 0.015 µm | Reference (7) |
V | 14140 µm³ | Waterloo iGEM team 2015 |
[Cas9:gRNA](t) | t = 3 days | Model estimated |
ΔΔG single mismatch penalty | 0.78 kcal/mol | Reference (2) |
kr | 0.0172⋅[Cas9:gRNA] | Model estimated |
kc | 0.48 min⁻¹ | Reference (2) |
kunbind | 300 min⁻¹ | Reference (2) |
kB | 0.0019872041 kcal/(mol⋅K) | Reference (5) |
N(OFF-TARGETS) | 1 for Ga20Ox | Model estimated |
T | 297 K | Experiment conditions |
P(complex,Ga20Ox) | 0.9964 | Model estimated |
Main remarks
To reduce the random influence of the
three-dimensional diffusion of the complex, we
suggested introducing the gRNA and the Testing System
constructs next to each other. Single mismatches
positioned downstream of the 10th-11th
nucleotide of the gRNA lead to a
positive free energy increment, i.e. they prevent
R-loop formation between that gRNA and the DNA target.