Team:Valencia UPV/Model

Overview.

The mechanism of CRISPR/Cas9 relies on binding and unbinding reactions, gene expression regulation, diffusion of biochemical compounds inside the nucleus and even thermodynamics. All these processes will affect the results obtained using the Testing System. The aim of the Testing System is to provide information about the gRNA ability to localize the target.
Therefore, external and intrinsic factors affecting the CRISPR/Cas9 mechanism should be blocked, letting us relate the light measurement exclusively with the gRNA efficacy to find the target. Moreover, this system emulates the performance of the gRNA in the real variety which is desired to be improved. Thus, the fact of implementing the Testing System in Nicotiana benthamiana should not affect the reliability of the result, making them scalable and realistic.
Our goal is to characterize the performance of our Testing System in Nicotiana benthamiana, studying the influence of different factors affecting each stage of the CRIPSR/Cas9 mechanism. We have developed a mathematical model able to represent its performance. This model has been used to simulate different conditions that result in variations of our reporter: Luciferase.
Thus, we can analyze determinant factors affecting Testing System repeatability, reliability and robustness, which are mandatory achievements in order to spread our gRNA Testing System as a standard gRNA efficiency predictor.
Our modeling is structured in the main steps of CRISPR/Cas9:



Cas9:gRNA complex formation.

Overview

Since Cas9 and gRNA constructions are delivered in the host, they will be transcribed (and translated in the case of Cas9) by the cell’s machinery. When they find each other in the plant nucleus, they will interact forming Cas9:gRNA. This complex will perform the knockout in the Testing System construction. Thus, there is a clear relation between the amount of this complex and the light signal produced. Optimizing the complex formation, we will be able to optimize the light measurements.
Studying this step, our aim was to analyze the system stability in function of CRISPR/Cas9 components under different conditions. Using Mass Action Kinetics, Quasi Steady State Analysis and Direct Lyapunov Method, we determined conditions to reach the steady state conditions that provide the optimum amount of Cas9:gRNA complex. Firstly, we analyze the interaction between Cas9 and gRNA, following with a study of agroinfiltration times in the second section.



ODEs

All the process takes place in the nucleus. The involved guide-RNA (gRNA) molecule has two main parts: the scaffold and the spacer. The first one is recognized by the inactive Cas9, resulting in an intermediate complex (Cas9+〖k〗_Cas9 ). After this binding takes place the conformational change of the Cas9 endonuclease, leading to an irreversible isomerization (2). The Cas9:gRNA complex is ready to scan the plant genome searching for the proper target.


The dynamics of these species can be represented by the following reactions:


Cas9 and gRNA find each other in the plant nucleus. They bind each other leading to an Intermediate complex which is unstable, being this reaction governed by the rate 〖k〗_f This union will produce a conformational rearrangement in the Cas9 structure, represented by the constant 〖k〗_I .
Elements from reactions above (Cas9, gRNA and the Intermediate complex) are usually expressed in terms of concentration (Elements from reactions above (Cas9, gRNA and the Intermediate complex) are usually expressed in terms of concentration ([, [gRNA], [complex]). We rename those concentrations using the variables x1=[Cas9], x2= [gRNA], x3= [Cas9+gRNA] intermediate, and x4=[Cas9:gRNA] complex. Then, using mass action kinetics, we can build the system of Ordinary Differential Equations:


This model represents the dynamics of two elements which interact affecting the steady-state condition of each other. Constants ’α’1 and $\beta_1$ reflect a continuous input of x1 and x2. Initially, Cas9 and gRNA constructions get to the cell by viral infection. Thus, if the subject of study was a cell population, the flux of Cas9 and gRNA would vary according to the spread of the viral infection among cells.
However, from an individual perspective, each infected cell produces constitutively Cas9 and gRNA once it has been infected. Thus, production of both elements in a single cell, fits the typical behavior of constitutive production, i.e. reaches the steady-state after certain time. This permanent production is equivalent to the constant input of Cas9 and gRNA in the cell, reflected in terms ’α’1 and $\beta1$, respectively.
Terms ‘α’2B1 &sdot x1B1 &Sdot x2B1$ and &beta2B1 &sdot x1B1 &sdot x2B1 describe the exchange of x1B1 and x2B1 when they interact. Finally, ‘α’3B1 &sdot x1B1 and &beta3B1 &sdot x2B1 represent the disappearance of x1B1 and x2B1 from the system. Steady state conditions of the system will be determined by the stability of the interaction between Cas9 and gRNA. Thus, equations (eq. 1) and (eq. 2) expressing the variation of these elements, must be analyzed in order to characterize the behavior of the system. They fit to the general expression of the following model:


This model represents the dynamics of two elements which interact affecting the steady-state condition of each other. As ’α’ 3 is the degradation rate of Cas9 and it is being compared to the degradation of the gRNA (&beta3), we can assume that ’α’3 is depreciable in relation to &beta 3 , which is the dominant degradation term. Therefore, the model is simplified, resulting in the equilibrium positions:


Using the Direct Lyapunov Method (3), we can get two expressions which can be used to know about the stability of the system given a solution. Further details about the mathematical discussion to demonstrate the use of the Direct Lyapunov Method are available in our Mathematical discussion.
If conditions &sigma <0 and &Delta >0 are provided, it can be assumed that the system will achieve a local stability. Replacing each model parameter (Table 1, (16)) by its value in out particular case, we found that steady-state solutions are:



Finally using those parameters from and solutions we can assume stability conditions of the system (eq. 13) and (eq. 14) are achieved:


Therefore, we can assume that our system is locally stable around the equilibrium point (x ̃_1 x ̃_2).
Table 1: Parameters involved in Cas9 and gRNA dynamics.

Parameter in general expressionParticular ValueNumeric Value (min-1)Source
α_1k_Cas90.000374737(16)
α_2k_f0.00006(16)
α_3δ_Cas90.0000552(16)
β_1k_gRNA0.0025284(16)
β_2k_f0.00006(16)
β_3δ_gRNA0.000252(16)
K’Cas9k_Cas9·c_(copy number Cas9)Estimated




Simulations.


Cas9 and gRNA interaction.


Once we analytically determined that our solution was feasible and robust, we simulated in Matlab the same conditions in order to check if the steady-state values obtained analytically were the expected ones. As is can be seen in graphics below, the equilibrium values predicted analytically are very close to those obtained simulating in Matlab.

Figure 1. Simulation of Cas9 evolution using parameters from Table 1 and one gene copy number.



Figure 2. Simulation of

Figure 2. Simulation of [gRNA">

Figure 2. Simulation of [ evolution using parameters from Table 1 and one gene copy number.



There is a small difference between analytical and experimental results. This is probably due to the difference in the theoretical value of &α 3 , which is presumably zero, and the value introduced in simulations, which is not null (value available in Table 1).
Simulations performed let us check the equilibrium values obtained analytically. Analyzing Figure 1, it is observed that Cas9 endonuclease is produced reaching its maximum at the time of peak t_p = 4877 minutes, i.e. 4 days approx. From day 4, there is enough of both compounds in order to form the Cas9:gRNA complex, explaining the decay in Cas9 after 4 days from infiltration. Four days later, at t≈ 10 days (15000 minutes), Cas9 reaches its steady-state near the 0.7 nM predicted analytically. This behavior is similar to a Second Order System response after a step input. It reaches the equilibrium value with a time slope of t_s = 2380 minutes, i.e. 1 day and 15 hours.
Regarding the gRNA (Figure 2), it is produced without any oscillation. The maximum production (8.6 nM) is reached between around the 13th day, and the 63\% of this value at τ≈ 3400 minutes. As gRNA has a dynamic considerably faster than Cas9, it keeps growing until the endonuclease is produced. As soon as this happens, Cas9 and gRNA are used to form the Cas9:gRNA complex. Thus, the peak in Figure 1 makes reference to this complex formation. In Figure 2 this consumption is not appreciated because there is plenty more gRNA disposable than gRNA used to form the complex with Cas9 produced.
However, using one gene copy number is not a current laboratory condition. As production of Cas9 and gRNA is directly related to the number of gene copies, higher amounts of them are a more realistic subject of study. Since terms 〖K’〗_Cas9 and K_gRNA°’ are a function of the number of copies, changing this parameter the equilibrium values will be affected. The next graphics represent different results varying the gene copy number only for Cas9, only for gRNA and finally for both of them.
In the following simulations the gene copy number of the gRNA was increased gradually from 1 to 45. Results obtained each 5 copies were plotted to analyze the stability of Cas9 when the gRNA is produced faster and with higher quantities.

Figure 3. Simulations of

Figure 3. Simulations of [Cas9">

Figure 3. Simulations of [ evolution using parameters form Table 1.


From the simulations above, it is clear that Cas9 efficacy is highly sensitive to the presence of gRNA in the nucleus. The biggest difference among concentrations of Cas9 in the graphic above, is produce between the result with 1 copy and the result with 5. This means that even though Cas9 production is not increased, the endonuclease will be able to find the gRNA and form complexes with it.
There is an indirect relation between the maximum Cas9 presence, the t_p at which this maximum is reached, and the amount of gRNA. It seems quite logical, as the more gRNA present, the more likely it will be that all Cas9 produced is consumed to form the complex with those gRNAs.
From the picture below, it can be confirmed that an increment in the Cas9 gene copy number, affects homogeneously to the quantity of gRNA. Qualitatively is has the same behavior as in the scenario with one copy. The only difference is reflected in a gain of the steady-state gRNA value, which increases in 5-fold from the previous simulation with 5 gene copies less.

Figure 4. Simulations of

Figure 4. Simulations of [gRNA">

Figure 4. Simulations of [ evolution using parameters from Table 1.


When the gene copy number for Cas9 is increased, the production of the endonuclease behaves similarly to the gRNA in the picture above. However, the slope is considerably less. This difference in the dynamics was expected, as it is known that RNA reactions are several orders faster than protein ones.

Figure 5. Simulations of

Figure 5. Simulations of [Cas9">

Figure 5. Simulations of [ evolution using parameters from Table 1.


There is no decrease in the Cas9 evolution that let us know when does the interaction with the gRNA begin. However, the beginning of this reaction is well appreciated in the graphic with the gRNA evolution.

Figure 6. Simulations of

Figure 6. Simulations of [gRNA">

Figure 6. Simulations of [ using parameters from Table 1.


As it is mentioned above, the initial slope of gRNA is considerably faster than Cas9’s, as gRNA only means transcription but Cas9 must be translated as well, delaying its growing. This behavior is comparable to Cas9 variations when the gRNA copy number was increased. However, the difference between simulations with one copy number and 5 copies, are less remarkable in this case, reaching the half of the one-copy steady state value in this case. Again, this difference is smaller due to the fast production of gRNA. The time of peak changes as well among simulations with different number of Cas9 gene copies, being the maximum at approx. 5000 minutes, i.e. 3 and a half days. The minimum time peak is produced near 2 days after production.
Simulations varying simultaneously both gene copy numbers where performed. This time was calculated the amount of intermediate complex produced, which is shown in the following graphic.

Figure 7. Simulations of

Figure 7. Simulations of [Cas9:gRNA">

Figure 7. Simulations of [ intermediate complex production with different number of copies for Cas9 and gRNA.


From this results it can be inferred that the amount of complex being produced is very sensible to the amount of Cas9 in the nucleus. This is reflected in the higher values of complex when Cas9 gene copy number has reached only the value of 5. Necessary amounts of gRNA to increase the intermediate presence, are higher than those needed for Cas9. This means that gRNA will be the limiting reagent of this process (14). Though this may refute the fact that gRNA is produced faster by cell’s machinery, its fast degradation explains why this element is limiting the Cas9:gRNA complex ready to perform the knockout.
Therefore, less amount of Cas9 will not be as determining as less amount of gRNA. It will be critical to provide the system with the necessary gRNA in order to let enough complex be formed. Otherwise, less knockouts would be performed in the Testing System and light signal recorded during Luciferase assays would be less significant.

Agroinfiltration times.


Experimentally Cas9 and gRNA can be infiltrated at the same or at different times in the plant. The difference between Cas9 and gRNA dynamics is known and it should be considered in order to improve the performance of our Testing System.
Next scenarios will assume that Cas9 has been introduced few days earlier than gRNA, letting time enough to be produced until reaching a certain concentration. Simulations now considered an existing amount of Cas9 when the gRNA is still at point 0.

Figure 8. Simulations of

Figure 8. Simulations of [Cas9:gRNA">

Figure 8. Simulations of [ intermediate complex obtained with different conditions relying initial amount of Cas9 available at the host nucleus.


As expected, infecting plants previously with Cas9 to let it be produced, improves considerably the amount of complex obtained. Moreover, the t_p does not change drastically in comparison with null As expected, infecting plants previously with Cas9 to let it be produced, improves considerably the amount of complex obtained. Moreover, the t_p does not change drastically in comparison with null [ initial conditions, being around 1000 minutes. The steady state is achieved as well without compromising the stability of the system. Therefore, greater amounts of the complex will be produced in less time, or in other words, the efficiency of the system will be increased if the Cas9 is already in the cell when the gRNA is infiltrated.



In graphics above it can be appreciated how the system becomes more robust as more Cas9 is available in the cell nucleus. The advantage of reducing the importance of the gRNA presence, is that the amount of light signal will not be perturbed by a low gRNA concentration. Thus, the Testing System output will vary according to the efficiency of the gRNA to find the target, and not relying on other external factors such as the cell’s efficiency in the gRNA transcription.

Main remarks

The usual lab protocol considers the simultaneous agroinfiltration of the gRNA and the Cas9. However, in simulations we observed that the production of the complex could be maximized if Cas9 was agroinfiltrated in first place.

R-loop formation.


Overview.


After the formation of the Cas9:gRNA complex, it must find the target and knock it out. As soon as the complex is formed, it will wander around the nucleus describing a random pathway. During this erratic trajectory, collisions with several regions of DNA will take place. If the union to those regions is thermodynamically balanced and feasible, i.e. regions are complementary enough to the gRNA sequence, the R-loop will take place. This structure results on the hybridization of the Cas9:gRNA complex to a DNA sequence. This implies not only joining the target, but also binding undesired though similar regions, named off-targets.


In this step of the modeling, we used Boltzmann probability distribution and Thermodynamics in order to estimate the probability that the Cas9:gRNA complex finds the target. We also developed an off-target search algorithm based on the transcriptional activity and target-similarity provided by local alignment algorithms.



Complex diffusion.


The process of searching the target among all the genome is named scanning. Thus, we can express the contact rate between Cas9:gRNA complex and any other DNA region:
k_r=(6·D·λ_Cas9·k_r=(6·D·λ_Cas9·[)/V= 0.0172·[Cas9:gRNA]
Where parameters with values indicated in Table 5, are:
D is the compound diffusivity.
[ is the concentration of the complex.
V is the compartment volume, i.e. the plant nuclear volume.
λ_Cas9 is the characteristic length between the place of production and binding.
Several assumptions were made to consider random three-dimensional diffusion of the complex. Those assumptions were:
The complex can be considered as a macromolecule with three-dimensional random diffusion around the nucleus.
The net molar flow is presumably equal to zero, as the compartment composition is well-mixed.
As DNA is dispersed among the nucleus, the complex will be almost in permanent contact with it.


Thus, assuming a spherical approach of the Cas9 spatial shape, λ=∛(V_Cas9 ), because in the edge of Cas9 there will be DNA ready to hybridize gRNA if possible (2).
Varying the time of measurement and the number of Cas9 and gRNA copies introduced, we can expect different results for this ratio:

Figure 9. Comparison between values obtained for the diffusion ratio of the Cas9:gRNA complex al time t = 4320 minutes (3 days). All curves are relative to the results obtained with 1 copy for Cas9.


Simulations represented in the graphic above let us check that the gain in the rate of contact between the Cas9:gRNA complex, increases approximately in the same way when so does the concentration of the complex. A minimum of 10-15 gene copy number for the gRNA construction is necessary to achieve the plateau of the k_r ratio, independently of the gene copy number for the Cas9 construction. Furthermore, the gain in the number of gene copies encoding Cas9, is the same produced in the random contact ratio.
As it can be inferred from this analysis, achieving enough Cas9:gRNA concentration is critical to stablish contact between the complex and the target. One possibility increase repeatability of the test, minimizing the randomness of the complex diffusion, is to infiltrate the Testing System construction with the gRNA sequence. The expected result was that if the gRNA is transcribed near the target, the aleatory of the three-dimensional diffusion would be minimized. Joining both pieces near to each other, it will be “easier” for the complex to find the target. This suggestion was implemented in wet-lab experiments, showing an increase of the light signal.

Probability of R-loop formation.


In order to knock out our genetic target, it must be hybridized by the gRNA forming the R-loop. This structure provides Cas9 with the necessary stability to cut the DNA strand (2, 15). In order to get the structure, it is necessary that potential targets are complementary enough to the gRNA. Providing gRNA-DNA complementarity means accomplishing the thermodynamic requirements to let the knockout happen.


Estimating the number of targets and off-targets, we can obtain a distribution of the cleavage probability in function of energy needed to cleave each DNA location. Thus, M different energetic states are considered as places where the R-loop could take place, being our Testing System one of that states. This scenario can fit to a Boltzmann distribution (2,5), being the binding probability of the Cas9:gRNA complex with the m-DNA region:
P_(complex,target)=(N_(target,j)/N exp⁡(-ΔG_(complex,target) ))/(1+Σ_m°M 〖 N〗_(target,m)/N exp⁡(-(ΔG_(complex,target))/(k_B T)) )
This expression has information about the thermodynamic balance after the R-loop formation. In order to obtain it, we had to obtain previously the free energy increment for each DNA candidate (-ΔG_(complex,target)), and the expected number of those regions (〖 N〗_target).
Off-target regions were estimated using the off-target search algorithm, getting 1 off-target for Ga20Ox and 5 for TFL. The next section has the explanation of the free energy increment, which ensures the thermodynamic stability of the R-loop.

Off-target search algorithm.

Off-targets are DNA regions where the R-loop could take place because of the high similarity between that region and the target. This means that off targets steal Cas9 and gRNA supposed to knock out on-targets. Most reliable off-target predictions are obtained by experimental results, but in our model we must be able to find off-targets quickly for all possible targets (1,2) so we could not wait for experimental results.
Alternatively, we have developed an off-target search based in transcriptional activities and local alignments between target and off-target candidates. Our proposed strategy was the following one:


The first step was to create a gene library with sequences of the most transcribed genes in Nicotiana benthamiana. There is a clear relation between transcriptional activity and relaxed state of chromatin (13), letting us assume that those genes highly transcribed will be more accessible to the gRNA.
This algorithm is implemented in the Matlab function Nboffsearch.m. The search of potential off-targets for each of our two targets, gave a result of 1 off-target for Ga20Ox and 5 off-targets for the TFL.

Free energy increment ∆G_(complex,target).

As there is no energy supply catalyzing the R-loop (2), the process described above is a sequence of reactions which in global, must accomplish the thermodynamic law for this kind of processes:
∆G_(complex,target)≤0
Where the free energy is decomposed in the main stages previously described (2,15):
ΔG_(complex,target)=ΔG_PAM+Δ〖ΔG〗_(exchage RNA:DNA)+Δ〖ΔG〗_supercoiling
Each one of the terms from the expression above, is explained in following sections of the Modeling. We determined these parameters for two of the targets contained in our Database: ORYZA SATIVA JAPONICA GROUP GIBBERELLIN 20 OXIDASE 2 (LOC4325003) and CITRUS SINENSIS TERMINAL FLOWER (TFL). Those induce higher grain yield in rice flowering in orange, respectively. Values obtained for each of them were ΔG_(complex,g20oxidase)= -10,37 kcal/mol and ΔG_(complex,TFL)=-11,78 kcal/mol.
These values were used to obtain the probability that the Cas9:gRNA complex cleaves the target in the Testing System, performing the desired knockout. To calculate this probability, we had to account for possible off-targets, as they could obtain ΔG_(complex,target)<0, letting the R-loop be formed at those regions as well.
After calculating the ΔG_(complex,offtarget)for off-targets suggested by the algorithm, only the off-target for the Ga20Ox could be considered as a final off-target, as TFL off-targets got free energy increments higher than zero. Therefore, we calculated the P_(complex,Ga20Ox), obtaining a value of 0,9964.

PAM binding energy..

Once the complex has been formed, it must find the PAM sequence of the target included in the Testing System. The PAM region has between 3 and 5 nucleotides which are recognized by Cas9, enabling the union of the complex to the DNA. Each Cas9 specie binds better to one type of PAM region. In our case, as we are working with the human type Cas9, the PAM sequence will be -NGGN-.
Bearing in mind that breaking one hydrogen bond provides at least an energy supply of 1.2 kcal/mol, the binding of the PAM sequence must give a free energy (ΔG_PAM) at least of 9 kcal/mol:


Relying on the nucleotides arrangement, ΔG_PAM resulting from the PAM recognition will vary. However, it is possible that Cas9 interacts with a PAM region even though there are mismatches between them (2). In order to know if this affinity for regions with mismatches was significant, we studied the ΔG_PAM obtained for all possible PAM combinations.


White spaces in picture above represent PAM alternatives which do not bind significantly to a -NGG- PAM. The lower binding energy is clearly for PAMs with the structure -NGG-, achieving less than -9kcal/mol. However, there are other alternatives with affinity enough to let the Cas9 bind to them. The potential of these regions as possible off-targets relies also on the Cas9 specie being used, as there are some more “promiscuous” than others. We opted for including this potential off-target PAMs in our off-target search algorithm.
This PAM-energy assignment is implemented in the Matlab function energy_PAM.mat. The input is the target sequence. Comparing the PAM extreme nucleotides of the input sequence with a table containing Cas9-PAM binding energies, it matches the information of the string input with the corresponding energy ΔG_PAM. In the particular case of targets implemented in our testing system, the value of this parameter was:
Table 2: Results of ΔG_PAM for targets implemented in Testing System

TargetPAMΔG_PAM
Rice - Oryza sativa, Semi-dwarf; higher grain yield. CGG-9,600
Orange - Citrus sinensis, Induced flowering.
TGG-9,700

Cas9:gRNA:DNA hybridization.

Secondly, the release of the energy from PAM binding will be used to hybridize the gRNA and nucleotides from the DNA sequence. In order to know the energy associated to different base pairs, we built a table with the energy increment for all possible matching duplexes. This table provides all information to calculate the Δ〖ΔG〗_(exchange,gRNA:target) (8,9,10,11), as there should not be mismatches between both of them. In the graphic below it can be appreciated that there are two sources of variability affecting the energetic balance of duplex hybridization. On one hand, there is a clear dependence of the nucleotide, and on the other hand, the type of nucleic acid (RNA or DNA) also affects the free energy increment.


Thus, we had to choose an approach which considered the energy differences between different nucleotides and different nucleic acids. Moreover, the model used to estimate the energetic cost of forming the R-loop, had to consider the distance of base pairs to the PAM region.
This kind of forward move which considers all the mentioned criteria, is known as nearest neighbor model. The meaning of this model is that RNA:DNA union will rely on the context. The energy used to bind a pair of duplexes, uses the energy released by the previous duplex union. In other words, it is assumed that the energy from the kth hydrogen bond will be used in the reaction of the nearest base pair, k+1.
Thus, the hybridization of several nucleotides can be represented as a sequence of binding reactions, leading to a global difference between the hydrolysis of DNA:DNA bounds, and the union of gRNA:DNA strands. The term ∆〖ΔG〗_(exchange gRNA:target) reflects the difference between the free energy used for the pair DNA-DNA hydrolysis and the pair RNA-DNA hybridized.
∆ΔG_(exchange,gRNA:target)=Σ_k d_k ∆ΔG_(exchange,gRNA:target)=Σ_k d_k [
However, the gRNA may have thermodynamically stable unions with other regions which are not the target. Those DNA regions, named off-targets, may have only few mismatches that slightly affect its affinity towards the gRNA. The half part of the gRNA close to the PAM, is the most determining to form the R-loop. Therefore, if mismatches are placed in the extreme opposite to the PAM, they may will not compromise the off-target knockout.
In a similar way as we did with matching duplexes, our first try was to find more information about energy accounted when mismatches are produced. Nevertheless, there is poor consensus among bibliography, not all possible duplexes have been studied and we neither could determine these energetic values empirically.
In order to solve this, we created a penalty vector which adds a penalty to the match binding energy for each mismatch. The term d_k refers to a weight that decreases as k is increased, with k=1,2,3…length gRNA (typically 20-23). We have estimated values of those weights using criteria from our Scoring System, as it had been validated comparing to other target searchers available online. Those coefficients are multiplied by the single mismatch average penalty of 0.78 kcal/mol, extracted from bibliography (2). The implementation of the penalties is in the Matlab function weights_exchange.m, and the vector of penalties is represented below



Using this strategy, we studied how would could vary the ΔΔG_(exchange,gRNA:target) estimated for a target with different number and positions of mismatches. We implemented the obtention of ΔΔG_(exchange,gRNA:target) in the Matlab function energy_exchange.m. Results obtained for our particular targets were:
Table 3: Results of ΔG_(exchange,gRNA:target) for targets implemented in Testing System.

TargetΔΔG_(exchange,gRNA:target)
Rice - Oryza sativa, Semi-dwarf; higher grain yield. -0.7710
Orange - Citrus sinensis, Induced flowering.
-2.0785

In order to know more about how do mismatches affect the thermodynamic balance of the gRNA and DNA target hybridization, we calculated values of ΔΔG_(exchange,gRNA:target)for rice and orange, varying the position of a single mismatch. Conditions of the simulations are in Table 4.
Table 4: Results of ΔG_(exchange,gRNA:target) for targets implemented in Testing System.

Original nucleotideNew nucleotidepositionΔΔG_(exchange,gRNA:target)
Rice - Oryza sativa, Semi-dwarf; higher grain yield.
GA23-0,563
GT23-0,603
GC23-0,703
GA100,369
GT100,169
GC10-0,381
CA74,989
CG72,389
CT74,589
Orange - Citrus sinensis, Induced flowering
GA23-1,8805
GT23-1,8905
GC23-1,9705
TA12-0,8885
TG12-0,3885
TC12-1,3385
CA42,6815
CG4-0,3185
CT41,6815


The results illustrated in the graphic above, show that as expected, the R-loop formation is less likely as it is reduced the distance between a mismatch and the PAM. In general, it seems that our thermodynamics approach emulates well the mechanism of a R-loop formation. With both targets, the average result of changing one nucleotide, is decreased with higher PAM-distance. Mismatches placed downstream the 20 th nucleotide, typically result in positive free energy increments, avoiding the RNA:DNA hybridization. This agrees with criteria found in bibliography (2), letting us assume that the penalty system worked well as representation of mismatch effects.
The variability observed in each position is due to the differences between three possible nucleotides. However, there are some atypical results as well which may be caused by unknown sources of variability. For instance, the &Delta &Delta G_( exchange,gRNA:target ) is overlapped for positions 23 and 10 in rice, while energy difference for the position 23 was supposed to be minor. This could be due to the necessity of training and improving our function, using parameters in the penalty function which are based on empirical evidence.

DNA supercoiling.

Finally, the chromatin state is critical to let the gRNA hybridize the DNA, and some energy can be extra-needed if the DNA is "relaxed", i.e. positively supercoiled. Consequently, regions highly similar may will not be able to join the gRNA because the chromatin could be compressed. Having off-targets means that the binding will be taking place in some unspecified regions. The difference between the initial density of a target (〖&sigma 〗_I ) and a non-specific region (〖&sigma 〗_NS ) will affect to the difference in the free energy needed to untwist the on-target DNA region:
ΔΔG_(supercoiling,target)=-10·n·k_B·T·(σ_F°2-σ_I°2)
As we could not determine the chromatin state and its evolution during the time that CRISPR/Cas9 was working on the plant, we could not study this parameter.
Nevertheless, information about the chromatin supercoiling has been indirectly introduced in our model. High transcription activities can be synonym of relaxed chromatin (12, 13). Thus, we can assume that supercoiling will not be affecting to our Testing System, since it has the 35S promotor, which is one a constitutive promotor with high activity (4). Moreover, the difference in DNA supercoiling between the target and off-targets, can be considered nearly zero because potential off-targets will be those genes of Nicotiana benthamiana which are highly transcribed.

Parameters.


Table 5: Parameters used calculation of the k_(cleavage TS)parameter.

ParameterValueSource
D2700 µm2/minReference (6)
λ0,015 µmReference (7)
V14140 µm3Waterloo iGEM team 2015
[(t)t = 3 daysModel estimated.
ΔΔG_(single mismatch penalty) 0.078 kcal/molReference (2)
kr0.0172·kr0.0172·[Model estimated
kc0,48 min-1Reference (2)
kunbind300 min-1Reference (2)
kB0.0019872041 kcal/(mol·K)Reference (5)
N(OFF-TARGETS)1 for Ga20OxModel estimated
T297 KExperiment conditions
P(complex,Ga20Ox)0,9964Model estimated


Main remarks

In order to reduce the random influence of three-dimensional diffusion of the complex, we suggested introducing the gRNA and the Testing System constructions one next to the other. Single mismatches positioned downstream the 11th-10 th nucleotide of the gRNA, lead to a positive free energy increment, i.e. they make unable the R-loop between that gRNA and the DNA target.

Cleavage and Reading Frame Shift.


Overview.

If the R-loop is completely established, then the complex Cas9:gRNA stays bounded to the DNA strand and Cas9 will cut the DNA sequence. This will lead in the loss of some base pairs that the plant cell will repair Via Non Homologous End Joining. This may cause a change in the reading frame (TS_( OFF )→ TS_( ON )), and consequently in the proteins produced by the cell. Taking profit from this shift in the DNA reading frame, we included Luciferase after the genetic target.


Therefore, light signal emitted during Luciferase assays, will provide an output information about the whole process performed by CRISPR/Cas9 within the plant cells. As it relies on the target finding by the Cas9:gRNA complex, it is also a measure of the gRNA-target efficiency in order to knockout the undesired gene.



Knockouts estimation


While the Cas9:gRNA complex is produced, it produces knockouts among the host genome. In our case, these knockouts are expected to improve the desired plant trait. In this section of the modeling, we wanted to study in silico if the suggested pair target-gRNA will work, obtaining a predicted number of knockouts performed in the Testing System construction for the time at which the Luciferase assay is performed.
The number of knockouts produced at measurement time 〖t 〗_m will be:
Testing Syste〖m 〗_(knockout ) (t)=k_( cleavage TS )·N_( ON-TARGETS Testing Syste〖m 〗_(knockout ) (t)=k_( cleavage TS )·N_( ON-TARGETS )·[(t)
Where:
The concentration The concentration [(t) is the amount of complex produced at the time when the luciferase assay is performed. This can be estimated using the system of ODEs described in the previous section Cas9:gRNA complex formation.
The term N_( ON-TARGETS ) makes reference to the quantity of on-targets in the nucleus, i.e. the gene copy number of the Testing System construction which has been agroinfiltrated.
The rate k_cleavage has information about the cleavage frequency in the Testing System construction. It has information about the complex diffusion, the rate of Cas9 cleavage, and the thermodynamic stability of the R-loop formed before the cleavage. It is:
k_(cleavage TS)=k_r· k_c/(k_c+k_unb ) · P_(complex,target)
The rates in last expression mean:
The rate of collisions k_r, produced between the Cas9:gRNA complex and the DNA in the nucleus.
The rate of cleavage k_cat which Cas9 cuts DNA strands.
The occurrence of the Cas9:gRNA dissociation from a DNA region.
The probability that the R-loop is formed between the complex and the target used in the Testing System construction. This parameter has also information about potential off targets.
The values for parameters k_r and P_(complex,target) are related to diffusion and thermodynamics, respectively.

ORFs probability distribution.

From the moment that Cas9:gRNA complex is produced, it starts diffusing, colliding and binding to on-targets (ONs) and off-targets (OFFs). Those have known initial concentrations, since the number of on-targets is the gene copy number of the Testing System construction, and the number of off-targets has been obtained by our off-target search algorithm.
After the complex interacts with a target, it changes it sequence. The cut performed by Cas9 takes place in the PAM-proximal region, avoiding the possibility of a new zip-union with the gRNA to form the R-loop again. Therefore, knockouts cannot be represented as off targets or on targets anymore, as binding to the gRNA will not be thermodynamically feasible because of base pairs rearrangement.
In our Testing System strategy, we rely on the reading frame shift to let Luciferase be transcribed. However, this event will not always be likely to happen. There are three possible reading frames and the resulting one after the Cas9 cut, is unknown.



This created the necessity of estimating the probability that the resulting frame after the Cas9 cleavage and NHEJ reparation, was the correct for Luciferase transcription. Consulting bibliography about frequency of indels in Nicotiana benthamiana, we could gather information about this phenomenon. Results showed barely any difference between the three possible results (ORF+1, ORF+2 and ORF+3). Therefore, the probability that our Testing System would express Luciferase was P_ON≈1/3.

Estimation of probability associated to each one of the possible resulting reading frames after Cas9 knockout.



Main remarks

Analysis of bibliographic data determined that the probability that our Testing System would express Luciferase was P_ON≈1/3. Further experimental results in the future will allow to fully validate the model.  

CONCLUSIONS


In this Modeling task, we have developed a mathematical approach of the CRISPR/Cas9 mechanism and its performance in our Testing System. Our model has a set of ODEs representing Cas9:gRNA complex formation, a thermodynamic balance for the R-loop formation, a sequence-alignment algorithm to search off-targets and probabilities estimation to know the reading frame shift after the Cas9 cleavage.
With this in silico model, we have been able to study the behavior of the system under different conditions. For instance, proposals based on the modeling were implemented in wet lab, with some successful results. In particular we determined optimum times of agroinfiltration, and redefined lab protocols for genetic constructions design, as we described below..
Regarding the Cas9:gRNA complex formation, it has been observed in simulations that the fast dynamics of the gRNA causes its fast disappearance. In other words, the gRNA is the limiting reagent. The usual lab protocol considers the simultaneous agroinfiltration of the gRNA and the Cas9. However, in simulations we observed that the production of the complex could be maximized if Cas9 was agroinfiltrated in first place. This is because after 3 days approx. there will be enough Cas9 in the nucleus to sense small quantities of gRNA. This alternative protocol would make the system less susceptible to lower efficiency in the transcription mechanisms of the cell.
The three-dimension diffusion of the complex until it finds the target, describes an aleatory pathway. This implies more variability within the results of the Testing System, making it less robust and reducing its repeatability. In order to reduce this negative impact, we suggested introducing the gRNA and the Testing System constructions one next to the other. Thus, the probability that the complex “randomly” finds the target is increased and there is more control of intrinsic variability sources affecting the result. Wet lab experiments performed under these conditions showed favorable results when the gRNA was agroinfiltrated near to the target.
The analysis of the thermodynamic balance which regulates the R-loop formation, let us estimate that single mismatches positioned downstream the 11th-10 th nucleotide of the gRNA, lead to a positive free energy increment, i.e. they make unable the R-loop between that gRNA and the DNA target. This verifies bibliographic criteria about efficient design of gRNAs.
Another critical point is the chromatin state, which cannot be determined instantaneously nor continuously. We have tried to avoid this lack of information relating the DNA packaging with transcriptional activity. More information about this factor is necessary, letting us further study its relation with the CRISPR/Cas9 efficacy and efficiency.
In conclusion, though lack of data of our testing system under the simulated conditions has been an obstacle for the model validation, the model gives sensible semi-quantitative results. The large amount of time required when working with plants, has limited the quickly generation of data and test of hypothesis extracted from modeling results. Further experimental results in the future will allow to fully validate the model. Nevertheless, results obtained have helped us to determine critical steps of the process, as well as to redesign laboratory protocols.
REFERENCES
Bae S, Park J, Kim J. Cas-OFFinder: a fast and versatile algorithm that searches for potential off-target sites of Cas9 RNA-guided endonucleases. Bioinformatics. 2014;30(10):1473-1475.
Farasat ISalis H. A Biophysical Model of CRISPR/Cas9 Activity for Rational Design of Genome Editing and Gene Regulation. PLOS Computational Biology. 2016;12(1):e1004724.
Chattopadhyay J, Tapaswi P, Mukherjee D. Formation of a regular dissipative structure: a bifurcation and non-linear analysis. Biosystems. 1992;26(4):211-222.
Nakabayashi JSasaki A. A mathematical model of the intracellular replication and within host evolution of hepatitis type B virus: Understanding the long time course of chronic hepatitis. Journal of Theoretical Biology. 2011;269(1):318-329.
Kalinin MKononogov S. Boltzmann’s Constant, the Energy Meaning of Temperature, and Thermodynamic Irreversibility. Measurement Techniques. 2005;48(7):632-636.
Dill K, Ghosh K, Schmit J. Physical limits of cells and proteomes. Proceedings of the National Academy of Sciences. 2011;108(44):17876-17882.
Nishimasu H, Ran F, Hsu P, Konermann S, Shehata S, Dohmae N et al. Crystal Structure of Cas9 in Complex with Guide RNA and Target DNA. Cell. 2014;156(5):935-949.
SantaLucia, J, Allawi H, Seneviratne P. Improved Nearest-Neighbor Parameters for Predicting DNA Duplex Stability †. Biochemistry. 1996;35(11):3555-3562.
Watkins N, Kennelly W, Tsay M, Tuin A, Swenson L, Lee H et al. Thermodynamic contributions of single internal rA·dA, rC·dC, rG·dG and rU· dT mismatches in RNA/DNA duplexes. Nucleic Acids Research. 2010;39(5):1894-1902.
Sugimoto N, Nakano M, Nakano S. Thermodynamics−Structure Relationship of Single Mismatches in RNA/DNA Duplexes †. Biochemistry. 2000;39(37):11270-11281.
Sugimoto N, Nakano S, Katoh M, Matsumura A, Nakamuta H, Ohmichi T et al. Thermodynamic Parameters To Predict Stability of RNA/DNA Hybrid Duplexes. Biochemistry. 1995;34(35):11211-11216.
Deng S, Stein R, Higgins N. Organization of supercoil domains and their reorganization by transcription. Molecular Microbiology. 2005;57(6):1511-1521.
Adrian J, Farrona S, Reimer J, Albani M, Coupland G, Turck F. cis-Regulatory Elements and Chromatin State Coordinately Control Temporal and Spatial Expression of FLOWERING LOCUS T in Arabidopsis. THE PLANT CELL ONLINE. 2010;22(5):1425-1440.
Ma X, Zhang Q, Zhu Q, Liu W, Chen Y, Qiu R et al. A Robust CRISPR/Cas9 System for Convenient, High-Efficiency Multiplex Genome Editing in Monocot and Dicot Plants. Molecular Plant. 2015;8(8):1274-1284.
Rutkauskas M, Sinkunas T, Songailiene I, Tikhomirova M, Siksnys V, Seidel R. Directional R-Loop Formation by the CRISPR-Cas Surveillance Complex Cascade Provides Efficient Off-Target Site Rejection. Cell Reports. 2015;10(9):1534-1543.
Moore R, Spinhirne A, Lai M, Preisser S, Li Y, Kang T et al. CRISPR-based self-cleaving mechanism for controllable gene delivery in human cells. Nucleic Acids Research. 2014;43(2):1297-1303.


Sponsors