Team:Paris Saclay/Model


T--Paris Saclay--090816 Titre.jpg



Model


Introduction

One of the goals of our project is to visualize whether the system we designed to bring DNA strands closer together works. We decided to use a tripartite GFP for that purpose. As explained on our home page, this system is composed of several parts:

  • The two dCas9 proteins
  • The two linkers
  • The three parts of the GFP

Before using this system, we needed to answer an essential question: what is the optimal distance between the two dCas9 proteins for GFP to fluoresce?




It became clear that the computing power of a computer would be more than welcome to approach the answer.


3D Model

We first thought of building a simple 3D model from the known structures of our proteins. We found these structures in the RCSB Protein Data Bank and used PyMOL to assemble the system. For this model we used:

  • Two identical dCas9 proteins from Streptococcus pyogenes (instead of Streptococcus thermophilus and Neisseria meningitidis in our biological system)
  • A wild-type GFP from which the 10th and 11th beta strands were removed.

The main limitation of this approach was the lack of information about the linker: we could not find anything beyond its sequence. We therefore used the PEP-FOLD software to predict its 3D structure, but the prediction was not usable for our model because the results were very improbable. We decided to build three different models: one with a short end-to-end distance, one with a long distance, and one with the mean of these two values. This gave us a first answer: the optimal distance lies between 73 and 110 base pairs.


Ideal Chain and Worm-Like Chain Models

We then decided to estimate the end-to-end distance of our linker with a mathematical model.

The Ideal Chain model (or freely jointed chain) is one of the simplest models for describing polymer structures. It treats a polymer as a random walk. For a polymer made of N segments of length l, the contour length is defined as the total unfolded length of the polymer and is given by:


T--Paris Saclay--090816 contour length.jpg

R is the total end-to-end vector. It depends on the number of segments and the length of each segment:


T--Paris Saclay--090816 end to end.jpg


The end-to-end distance is then distributed according to this probability density function:


T--Paris Saclay--090816 results1.jpg
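The freely jointed chain relations can be sketched in a few lines of Python. This is a minimal sketch that assumes the standard textbook forms (contour length L = N·l, mean-square end-to-end distance N·l², and the Gaussian approximation of the distribution); the function names are ours, and the expression in the figure above may be written differently.

```python
import math


def contour_length(n_segments, seg_len):
    """Total unfolded length of the chain: L = N * l."""
    return n_segments * seg_len


def rms_end_to_end(n_segments, seg_len):
    """Root-mean-square end-to-end distance: sqrt(<R^2>) = l * sqrt(N)."""
    return seg_len * math.sqrt(n_segments)


def fjc_radial_pdf(r, n_segments, seg_len):
    """Gaussian approximation of the density of the end-to-end distance r.

    Radial form: it includes the 4*pi*r^2 shell factor, so it integrates
    to 1 over r >= 0.
    """
    mean_sq = n_segments * seg_len ** 2
    norm = (3.0 / (2.0 * math.pi * mean_sq)) ** 1.5
    return 4.0 * math.pi * r ** 2 * norm * math.exp(-3.0 * r ** 2 / (2.0 * mean_sq))
```

For example, `rms_end_to_end(100, 1.5)` evaluates the model for a hypothetical chain of 100 segments of 1.5 Å; the number of segments of the actual linker is not stated here.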


The biggest limitation of this model is that we lose information on the spatial arrangement of the repeat units. In a real polymer chain, the rotation of bonds around the backbone is restricted by hindered internal rotation and by excluded-volume effects. Taking this into account, we know that our results are biased.
We decided to continue our mathematical modeling with the Worm-Like Chain model, which is suited to describing semi-flexible polymers. We took the probability density function from the paper by Huan-Xiang Zhou (2004), Polymer Models of Protein Stability, Folding, and Interactions:


T--Paris Saclay--090816 Zhou.jpg
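To give a feeling for the worm-like chain, the sketch below evaluates its mean-square end-to-end distance with the classical Kratky-Porod expression. It is not the probability density function from Zhou (2004) shown in the figure, only the standard second moment of the model; the persistence length is a free parameter that has to be assumed, and the function name is ours.

```python
import math


def wlc_mean_square_end_to_end(contour_len, persistence_len):
    """Kratky-Porod result: <R^2> = 2*lp*L * (1 - (lp/L) * (1 - exp(-L/lp)))."""
    ratio = persistence_len / contour_len
    return 2.0 * persistence_len * contour_len * (
        1.0 - ratio * (1.0 - math.exp(-1.0 / ratio))
    )
```

When the chain is much longer than its persistence length, the result tends to 2·lp·L (flexible, random-walk-like behavior); when it is much shorter, the result tends to L² (rigid rod).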


With these two models we were able to write a Python program that gives a first approximation of the end-to-end distance of our linker.


T--Paris Saclay--090816 results2.jpg


We see on this graph that the two models behave differently. We could not develop these models further because they provide very little information: we only obtain the end-to-end distance.

So we decided to code our own model to describe the behavior of our linker. The freely jointed chain and worm-like chain models gave us an idea of the results to expect.


Our mathematical model

T--Paris Saclay--090816 angle.jpg


We decided to write a program that simulates our linker. To do so, we treat each segment as a bond and take all the dihedral angles into account.

With the PyMOL software, we were able to define some constants:

  • The length of each segment: 1.5 Å
  • The angle θ: π/3


Our program simulates the linker in 3D. We model the linker by representing each segment as a vector and adding the vectors in a common basis. In the end we obtain a single vector that represents the end-to-end distance. We also keep the coordinates to build a 3D representation of the linker. To do that, we initialize the first vector along the Oz axis, and all the other segments are then expressed in that basis.

For each segment of our model there are three possibilities:

  • The N-Cα bond with the Φ angle
  • The Cα-C bond with the Ψ angle
  • The C-N bond, which is the peptide bond


We have to define a change-of-basis matrix for the first two bonds. For the peptide bond, we only use the Rx matrix because Φ = 0.

We define an initial vector:


T--Paris Saclay--090816 init vector.jpg


We then construct the change-of-basis matrix from a translation matrix Tz, a rotation matrix about the Oz axis Rz, and a rotation matrix about the Ox axis Rx:

  • Tz represents the translation of 1.5 Å corresponding to the segment length
  • Rz represents the Φ or Ψ rotation
  • Rx represents the θ rotation with a fixed angle θ. It is defined relative to the Ox axis and leads to a rotation of the coordinate system about the Ox axis.


T--Paris Saclay--090816 matrix.jpg


This gives the change-of-basis matrix: P = Tz * Rz * Rx.

With this matrix, we can express vector n in the basis of vector n-1.
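The product P = Tz * Rz * Rx can be sketched with 4x4 homogeneous matrices in plain Python. This is a minimal illustration under our own assumptions: column vectors, right-handed rotations, and helper names (`translation_z`, `rotation_z`, `rotation_x`, `change_of_basis`, `apply_transform`) that are ours and may differ from the conventions of the matrices shown in the figure.

```python
import math


def mat_mul(a, b):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def translation_z(d):
    """Tz: translation by d along the Oz axis."""
    return [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, d],
            [0.0, 0.0, 0.0, 1.0]]


def rotation_z(angle):
    """Rz: rotation by `angle` (the Φ or Ψ dihedral) about the Oz axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0, 0.0],
            [s, c, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


def rotation_x(angle):
    """Rx: rotation by `angle` (the fixed angle θ) about the Ox axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[1.0, 0.0, 0.0, 0.0],
            [0.0, c, -s, 0.0],
            [0.0, s, c, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


def change_of_basis(dihedral, theta=math.pi / 3, seg_len=1.5):
    """P = Tz * Rz * Rx for one segment (θ and length taken from the text)."""
    return mat_mul(translation_z(seg_len),
                   mat_mul(rotation_z(dihedral), rotation_x(theta)))


def apply_transform(m, point):
    """Apply a 4x4 homogeneous transform to a 3D point."""
    v = (point[0], point[1], point[2], 1.0)
    return tuple(sum(m[i][k] * v[k] for k in range(4)) for i in range(3))
```

With these conventions, `apply_transform(change_of_basis(0.0), (0.0, 0.0, 0.0))` moves the origin 1.5 Å along Oz, which corresponds to the end of the first segment.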


T--Paris Saclay--090816 rotation.jpg


For each new segment, we define a value for Φ (which depends on the bond type within the amino acid), and we compute for each segment Ui:


T--Paris Saclay--090816 length U.jpg


In the last step, we keep the coordinates of each vector to plot the 3D visualization. We then completed our model with additional features. Since our linker does not consist entirely of glycine, we added a test that adjusts the dihedral angles, because other amino acids do not share the same Ramachandran plot. We also defined an exclusion zone around each vector, which rules out non-biological overlaps.
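The chain-building procedure can be sketched as follows. This is a simplified illustration, not our actual program: it stores the orientation (a 3x3 rotation) and the position separately, which is equivalent to applying P repeatedly; it samples Φ and Ψ uniformly instead of using residue-specific Ramachandran regions; and the exclusion radius `MIN_DIST` is a hypothetical value. How the angle 0 used for the peptide bond maps to a cis or trans geometry depends on the frame convention, which is also an assumption here.

```python
import math
import random

SEG_LEN = 1.5        # segment length in Å (value from the text)
THETA = math.pi / 3  # fixed angle θ (value from the text)
MIN_DIST = 2.0       # exclusion radius in Å (hypothetical value)


def mat_mul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rot_z(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_x(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def sample_dihedral(axis_bond, rng):
    """Torsion about bond `axis_bond` (0 = N-Cα, 1 = Cα-C, 2 = C-N, ...)."""
    if axis_bond % 3 == 2:
        return 0.0  # peptide bond: only the Rx rotation is applied
    # Assumption: uniform sampling of Φ and Ψ.
    return rng.uniform(-math.pi, math.pi)


def build_chain(n_segments, rng, max_tries=50):
    """Return n_segments + 1 points, or None if a clash cannot be avoided."""
    frame = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    coords = [(0.0, 0.0, 0.0), (0.0, 0.0, SEG_LEN)]  # first segment on Oz
    for segment in range(1, n_segments):
        for _ in range(max_tries):
            phi = sample_dihedral(segment - 1, rng)
            trial = mat_mul(frame, mat_mul(rot_z(phi), rot_x(THETA)))
            last = coords[-1]
            # The new segment points along the z axis of the trial frame.
            new = tuple(last[i] + SEG_LEN * trial[i][2] for i in range(3))
            # Exclusion zone: compare with every point except the bonded one.
            if all(math.dist(new, old) >= MIN_DIST for old in coords[:-1]):
                frame = trial
                coords.append(new)
                break
        else:
            return None  # every trial clashed: reject this chain
    return coords


def end_to_end_distances(n_simulations, n_segments, seed=0):
    """End-to-end distances (Å) of n_simulations accepted chains."""
    rng = random.Random(seed)
    distances = []
    while len(distances) < n_simulations:
        chain = build_chain(n_segments, rng)
        if chain is not None:
            distances.append(math.dist(chain[0], chain[-1]))
    return distances
```

Rejecting a whole chain when a step cannot avoid a clash keeps the sketch short; a production version might backtrack instead.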


Results

Our program is designed to give the end-to-end distance over n simulations, to provide a study of the RMSD, and to show one simulation of the linker in 3D.

For 5000 simulations we obtain this graph:


T--Paris Saclay--090816 Gauss.jpg


As we can see, the mean end-to-end distance is 19.66 and the standard deviation is 7.82. We can compare these results with those obtained with the freely jointed chain.
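Summary statistics of this kind can be computed from a list of simulated end-to-end distances as sketched below. The helper names are ours, and the choice of the sample standard deviation (`statistics.stdev`) rather than the population one is an assumption.

```python
import math
import statistics
from collections import Counter


def summarize(distances):
    """Return (mean, sample standard deviation) of the distances."""
    return statistics.mean(distances), statistics.stdev(distances)


def histogram(distances, bin_width):
    """Count distances per bin; keys are the lower edges of the bins."""
    counts = Counter(math.floor(d / bin_width) * bin_width for d in distances)
    return dict(sorted(counts.items()))
```

Calling `summarize` on the distances of all simulations gives the two numbers reported above, and `histogram` gives the counts behind a plot such as the one shown.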

When we study the distance and the distribution of the last segment, we obtain:


T--Paris Saclay--090816 repartition.jpg


Finally, we wanted a 3D representation of our linker, and we obtained:


T--Paris Saclay--090816 linker.jpg


With our program we can obtain more information and get an idea of what our linker looks like in 3D and of the space in which it can fold.


Gromacs software

T--Paris Saclay--090816 end gro.jpg


On this graph, we plot the end-to-end distance to see the dynamics obtained with Gromacs. The results are:

  • Average end-to-end distance of the protein: 2.223 nm
  • Average radius of gyration: 0.934 nm

These results are very close to those of our program. We can therefore say that our program gives good results for predicting the behavior of unfolded proteins.
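To compare both approaches on the same quantities, the values have to be expressed in the same unit, and a radius of gyration can also be computed from the coordinates of a simulated chain. The sketch below uses helper names of our own, weights all points equally (Gromacs weights atoms by mass), and assumes that the 19.66 reported by our program is in Å.

```python
import math

NM_TO_ANGSTROM = 10.0  # 1 nm = 10 Å


def radius_of_gyration(coords):
    """Equal-weight radius of gyration of a list of 3D points."""
    n = len(coords)
    center = tuple(sum(p[i] for p in coords) / n for i in range(3))
    return math.sqrt(sum(math.dist(p, center) ** 2 for p in coords) / n)


def relative_difference(value, reference):
    """Absolute difference between two values, relative to the reference."""
    return abs(value - reference) / reference


# Gromacs average end-to-end distance converted to Å.
gromacs_distance = 2.223 * NM_TO_ANGSTROM
# Mean from our program, assumed to be in Å.
program_distance = 19.66
difference = relative_difference(program_distance, gromacs_distance)
```

Under that unit assumption, the Gromacs value is 22.23 Å and the two means differ by roughly 12% of the Gromacs value.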

We also obtained the RMSD and RMSF graphs from Gromacs to get more information about the dynamics of our linker.


T--Paris Saclay--090816 RMSD.jpg


With this information, we can construct the 3D model of our system.


T--Paris Saclay--090816 final.jpg


With the PyMOL software, we measured the distance between the two dCas9 proteins: 244.7 Å


Discussion and Limits

We can compare the results obtained with our program to those obtained with the other models. The freely jointed chain is the model that gives the best results for approximating the end-to-end distance. The worm-like chain model may not be suitable for our kind of protein.

The Gromacs results show that our program gives a good estimate of the end-to-end distance and of the steric hindrance. In addition, our program runs in less than 5 minutes, whereas the Gromacs simulation takes more than 8 hours.

Our model can be improved, because we only see our linker at a single point in time, without the full dynamics.

On the other hand, our program gives a good estimate of the end-to-end distance and of the 3D conformation of an unfolded protein. We never consider folding into alpha helices or beta sheets, because the predicted disorder of our linker was close to 100%, so the program does not take these structures into account.