{{Team:Paris_Saclay/project_header|titre=Model}}
<html><style>header{background-image: url("https://static.igem.org/mediawiki/2016/c/cb/T--Paris_Saclay--090816_Titre.jpg");}</style></html>
=Introduction=
One of the goals of our project is to check visually whether the system we designed to bring DNA strands closer together works. We decided to use a tripartite GFP for that purpose. As we explained on our [[Team:Paris_Saclay/Strategy#strategy|Strategy page]], this system is composed of several parts '''[Fig. 1]''':
*The two dCas9 proteins
*The two linkers
[[File:T--Paris_Saclay--090816_intro.jpg|400px|center|Legend]]
<center>'''Figure 1''': Tripartite split-GFP visualization tool</center>

To visualize the fluorescence, the tripartite GFP needs to assemble. The 10th and 11th β-strands of the GFP are linked to the dCas9 proteins, while GFP β-strands 1 to 9 are free in the bacteria. We wanted to design our system so as to make sure that the tripartite GFP assembles. To do so, we needed to find the optimal distance between the two target sequences that results in fluorescence.
=Constraints and Limits=
The first constraint on the distance is the following: if the two dCas9 proteins are too far apart, the tripartite GFP may never assemble, as illustrated below '''[Fig. 2]'''.
[[File:T--Paris Saclay--100916 Limite longue.png|700px|center|Legend]]
<center>'''Figure 2''': Predicted behavior of the system if the distance is too long</center>

On the other hand, when the end-to-end distance of the linker is too short, steric hindrance prevents the tripartite GFP from assembling: the two dCas9 proteins are not in the same plane and the DNA must curve. This affects the distance between the target sequences, because a curved DNA path between two points is longer than a straight one '''[Fig. 3]'''.
[[File:T--Paris Saclay--100916 Limite court.png|600px|center|Legend]]
<center>'''Figure 3''': Predicted behavior of the system under steric hindrance</center>

Before using this system, we needed to answer an essential question: what is the optimal distance between the two dCas9 proteins for the GFP to fluoresce?
[[File:T--Paris Saclay--100916 dCas9.png|400px|left|Legend]]<br><br><br>
In our model, we can equate the distance between the target sequences with the distance between the two dCas9 proteins, since each target sequence is lodged in the core of a dCas9 protein. This allows us to relate the distance between the two dCas9 proteins to the distance between the target sequences '''[Fig. 4]'''.
<hr>
'''Figure 4''': Structure of the dCas9 protein

To design the different target sequences, we converted the distance <B>d</B> between the two dCas9 proteins (in Å) into a distance <B>bp</B> along the DNA (in base pairs). To do so, we need the length of a helix turn, 34 Å, and the number of base pairs per helix turn, 10.5. This gives us the number of base pairs:
<math>bp = \frac{d \times 10.5}{34}</math>

It appeared that the computing power of a computer would be very helpful to estimate the optimal distance through simulation.
− | = | + | =Implemented models= |
==3D Model== | ==3D Model== | ||
Line 45: | Line 53: | ||
*A wild type GFP where we removed the 10th and 11th β sheets to mimic the tripartite GFP. | *A wild type GFP where we removed the 10th and 11th β sheets to mimic the tripartite GFP. | ||
− | The big limitation of this approach was the lack of information regarding the linker. We could not find any information concerning its 3D configuration apart from its sequence. Hence, we decided to use the PEP-FOLD software to build a 3D simulation. However, the predictions we obtained seemed very unlikely to appear naturally. Indeed, the linker is supposed to be 100% unfolded because its sequence includes a majority of glycine, whereas some of the PEP-FOLD results showed β-sheets in the linker’s structure | + | The big limitation of this approach was the lack of information regarding the linker. We could not find any information concerning its 3D configuration apart from its sequence. Hence, we decided to use the PEP-FOLD software to build a 3D simulation. However, the predictions we obtained seemed very unlikely to appear naturally. Indeed, the linker is supposed to be 100% unfolded because its sequence includes a majority of glycine, whereas some of the PEP-FOLD results showed β-sheets in the linker’s structure '''[Fig. 5]'''. |
+ | |||
+ | [[File:T--Paris Saclay--100916 linker beta sheet.png|300px|center|]] | ||
+ | <center>'''Figure 5''': A representative PEP-FOLD prediction of the linker</center> | ||
− | |||
PEP-FOLD, a software designed for the prediction of the structural configuration of folded proteins, gives predictions biased towards folded structures due to its innate optimization algorithm. As such, the configurations we obtained for our linker, conceived to be unfolded, are most likely invalid. | PEP-FOLD, a software designed for the prediction of the structural configuration of folded proteins, gives predictions biased towards folded structures due to its innate optimization algorithm. As such, the configurations we obtained for our linker, conceived to be unfolded, are most likely invalid. | ||
Line 92: | Line 102: | ||
[[File:T--Paris_Saclay--090816_Zhou.jpg|900px|center]]
Using the end-to-end distance equations of each model, we wrote a Python program that plots the probability density of the end-to-end distance in each case. The resulting graph is displayed below '''[Fig. 6]''':

[[File:T--Paris_Saclay--100916_fjc_wlc2.png|500px|center|]]
<center>'''Figure 6''': Probability density of the end-to-end distance according to the model used</center>

This graph shows the distances predicted by the two models: a mean of 18 Å for the Ideal Chain model (in blue) and a mean of 27 Å for the Worm-Like Chain model. This difference is consistent with the fact that the Worm-Like Chain model describes stiffer polymers, which are more extended and less coiled than the polymers described by the Ideal Chain model.

For our real polymer chain, however, rotation around the backbone bonds is restricted by hindered internal rotations and excluded-volume effects. We therefore know that these results are biased.

The ideal and worm-like chain models gave us the order of magnitude of the expected results, but we needed to account for steric hindrance, so we decided to develop our own model of the linker's behavior.
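As a hedged sketch of the kind of Python program mentioned above, the standard textbook forms of the two polymer models can be written as follows: the ideal chain gives a Gaussian probability density for the end-to-end distance R, and the worm-like (Kratky-Porod) chain gives a mean-square end-to-end distance in terms of the contour length and the persistence length. N = 36 segments follows from the 36 amino acids of the linker; the segment length b = 3.8 Å (a typical Cα-Cα distance) and the 4 Å persistence length are illustrative assumptions, not the team's values, so the numbers will not necessarily match Figure 6.

```python
import math

# Assumed, illustrative parameters (not taken from the team's program).
N_RESIDUES = 36       # number of amino acids (segments) in the linker
B_SEGMENT = 3.8       # Å, assumed segment length (typical Cα-Cα distance)
PERSISTENCE = 4.0     # Å, hypothetical persistence length for the worm-like chain


def ideal_chain_density(r, n=N_RESIDUES, b=B_SEGMENT):
    """Radial probability density P(r) of the end-to-end distance of an ideal chain.

    Gaussian approximation, normalized so that its integral over r >= 0 is 1:
    P(r) = 4 pi r^2 (3 / (2 pi n b^2))^(3/2) exp(-3 r^2 / (2 n b^2)).
    """
    msd = n * b * b  # mean-square end-to-end distance <R^2> = n b^2
    prefactor = (3.0 / (2.0 * math.pi * msd)) ** 1.5
    return 4.0 * math.pi * r * r * prefactor * math.exp(-3.0 * r * r / (2.0 * msd))


def wlc_mean_square(contour, persistence=PERSISTENCE):
    """Worm-like chain: <R^2> = 2 P L [1 - (P / L)(1 - exp(-L / P))]."""
    ratio = persistence / contour
    return 2.0 * persistence * contour * (1.0 - ratio * (1.0 - math.exp(-1.0 / ratio)))


def mean_from_density(density, r_max=200.0, steps=20000):
    """Integrate a radial density with the trapezoidal rule; return (total mass, mean r)."""
    dr = r_max / steps
    mass = first_moment = 0.0
    for i in range(steps + 1):
        r = i * dr
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoidal end-point weights
        p = density(r)
        mass += weight * p * dr
        first_moment += weight * r * p * dr
    return mass, first_moment / mass
```

With these assumed parameters, `mean_from_density(ideal_chain_density)` returns a total mass close to 1 and a mean end-to-end distance of about 21 Å, the analytical value sqrt(8 n b² / (3π)) for a Gaussian chain.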
==Our mathematical model==
===Definitions===
[[File:T--Paris Saclay--100916 glycine.png|350px|left]]
We wanted our model both to provide the end-to-end distance and to simulate the linker in 3D.<br>
We use the following scheme: all bonds have the same length, θ is the bond angle between two consecutive bonds, and Φ, Ψ and ω are the dihedral angles '''[Fig. 7]'''.<br>
Unlike the polymer physics models, in which each segment is an amino acid, each segment of this model is a bond. With three backbone bonds per amino acid and 36 amino acids, we thus have 36 × 3 = 108 segments.
<hr>
'''Figure 7''': Scheme used to build our model

To simulate our linker in 3D, we wrote a program in which each segment is represented by a vector, and all these vectors are added in the same basis. The sum of segment vectors 0 to i is called Ui. In the end, we obtain one vector, U107, which represents the end-to-end vector. We also keep the coordinates along the way, so that we end up with a 3D representation of our linker.
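The bookkeeping described above, summing segment vectors expressed in one basis and keeping the partial sums as atom coordinates, can be sketched as follows. This is a minimal illustration rather than the team's program; the function names are hypothetical.

```python
import math
from itertools import accumulate


def cumulative_positions(segments):
    """Return the partial sums U0, U1, ..., of the segment vectors.

    Each segment is an (x, y, z) tuple expressed in the same reference basis;
    the partial sums are the coordinates of the chain's successive atoms.
    """
    return list(
        accumulate(segments, lambda u, s: (u[0] + s[0], u[1] + s[1], u[2] + s[2]))
    )


def end_to_end(segments):
    """Length of the last partial sum, i.e. the end-to-end distance."""
    x, y, z = cumulative_positions(segments)[-1]
    return math.sqrt(x * x + y * y + z * z)
```

For three unit segments along x, y and z, the partial sums are (1, 0, 0), (1, 1, 0) and (1, 1, 1), and the end-to-end distance is √3.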
[[File:T--Paris Saclay--100916 axes.png|430px|left]]
Each segment of our model falls into one of the three following cases '''[Fig. 8]''':
*The segment is the N-Cα bond, with the Φ dihedral angle
*The segment is the Cα-C bond, with the Ψ dihedral angle
The reference basis is defined as an orthonormal basis in which U0 is written:
[[File:T--Paris_Saclay--090816_init_vector.jpg|30px|center]]<br>
'''Figure 8''': 3D scheme used to build our model<br>
<br>All other segments are expressed in that basis.
Then, to construct the linker, we need to define a change-of-basis matrix; only the first two bonds of each residue (with the dihedral angles Φ and Ψ) need to be treated, since the peptide bond is fixed (ω = 0).
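One common way to implement this construction is the "natural extension reference frame" (NeRF) method: each new atom is placed in a local orthonormal frame built from the three previous atoms, and the change-of-basis matrix (whose columns are that frame's axes) maps the new bond back to the reference basis. The sketch below is an illustration under stated assumptions, not the team's code: the common bond length (1.45 Å), the bond angle θ (110°) and the uniform sampling of Φ and Ψ are hypothetical placeholders (a realistic run would draw (Φ, Ψ) from a glycine Ramachandran distribution), U0 is assumed to lie along x, and the peptide bond is fixed at ω = 180°, which is how the IUPAC convention used in this code labels the planar trans peptide bond.

```python
import math
import random

BOND_LENGTH = 1.45                 # Å, hypothetical common bond length
BOND_ANGLE = math.radians(110.0)   # hypothetical common bond angle θ
OMEGA = math.radians(180.0)        # planar trans peptide bond (IUPAC convention)


def sub(u, v):
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def normalize(u):
    n = math.sqrt(dot(u, u))
    return (u[0] / n, u[1] / n, u[2] / n)


def place_atom(a, b, c, length, angle, torsion):
    """Place atom d with |cd| = length, angle(b, c, d) = angle, dihedral(a, b, c, d) = torsion."""
    bc = normalize(sub(c, b))
    n = normalize(cross(sub(b, a), bc))
    m = cross(n, bc)
    # New bond expressed in the local frame (bc, m, n) ...
    x = -length * math.cos(angle)
    y = length * math.sin(angle) * math.cos(torsion)
    z = length * math.sin(angle) * math.sin(torsion)
    # ... then mapped back to the reference basis (change of basis) and added to c.
    return tuple(c[i] + x * bc[i] + y * m[i] + z * n[i] for i in range(3))


def dihedral(a, b, c, d):
    """Signed dihedral angle (radians) defined by four points, IUPAC sign convention."""
    b1, b2, b3 = sub(b, a), sub(c, b), sub(d, c)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    return math.atan2(math.sqrt(dot(b2, b2)) * dot(b1, n2), dot(n1, n2))


def build_backbone(n_segments=108, rng=random):
    """Build N-CA-C backbone coordinates (n_segments bonds, n_segments + 1 atoms)."""
    points = [(0.0, 0.0, 0.0), (BOND_LENGTH, 0.0, 0.0)]  # U0 assumed along x
    points.append((BOND_LENGTH - BOND_LENGTH * math.cos(BOND_ANGLE),
                   BOND_LENGTH * math.sin(BOND_ANGLE), 0.0))
    while len(points) < n_segments + 1:
        k = len(points)  # atom index: k % 3 == 0 -> N, 1 -> CA, 2 -> C
        if k % 3 == 1:
            torsion = OMEGA  # placing CA: rotation about the peptide bond C-N
        else:
            torsion = rng.uniform(-math.pi, math.pi)  # placeholder draw for Ψ (N) or Φ (C)
        points.append(place_atom(points[-3], points[-2], points[-1],
                                 BOND_LENGTH, BOND_ANGLE, torsion))
    return points
```

The end-to-end distance of one simulated linker is then the distance between `points[0]` and `points[-1]`; repeating `build_backbone` N times gives the distribution of end-to-end distances.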
Our program gives the end-to-end distance over N simulations and shows one of the simulated linkers in 3D.
For 5000 simulations, we obtain the following histogram '''[Fig. 9]''':

[[File:T--Paris_Saclay--090816_Gauss.jpg|500px|center]]
<center>'''Figure 9''': Results of our simulation program</center>

As we can see, the mean end-to-end distance is 19.66 Å, with a standard deviation of 7.82 Å.
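Given the list of end-to-end distances returned by the N simulations, the mean, standard deviation and histogram counts reported above can be obtained with Python's standard library, as in the hedged sketch below. The function name `summarize` and the 2 Å bin width are illustrative choices; the page does not say whether the sample or the population standard deviation was used, though for 5000 simulations the difference is negligible.

```python
import statistics
from collections import Counter


def summarize(distances, bin_width=2.0):
    """Return (mean, sample standard deviation, histogram) of end-to-end distances.

    The histogram maps the lower edge of each bin (in Å) to the number of
    distances falling in [edge, edge + bin_width).
    """
    mean = statistics.mean(distances)
    stdev = statistics.stdev(distances, xbar=mean)
    counts = Counter(int(d // bin_width) for d in distances)
    histogram = {k * bin_width: counts[k] for k in sorted(counts)}
    return mean, stdev, histogram
```

For example, `summarize([1.0, 2.0, 3.0, 4.0])` returns a mean of 2.5, a standard deviation of √(5/3) and the histogram `{0.0: 1, 2.0: 2, 4.0: 1}`.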
The following graph '''[Fig. 10]''' evaluates whether the linker has a preferred orientation.
[[File:T--Paris Saclay--100916 orientation.png|400px|left]]
<br>
On the graph, each point represents the end of the linker's last segment relative to the origin. The color represents the end-to-end distance: blue indicates a small distance and red a large one.<br>
We can check visually that the cloud of end points follows an angularly homogeneous spherical distribution, in which every direction from the origin is represented by roughly the same number of points.<br>
<hr>
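A numerical complement to this visual check, sketched here as an assumption rather than something reported by the team, is the mean resultant length of the unit end-to-end directions: it is close to 0 for an angularly homogeneous cloud and close to 1 if the chain ends cluster in one direction. For N isotropic directions it is of order 1/√N. The helper `random_isotropic` only generates a reference isotropic sample for comparison.

```python
import math
import random


def mean_resultant_length(vectors):
    """Length of the average of the unit vectors pointing along each input vector."""
    sx = sy = sz = 0.0
    for x, y, z in vectors:
        norm = math.sqrt(x * x + y * y + z * z)
        sx, sy, sz = sx + x / norm, sy + y / norm, sz + z / norm
    return math.sqrt(sx * sx + sy * sy + sz * sz) / len(vectors)


def random_isotropic(n, rng=random):
    """Draw n isotropic directions as 3D Gaussian vectors (reference case)."""
    return [(rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
```

Applied to the 5000 simulated end-to-end vectors, a value close to that of `random_isotropic(5000)` would support the absence of a preferred orientation.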
When simulating the linker in 3D, we obtain the following visualization '''[Fig. 11]''':

[[File:T--Paris_Saclay--090816_linker.jpg|400px|center|]]
<center>'''Figure 11''': Result of the linker simulation in 3D, with all distances in Å</center>

Compared to the polymer physics models, our model provides additional information about the linker, such as the spatial configurations it can adopt.
=Gromacs software=
Gromacs is a molecular dynamics package designed to simulate proteins and other biological molecules. It takes a PDB file as the starting structure of the simulation. In our case, we used one of the PEP-FOLD predictions to run the molecular dynamics: we chose the PEP-FOLD prediction whose end-to-end distance was closest to our model's mean end-to-end distance.

To run the molecular dynamics with Gromacs, we followed these steps:
1/ we defined a box filled with water to simulate the aqueous environment.<br>
2/ we replaced a few water molecules with ions to balance the electrical charges of the solution and obtain electroneutrality.<br>
3/ we relaxed the system through energy minimization to ensure that it has no steric clashes or inappropriate geometry.<br>
4/ we equilibrated the solvent (water molecules) and ions around the protein.<br>
5/ finally, we ran the molecular dynamics of our linker for 10 ns. The RMSD graph below reaches a plateau, which justifies limiting the simulation to this duration.
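The five steps above follow the structure of the GROMACS lysozyme tutorial cited in the references. The sketch below only assembles the corresponding `gmx` command lines as a dry run that prints them; the file names, the `.mdp` parameter files, the OPLS-AA force field, the SPC/E water model and the use of `-r` for position restraints (required in recent GROMACS versions) are assumptions, not the team's exact settings.

```python
import shlex

# Hypothetical file names; the .mdp parameter files are assumed to exist.
STEPS = [
    # Topology from the PEP-FOLD structure
    "gmx pdb2gmx -f linker.pdb -o linker.gro -ff oplsaa -water spce",
    # 1/ Box filled with water
    "gmx editconf -f linker.gro -o linker_box.gro -c -d 1.0 -bt cubic",
    "gmx solvate -cp linker_box.gro -cs spc216.gro -o linker_solv.gro -p topol.top",
    # 2/ Ions for electroneutrality (genion asks which group to replace, e.g. SOL)
    "gmx grompp -f ions.mdp -c linker_solv.gro -p topol.top -o ions.tpr",
    "gmx genion -s ions.tpr -o linker_ions.gro -p topol.top -pname NA -nname CL -neutral",
    # 3/ Energy minimization
    "gmx grompp -f minim.mdp -c linker_ions.gro -p topol.top -o em.tpr",
    "gmx mdrun -v -deffnm em",
    # 4/ Equilibration of solvent and ions (NVT then NPT), protein position-restrained
    "gmx grompp -f nvt.mdp -c em.gro -r em.gro -p topol.top -o nvt.tpr",
    "gmx mdrun -deffnm nvt",
    "gmx grompp -f npt.mdp -c nvt.gro -r nvt.gro -t nvt.cpt -p topol.top -o npt.tpr",
    "gmx mdrun -deffnm npt",
    # 5/ Production run (10 ns set in md.mdp through nsteps and dt)
    "gmx grompp -f md.mdp -c npt.gro -t npt.cpt -p topol.top -o md.tpr",
    "gmx mdrun -deffnm md",
]


def commands():
    """Split each command line into an argument list, as subprocess.run expects."""
    return [shlex.split(step) for step in STEPS]


if __name__ == "__main__":
    for argv in commands():
        print(" ".join(argv))
```

To actually execute the pipeline, each argument list could be passed to `subprocess.run(argv, check=True)`, supplying the solvent group name on standard input for `genion`.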
[[File:T--Paris_Saclay--090816_end_gro.jpg|300px|left]]
<br><br><br><br>
We plotted the linker's end-to-end distance over the course of the Gromacs molecular dynamics run '''[Fig. 12]'''.
The results are:
*Average end-to-end distance of the linker: 22.23 Å
*Average radius of gyration: 9.34 Å
These results are close to the end-to-end distance predicted by our model for the unfolded linker (19.66 ± 7.82 Å), which supports the validity of our model.
<hr>
'''Figure 12''': Results from the Gromacs software

We also generated the RMSD and RMSF graphs from the Gromacs simulation '''[Fig. 13]''' to extract more information about the dynamics of our linker.
[[File:T--Paris Saclay--090816 RMSD.jpg|600px|center]]
<center>'''Figure 13''': RMSD and RMSF graphs from the Gromacs simulation</center>
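For reference, the quantities plotted in Figures 12 and 13 have simple definitions, sketched below for frames given as lists of (x, y, z) atom coordinates. This is a hedged illustration, not how Gromacs computes them: Gromacs provides its own analysis tools, and its RMSD is computed after a least-squares superposition onto the reference structure, which this sketch omits; the radius of gyration here is unweighted, whereas mass weighting is a common choice.

```python
import math


def distance(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def end_to_end_distance(coords):
    """Distance between the first and last atoms of the chain."""
    return distance(coords[0], coords[-1])


def radius_of_gyration(coords):
    """Unweighted radius of gyration: RMS distance of the atoms to their centroid."""
    n = len(coords)
    centroid = tuple(sum(p[i] for p in coords) / n for i in range(3))
    return math.sqrt(sum(distance(p, centroid) ** 2 for p in coords) / n)


def rmsd(coords, reference):
    """RMSD between two frames with the same atom order (no superposition)."""
    return math.sqrt(sum(distance(p, q) ** 2 for p, q in zip(coords, reference)) / len(coords))


def rmsf(frames):
    """Per-atom RMS fluctuation around each atom's mean position over all frames."""
    n_frames = len(frames)
    result = []
    for atom in range(len(frames[0])):
        mean = tuple(sum(f[atom][i] for f in frames) / n_frames for i in range(3))
        result.append(math.sqrt(sum(distance(f[atom], mean) ** 2 for f in frames) / n_frames))
    return result
```

Averaging `end_to_end_distance` and `radius_of_gyration` over all frames of a trajectory gives quantities comparable to the averages listed above.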
Gromacs also gave us a movie '''[Fig. 14]''' showing the dynamics of the linker over the 10 ns:
<html><div style="text-align:center;"><video width="400" controls>
<source src="https://static.igem.org/mediawiki/2016/3/3a/T--Paris_Saclay--100916_Film_dynamique.mp4" type='video/mp4'/>
</video></div></html>
<center>'''Figure 14''': Linker dynamics video obtained with the Gromacs software</center>

In sum, the average end-to-end distance obtained in the Gromacs simulation is similar to the one obtained with our model. This supports the claim that our model, like Gromacs, accounts for steric hindrance when building the linker's spatial configuration, and more generally justifies the reasoning that led us to develop it.

Note that while the Gromacs simulation takes more than 8 hours to run, our program runs in less than 5 minutes.
=Construction of the 3D model=
Having retrieved a PDB-formatted file from the Gromacs simulation, we can build a realistic 3D spatial configuration of our linker in the PyMOL software '''[Fig. 15]''':
[[File:T--Paris Saclay--090816 final.jpg|600px|center]]
<center>'''Figure 15''': Final model of the 3D spatial configuration of our linker</center>

Since Gromacs is much more complete and reliable than our model, we chose the linker conformation with an end-to-end distance of 22 Å, in agreement with the Gromacs simulation.

PyMOL allows us to calculate the distance between the two dCas9 proteins: 244.7 Å.
Following the conversion equation explained above, we obtain the number of base pairs <B>bp</B> between the target sequences:
bp = (244.7 × 10.5) / 34 ≈ 76

We can therefore conclude that the distance between the target sequences is approximately 76 base pairs. Since one helix turn corresponds to 10.5 base pairs, we can verify that there are 76/10.5 ≈ 7.2 helix turns between the two dCas9 proteins. Under the hypotheses we made, the two dCas9 proteins should therefore be approximately in the same orientation.
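The conversion used above can be written as a small helper. This is a sketch: the constants are the 34 Å helix-turn length and the 10.5 bp per turn given earlier on this page, and the function names are hypothetical.

```python
BP_PER_TURN = 10.5   # base pairs per helix turn (value used on this page)
TURN_LENGTH = 34.0   # Å per helix turn (value used on this page)


def angstrom_to_bp(distance):
    """Convert a distance along straight DNA (Å) into a number of base pairs."""
    return distance * BP_PER_TURN / TURN_LENGTH


def helix_turns(n_bp):
    """Number of helix turns spanned by n_bp base pairs."""
    return n_bp / BP_PER_TURN


bp = round(angstrom_to_bp(244.7))  # 244.7 × 10.5 / 34 ≈ 75.57, rounded to the nearest base pair
turns = helix_turns(bp)
```

With the PyMOL distance of 244.7 Å, `angstrom_to_bp` gives about 75.6, which rounds to the 76 bp used for the target sequences, and `helix_turns(76)` gives about 7.24 turns.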
=Conclusion=

In this part, we aimed to find the optimal distance between the two dCas9 proteins for visualizing fluorescence.
The results we obtained allowed us to design the target sequences and gave us an overview of the dynamics of the system.
To conclude, we found a distance of 244.7 Å between the two dCas9 proteins, which corresponds to 76 base pairs.

A possible improvement to this model would be the visualization of the molecular dynamics: our program does not calculate or show the linker's dynamics, but instead gives a static image of the linker's spatial configuration at a given time t.
Note also that our program is designed to predict and plot the spatial configuration of an unfolded protein; it does not take pre-established α-helix or β-sheet folding into account, since the linker of interest here is nearly 100% unfolded.

We have also shown that our model provides a reliable estimate of the end-to-end distance, cross-checked against other models' results, most notably the Gromacs prediction.

=References=
Ting D, Wang G, Shapovalov M, Mitra R, Jordan MI, Dunbrack RL. Neighbor-Dependent Ramachandran Probability Distributions of Amino Acids Developed from a Hierarchical Dirichlet Process Model. Richardson J, editor. PLoS Computational Biology. 29 Apr 2010;6(4):e1000763. http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1000763

https://en.wikipedia.org/wiki/Worm-like_chain

https://www.pymol.org/

http://www.gromacs.org/

http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin/gmx-tutorials/lysozyme/index.html

https://fr.wikipedia.org/wiki/Diagramme_de_Ramachandran

http://polly.phys.msu.ru/ru/education/courses/polymer-intro/lecture2.pdf

http://www-f1.ijs.si/~rudi/sola/KratkyPorodmodel.pdf

Ho BK, Thomas A, Brasseur R. Revisiting the Ramachandran plot: Hard-sphere repulsion, electrostatics, and H-bonding in the α-helix. Protein Science. 1 Jan 2009;12(11):2508-22. http://bmcstructbiol.biomedcentral.com/articles/10.1186/1472-6807-5-14

https://en.wikipedia.org/wiki/Ideal_chain

https://tel.archives-ouvertes.fr/tel-00629362/document
{{Team:Paris_Saclay/project_footer}}
Latest revision as of 17:05, 19 October 2016