Line 270: | Line 270: | ||
<p>Three amino acid positions were chosen for <i>O</i>-methyl-<span style="font-variant: small-caps;">l</span>-tyrosine (OMT) exchange evaluation (tyrosine 8 (Y8), phenylalanine 13 (F13) and phenylalanine 16 (F16)). These positions were selected because of their small deviation in regard to OMT and were therefore expected to cause the smallest structural difference. All molecular dynamics steps were performed on these mutation variants as well as on the wildtype protein for comparison. All simulations were performed over 100 ns, leading to 10001 conformations each. These conformations were evaluated using RMSD, RMSF, SASA and the amount of secondary structures over time.<br/>Figure 1 displayes the RMSD in regard to the initial conformation. It can be observed that the mutational variant Y8O exhibits similar curve characteristics as the wildtype variant. The mutational variant F16O on the contrary shows a more severe deviation towards the wildtype.</p> | <p>Three amino acid positions were chosen for <i>O</i>-methyl-<span style="font-variant: small-caps;">l</span>-tyrosine (OMT) exchange evaluation (tyrosine 8 (Y8), phenylalanine 13 (F13) and phenylalanine 16 (F16)). These positions were selected because of their small deviation in regard to OMT and were therefore expected to cause the smallest structural difference. All molecular dynamics steps were performed on these mutation variants as well as on the wildtype protein for comparison. All simulations were performed over 100 ns, leading to 10001 conformations each. These conformations were evaluated using RMSD, RMSF, SASA and the amount of secondary structures over time.<br/>Figure 1 displayes the RMSD in regard to the initial conformation. It can be observed that the mutational variant Y8O exhibits similar curve characteristics as the wildtype variant. The mutational variant F16O on the contrary shows a more severe deviation towards the wildtype.</p> | ||
<p> | <p> | ||
− | <center><div class="bild" style="width:70%;"> | + | <center><div class="bild" style="width:70%;"><img src="https://static.igem.org/mediawiki/2016/8/81/T--TU_Darmstadt--RMSDc.png" width=100%><b>Figure 1:</b> RMSD of mutation variants (Y8O, F16O) and the wildtype (WT).</div></center> |
</p> | </p> | ||
<p> | <p> | ||
The solvent accessible surface area (SASA) was calculated for every simualtion step using DSSP and is displayed in figure 2. The little to no fluctuation in the SASA of all simulated variants is an argument for the high structual stability of the wildtype and the mutational variants. The disparity between the mutational variants and the wildtype can be traced back to DSSP algorithm since it is based solely on natural amino acids. Therefore it can not evaluate a non natural amino acid and would cause a discrepancy between surface areas if non-natural amino acids like OMT are involved. | The solvent accessible surface area (SASA) was calculated for every simualtion step using DSSP and is displayed in figure 2. The little to no fluctuation in the SASA of all simulated variants is an argument for the high structual stability of the wildtype and the mutational variants. The disparity between the mutational variants and the wildtype can be traced back to DSSP algorithm since it is based solely on natural amino acids. Therefore it can not evaluate a non natural amino acid and would cause a discrepancy between surface areas if non-natural amino acids like OMT are involved. | ||
</p> | </p> | ||
− | <center><div class="bild" style="width:70%;"> | + | <center><div class="bild" style="width:70%;"><img src="https://static.igem.org/mediawiki/2016/1/16/T--TU_Darmstadt--SASAc.png" width=100%><b>Figure 2:</b> SASA of mutation variants (Y8O, F16O) and the wildtype (WT).</div></center> |
</p> | </p> | ||
<p> | <p> |
Revision as of 19:42, 17 October 2016
MODELING
ABSTRACT
Bonding of proteins is highly depending on structural properties which in turn are determined by the amino acid sequences. Changing the amino acid sequence of one participating partner could consequently diminish its binding ability. Therefore it is important to estimate the influence of mutations on the protein structure. This is particularly true for mutations from natural to non-natural amino acids.
To estimate the influence of O-methyl-l-tyrosine on Colicin E2's immunity protein we applied several molecular dynamics simulations leading to 1300 ns in total simulation time. To do this we estimated O-methyl-l-tyrosine parameters for the CHARMm 22 and the GROMOS36a7 force field. We evaluated our simulations by applying several well documented evaluation methods, like secondary structure analysis, plotting the solvent accesible surface area, and RMSD and RMSF. Our first simulation analysis led to the conclusion that O-methyl-l-tyrosine had no influence on the immunity protein.
To estimate possible influences on the thermodynamics of the system we calculated the binding energy between Colicin E2 and its immunity protein by pulling experiments with following umbrella sampling molecular dynamics simulations. The binding energy was afterwards calculated using the WHAM algorithm showing only minor differences.
THEORETICAL OVERVIEW
Molecular Dynamics Simulation
Introduction
Molecular Dynamics (MD) Simulations is a method to describe atomic and molecular movements. Molecular Dynamics simulations depend on several simplifications that enables the simulation to range from nanoseconds up to severeal milliseconds in systems containg of over one hundred thousand atoms. This enables the possibility to study different biomolecular processes like protein protein binding or enzyme dynamics. Because of the deterministic nature of the system it is possible to calculate thermodynamic properties like free energy or free binding enthalpies.
Assumptions
To describe atomistic or molecular behavior the exact system conditions like positions and energies. The energies of an atomic system are described by the Schrödinger equation (eq. \ref{Schrödinger}) with wavefunction \(\Psi\) (eq. \ref{wavefunction}), kinetic energies \(\hat{T}_{e}\) and \(\hat{T}_{N}\) and potential energies \(\hat{V}_{e}\), \(\hat{V}_{N}\) and \(\hat{V}_{eN}\). Terms with subscript \(_{e}\) are terms concerning the electrons and terms with subscript \(_{N}\) are terms concerning the nuclei.
Since there is no possibility to solve these equations numerically, it is necessary to simplify the system description. The first assumption depends on the Born-Oppenheimer approximation that the Schrödinger equation can be split into two parts, one for the electrons and one for the nuclei respectivly. Since the electrons are far more mobile the dynamic of the system can be defined by the nuclei positions.
Molecular Dynamics simulations depend on several simplifications. First, we assume in accord with the Born-Oppenheimer approximation that electronical movement has no influence on the overall atomic momentum since electrons will simply follow the nuclear movements in the simulated time scales. Second, we can describe the potential energy function by a sum of simple terms. These terms are described in the so-called force field which will be described later on. Third, the system potential is evaluated by deriving the forces and applying Newtonian mechanic calculations as shown in equation \ref{newton} and \ref{newton1}.
To solve these terms numerically we have to discretize the trajectory and therefore use an integrator for the small time steps. Several different integrators are developed today, of which the velocity-Verlet algorithm is the most used (eq. \ref{VV1} & \ref{VV2}).
The temperature of the system is directly correlated to the distribution of kinetic energies. Therefore the temperature of the system can be controlled by manipulating the atom velocities. A possible way to do this was proposed by Berendsen by coupling the system to a heat bath resulting in a NVT ensemble (eq. \ref{Berendsen}).
Empirical Force Fields
Empirical Force Fields are the backbone of every Molecular Dynamics simulation. Typically the force fields are divided into two parts, bonded and nonbonded interactions. Bonded interactions consist of chemical bond stretching, angle bending, and rotation of dihedrals and impropers. Nonbonded interactions are approximated by Coulomb interactions (ionic) and Lennard-Jones potentials. The overall CHARMm (Chemistry at HARvard Macromolekular mechanics) potential is calculated by summing up these main potentials ( \( V_{CHARMm} = V_{bonded} + V_{nonbonded} \) ).
In equation \ref{CHARMM_bonded} and \ref{CHARMM_nonbonded} the bonded and nonbonded potentials of the CHARMM force field are displayed. All terms consist of an equilibrium value marked with \(0\) and a force constant \(K\).
The additional terms CMAP and Urey-Bradley are correctional terms for backbone atoms and 1, 3 interactions respectively.
Simulation Analysis
Since the vast amount of data that is produced by Molecular Dynamics simulations it is essential to process the data into more accesible formats. To perform this task we applied several approaches like comparison of the solvent accesible surface area (SASA) over time.
Root Mean Square Deviation
The Root Mean Square Deviation (RMSD) describes the sum of distances of all selected atoms \(n\) between themselves in a selceted timestep \(\tau\) and a reference timestep \(r\) (eq. \ref{RMSD}). Plotted over time it is possible to detect fluctuations in the whole molecular configuration and therefore it is possible to conclude structures of high stability from plateaus in the RMSD.
By choosing the right atom selection it is possible to evaluate the different behaviours of protein subgroups. For example, if the RMSD between Cα atoms is calculated it is possible to plot the backbone movement over time and hence detect configurations that differ from the starting structure. This is important if one wants to search for different thermodynamically stable ensembles of the protein or molecule of interest.
Root Mean Square Fluctuation
Similar to the RMSD the Root Mean Square Fluctuation describes the sum of distances of all selected atoms. In this case the distance per atom between all selected configuartions is calculated and summed over time. Therefore it is possible to spot residues with strong mobility and consequntly residues that are part of fluctuating and disordered protein subunits.
DSSP
Define Secondary Structures of Proteins (DSSP) by Wolfgang Kabsch and Christian Sander is a standard program to analyse secondary structure properties of proteins. The main idea to discriminate between different secondary structures is based on the presence of H bonds because this can be represented by one energy value. This definition enables the algorithm to distinguish different types of α helices, β sheets, and turns.
The electrostatic interations between two groups are calculated by assigning partial charges to each C (\(+q_{1}\)), O (\(-q_{1}\)), N (\(-q_{2}\)), and H (\(+q_{2}\)), with \(q_{1} = 0.42~e\) and \( q_{2} = 0.20~e\) and r(AB) being the distance between two atoms A and B in Angström. An H bond is defined by \( E < -0.5~\frac{kcal}{mol}\).
$$ \begin{equation} E = q_{1} q_{2} \left( \frac{1}{r(ON)} + \frac{1}{r(CH)} + \frac{1}{r(OH)} + \frac{1}{r(CN)} \right) 332~\frac{kcal}{mol} \label{DSSP} \end{equation} $$
Binding Energy Calculations
Alchemical Free Energy Perturbation (FEP) is a method in computational biology to obtain energy differences from molecular dynamics or Metropolis Monte Carlo simulations between two system states. In here the system is slowly transformed from state \(i\) to state \(j\) through non-natural intermediates which are sampled by slowly decreasing the intermolecular interactions. The core equation (eq \ref{FEGG}) for the Helmholtz free energy difference between states \(i\) and \(j\) \(\Delta A_{ij}\) is derived from statistical mechanics. \(Q\) represents the canonical partition function, \(k_B\) the Boltzman constant, \(U\) the corresponding potential system energy in relation to the coordinates and momenta \(\vec{q}\), \(T\)the temperatur and \(\Gamma\) the volume of potential states of \(\vec{q}\).
$$ \begin{equation} \Delta A_{ij} = -k_{B} T \frac{Q_{j}}{Q_{i}} = -k_{B} T~ln \left( \frac{\int_{\Gamma_{j}}e^{-\frac{U_{j}(\vec{q})}{k_{B}T}}d \vec{q}}{\int_{\Gamma_{i}}e^{-\frac{U_{i}(\vec{q})}{k_{B}T}}d \vec{q}}\right) \label{FEGG} \end{equation} $$
To calculate free energy differences between states with low state space overlap a thermodynamic cycle can be constructed hence the Helmholtz free energy is a thermodynamic state function. The most straightforward way to calculate state transition free energy changes would be to simulate the naturally occuring process. For example the binding free energy between two proteins can be calculated by separating both proteins. This approach however has high computational costs since simulating the whole water filled box results in large systems.
The alchemical perturbation approach relies on simulating several intermediate states over which Coulomb and Lennard-Jones interactions are slowly decreased. This is controlled via a coupling factor \(\lambda\) (eq \ref{lambda}). The resulting potential energy \(U\) at state \(\lambda\) is subsequently calculated as a sum of the two end states \(U_0\), \(U_1\) and all not decoupled interacions \(U_{unaffected}\).
$$ \begin{equation} U(\lambda,\vec{q}) = (1-\lambda)~U_{0}(\vec{q}) + \lambda~U_{1}(\vec{q}) + U_{unaffected}(\vec{q}) \label{lambda} \end{equation} $$
The overall free energy change \(\Delta A_{0,1}\) is consequently calculated as the sum over all free energy changes (eq. \ref{TC}).$$ \begin{equation} \Delta A_{0,1} = \sum^{1}_{\lambda=0} {\Delta A_{\lambda,\Delta \lambda}} \label{TC} \end{equation} $$
Gibb's free energy differences between two λ states can be calculated by equation \ref{Calc}.
$$ \begin{equation} \Delta G~(\lambda^{'} \rightarrow \lambda^{"}) = -k_{B}T~\Bigg \langle exp \left( -\frac{U(\lambda^{"})-U(\lambda^{'})}{k_{B}T} \right) \Bigg \rangle \label{Calc} \end{equation} $$
Interactions of molecules are represented by Lennard-Jones and Coulomb interactions. This can lead to problems when Lennard-Jones interactions are decoupled and charge interactions are still active. This ensemble will lead to a clashing of the molecules since the charges will attract each other without any form of antagonistic force. To avoid this problem, position restraints are added which have to be considered in the final calculation.
In order to evaluate the simulations of the intermediate states and calculate the free energy difference between the end states several algorithms have been developed (e.g., Exponential Averaging (EXP) or Bennett Acceptance Ratio (BAR)). BAR computes the energy difference between two simulations generating the trajectories \( n_i \) and \( n_j \) with the corresponding potential energy functions \(U_i\) and \(U_j\). The free energy difference \(\Delta A_{ij}\) can then be written as in equation \ref{Bennett} where \(f()\) stands for the Fermi function (eq. \ref{Fermi}).
$$ \begin{equation} \Delta A_{ij} = k_B T ~ \left( ln \frac{\sum_j f(U_i - U_j + k_B T~ln \frac{Q_i n_j}{Q_j n_i})}{\sum_i f(U_j - U_i - k_B T~ln \frac{Q_i n_j}{Q_j n_i})} - ln \frac{n_j}{n_i} \right)+ k_B T~ln \frac{Q_i n_j}{Q_j n_i} \label{Bennett} \end{equation} $$
$$ \begin{equation} f(x) = \frac{1}{1 + e^{~\beta x}} \label{Fermi} \end{equation} $$
The Term \(k_B T~ln \frac{Q_i n_j}{Q_j n_i}\) has to be approximated since it cannot be computed analytically. Equation \ref{Bennett2} describes the relation used by Bennett et al. to estimate the term. Once it has been determined the free energy difference can be calculated by equation \ref{Bennett3}.
$$ \begin{equation} \sum_j f\left(U_i - U_j + k_B T~ln \frac{Q_i n_j}{Q_j n_i}\right) = \sum_i f\left(U_j - U_i - k_B T~ln \frac{Q_i n_j}{Q_j n_i}\right) \label{Bennett2} \end{equation} $$
$$ \begin{equation} \Delta A_{ij} = - k_B T ~ ln \frac{n_j}{n_i} + k_B T~ln \frac{Q_i n_j}{Q_j n_i} \label{Bennett3} \end{equation} $$
Later on Shirts et al. derived the BAR method by applying maximum likelihood techniques.
Since common analysis methods in biochemistry often rely on titration, only dissociation (\(K_{d}\)) respectively association constants (\(K_{a}\)) can be concluded from experiments. Therefore the dissociation constants were calculated by using equation \ref{gibbs}.
$$ \begin{equation} K_{a/d} = e^{-\frac{\Delta G}{RT}} \label{gibbs} \end{equation} $$
METHODS
Visualisations
Colicin E2
No obtainable 3D model of Colicin E2 was found in the Research Collaboration for Structural Bioinformatics (RCSB) Protein Data Bank (PDB) at the Brookhaven National Laboratory and the Protein Data Bank in Europe (PDBe) at the European Bioinformatics Institue (EMBL-EBI). Crystallographic structures of the DNase subunit of Colicin E2 and its bacterial import subunit was available. Therefore we chose to use homology modelling to obtain a 3D structure of Colicin E2. For homology modelling the Protein Homology/analogY Recognition Engine V 2.0 (PHYRE2) [] server was used in combination with the amino acid sequence obtained from the Universal Protein Resource (UniProt) Protein Knowledgebase (UniProtKB) entrance P04419 [].
The obtained model was based on the known subunits of Colicin E2 and Colicin E3, a close relative. A CHARMm topology was then produced via the pdb2gmx module of GROMACs 5.1.3, the model was solvated in TIP3P water and energy minimized using the steepest descend algorithm to !FEHLTNOCH! kJ/mol.
Force Field Parametrization
A 3D model of O-methyl-l-tyrosine was created using Avogadro 1.1.1, and energy minimized using the steepest descend algorithm in vacuum. The so obtained model was subsequently parsed to the SwissParam [] topology server. The created topology was then used to derive parameters for the CHARMm residue force field so that for any protein with O-methyl-l-tyrosine a topology could be created. Since O-methyl-l-tyrosine is very similar to tyrosine most parameters could be adopted.
CHARMm is an atom type based force field, meaning that every parameter is not dependend on the residue but on the assigned atom types taking part in the interactions. As a result we had to define several new atom types to account for the changed properties of the methyl ether since no other benzyl ether atoms could be found in the force field parameter files.
To implement the newly derived parameters we had to update several force field parameter files that we gathered from the GROMACS 5.0.4 software suite. This was due to the case that all parameter files had to be compatible to the GROMACS software suite. First we had to implement the new amino acid in the aminoacids.rtp file which basically serves as a register for all known residues. The amino acid entry contains a three letter code by which it will be identified, all atoms with name, type and charge, all bonds between these atoms as well as impropers and CMAPs. Since the last two entries only concern the backbone atoms, we simply copied those entries. Atom charges were duplicated from tyrosine for all atoms that were not involved in the ether bond. The ether methyl group was represented by a standard methyl group with CT3 carbon and HA hydrogen. This atom classes represent alanin's Cβ and the Cδ atom of methionin.
A new atom class named OE was introduced to represent the ether oxygen since there are no ether oxygen parameters for amino acid residues in the CHARMm force field files. We used parameters from the generated topology file to derive parameters for all interactions from the OE atom type. To implement these parameters we updated the bondtypes, angletypes, and dihedrals in the ffbonded.itp file. Similarly, we updated the ffnonbonded.itp file sections atomtypes, and pairtypes. Subsequently we added an OE entry to the atomtypes.atp file.
CHARMm simulates aromaticity through dummy atoms which will be calculated automatically. To achieve this for our new amino acid we updated the aminoacids.vsd file and inserted every bond length and angle. Finally we had to assign a three letter code to represent our amino acid in the topology and structure files. We chose OMT. This resulted in the empirical force field CHARMm 27 OMT.
Molecular Dynamics Simulations
3D structures of the Colicin E2 DNase subunit (SAse) and its immunity protein were obtained from the RCSB PDB entrance 3U43 [] . Since the main binding occurs between the DNase subunit and the immunity protein only this Colicin E2 subunit was simulated. Furthermore we established a minimized Colicin E2 in another project part and wanted to test its operational capability.
To insert mutations inside the immunity protein we inserted the energy minimized structure of OMT at the desired position. Positions of backbone atoms were fitted to those of the replaced amino acid so that the backbone integrity was preserved. This was achieved through the implementation of Kabsch's algorithm for structural alignments in the Bio3D package [] for the statistical computing language R []. Additionaly we used Bio3D's pdb processing package to separate Colicin E2 DNase subunit and its immunity protein.
All molecular dynamics simulations were performed with the GROningen MAchine for Chemical Simulations (GROMACS) 5.0.3 [] software suite. As empirical force field CHARMm 27 OMT was used. An explicit water model with TIP3P water was chosen and a cubic box with edge length !FEHLTNOCH! nm was constructed. The box was subsequently filled with water and the system was neutralized through insertions of chloride ions. After neutralization the system was energy minimized with the steepest descent algorithm until it converted. The exact values can be found in table 1. To generate velocity and temperate the system a small equilibration run of about 500 ps was performed. The end temperature was set to 298 K and the Berendsen thermostat and the velocity-Verlet integrator with a stepsize of 2 fs were used. After this equilibration run a NVT ensemble was achieved. To achieve a NPT ensemble the equilibration run was repeated with applied pressure coupling to 1 bar. Subsequently the final MD production run was performed. Every 5000st step was saved resulting in a trajectory of 10001 conformations ranging over 100 ns simulation time.
Molecular Dynamics Simulations Analysis
Simulation Analysis was performed using the R [] package Bio3D []. All plots were created using the R package ggplot2 []. For visualization of protein structures and trajectories the PyMOL visualization system [] was used. To compensate for eventual jumps due to translations across the PBC barriers a GROMACS internal fitting program was applied (gmx trjconv -center -pbc nojump). To exclude translational and rotational movement of the simulated protein we applied the GROMACS internal fitting algorithms gmx trjconv -fit rot+trans. These trajectory manipulations were performed because several analysis methods like RMSD and RMSF rely on distance calculations between atom positions over time. In these analyses the translational and rotational movements are not of interest since we only want to visualize the movement of atoms in relation to the simulated protein to test for different configuarations and structural flexibility. Additionally the crossing of PBC barriers would increase the distance between two atom positions drastically since the atom would be relocated to the opposing site of the simulated system.
Binding Energy Calculations
Binding energies were calculated by using alchemical Free Energy Perturbation (FEP) in combination with Bennett Acceptence Ratio (BAR). A thermodynamical cycle was constructed (see fig. 2) and accordingly two simulation sets were performed. First, Colicin E2's immunity protein was simulated in TIP3P water with 0.6 nm space between the box and the protein. Furthermore, Lennard-Jones potential and Coulomb interactions were decoupled over ten simulations, each resulting in 20 simulations in total. The simulations were energy minimized over approximately 300 steps, NVT equilibrated over 10 ps and NPT equilibrated over 100 ps. The production run was performed for 2 ns. Second, the binding complex consisting of Colicin E2 and its immunity protein was simulated under similar conditions i.e. equilibration and simulation times, and steps were chosen alike. In contrast to the immunity protein simulations additional restraint decoupling steps were performed over ten simulations before the other steps. All simulations were evaluated using the BAR scripts from the pyMBAR python library.
RESULTS & CONCLUSION
Molecular Dynamcis Simulations
Three amino acid positions were chosen for O-methyl-l-tyrosine (OMT) exchange evaluation (tyrosine 8 (Y8), phenylalanine 13 (F13) and phenylalanine 16 (F16)). These positions were selected because of their small deviation in regard to OMT and were therefore expected to cause the smallest structural difference. All molecular dynamics steps were performed on these mutation variants as well as on the wildtype protein for comparison. All simulations were performed over 100 ns, leading to 10001 conformations each. These conformations were evaluated using RMSD, RMSF, SASA and the amount of secondary structures over time.
Figure 1 displayes the RMSD in regard to the initial conformation. It can be observed that the mutational variant Y8O exhibits similar curve characteristics as the wildtype variant. The mutational variant F16O on the contrary shows a more severe deviation towards the wildtype.
The solvent accessible surface area (SASA) was calculated for every simualtion step using DSSP and is displayed in figure 2. The little to no fluctuation in the SASA of all simulated variants is an argument for the high structual stability of the wildtype and the mutational variants. The disparity between the mutational variants and the wildtype can be traced back to DSSP algorithm since it is based solely on natural amino acids. Therefore it can not evaluate a non natural amino acid and would cause a discrepancy between surface areas if non-natural amino acids like OMT are involved.
Additionally we evaluated the RMSF and the amount of secondary structures over time (data not shown) which exhibited no differences between the mutational variants and the wildtype. Therefore the results of these evaluation methods are not shown on this wiki.
Closing up we can conclude that the mutational variant Y8O does fit our demands best since its behavior exhibits the least disparity in our applied evaluation methods towards the wildtype. Therefore Y8O was chosen for further analyses.
Furthermore we evaluated the stability of our designed MiniColicin by performing 100 ns of MD simulations with different force fields (CHARMm27, AMBER03, GROMOS56a7). The RMSD of these simulations is displayed in figure 3. It can be observed that some deviations between the different force fields occur. The simulation run with the GROMOS56a7 shows the biggest discrepancy towards the AMBER and CHARMm simulations which display little to no fluctuations over time. This behavior can be traced to the different parametrization approach in the force fields as well as to the fact that the GROMOS56a7 force field is a united atom force field in which all CH3 and CH2 groups are describbed as one group.
References
- [1]
- [2]
- [3]