MODELING

ABSTRACT
Protein binding depends strongly on structural properties, which in turn are determined by the amino acid sequence. Changing the amino acid sequence of one binding partner could consequently diminish its binding ability. It is therefore important to estimate the influence of mutations on the protein structure. This is particularly true for mutations from natural to non-natural amino acids.
To estimate the influence of O-methyl-l-tyrosine on the immunity protein of Colicin E2, we ran several molecular dynamics simulations with a total simulation time of 1300 ns. For this purpose we estimated O-methyl-l-tyrosine parameters for the CHARMm 22 and the GROMOS36a7 force fields. We evaluated our simulations with several well-documented methods, such as secondary structure analysis, the solvent accessible surface area, RMSD, and RMSF. Our first simulation analysis led to the conclusion that O-methyl-l-tyrosine had no influence on the immunity protein.
To estimate possible influences on the thermodynamics of the system, we calculated the binding energy between Colicin E2 and its immunity protein from pulling simulations followed by umbrella sampling molecular dynamics simulations. The binding energy was then calculated with the WHAM algorithm and showed only minor differences.

THEORETICAL OVERVIEW

Molecular Dynamics Simulation

Introduction

Molecular Dynamics (MD) simulation is a method to describe atomic and molecular movements. MD simulations depend on several simplifications that enable simulations ranging from nanoseconds up to several milliseconds in systems containing more than one hundred thousand atoms. This makes it possible to study biomolecular processes such as protein-protein binding or enzyme dynamics. Because of the deterministic nature of the system, it is possible to calculate thermodynamic properties such as free energies or free enthalpies of binding.

Assumptions

To describe atomistic or molecular behavior, the exact state of the system, such as positions and energies, has to be known. The energies of an atomic system are described by the Schrödinger equation (eq. \ref{Schrödinger}) with wavefunction $\Psi$ (eq. \ref{wavefunction}), kinetic energies $\hat{T}_{e}$ and $\hat{T}_{N}$ and potential energies $\hat{V}_{e}$, $\hat{V}_{N}$ and $\hat{V}_{eN}$. Terms with subscript $_{e}$ concern the electrons and terms with subscript $_{N}$ concern the nuclei.

$$ \begin{equation} (\hat{T}_{e} + \hat{T}_{N} + \hat{V}_{e} + \hat{V}_{N} + \hat{V}_{eN})\Psi=i \hbar \frac{\partial}{\partial t}\Psi \label{Schrödinger} \end{equation}$$

$$ \begin{equation} \Psi=\Psi(\vec{r}_{1},..., \vec{r}_{N_{e}},\vec{R}_{1},...,\vec{R}_{N_{N}},t) \label{wavefunction} \end{equation}$$

Here $\vec{r}_{1},...,\vec{r}_{N_{e}}$ are the electron coordinates and $\vec{R}_{1},...,\vec{R}_{N_{N}}$ the coordinates of the nuclei.

Since these equations cannot be solved exactly for systems of this size, the system description has to be simplified. The first simplification is the Born-Oppenheimer approximation, which states that the Schrödinger equation can be split into two parts, one for the electrons and one for the nuclei. Since the electrons are far more mobile than the nuclei, the dynamics of the system can be described by the positions of the nuclei.
Molecular Dynamics simulations therefore rest on three simplifications. First, in accordance with the Born-Oppenheimer approximation, we assume that electronic movement has no influence on the overall atomic momentum, because the electrons simply follow the nuclear movements on the simulated time scales. Second, the potential energy function is described by a sum of simple terms. These terms are defined in the so-called force field, which is described in the section Empirical Force Fields. Third, the system is propagated by deriving the forces from the potential and applying Newtonian mechanics, as shown in equations \ref{newton} and \ref{newton1}.

$$ \begin{equation} M_{K}\frac{d\vec{v}_{K}}{dt} = M_{K}\frac{d^{2}\vec{r}_{K}}{dt^{2}} = \vec{F}_{K} = -\frac{\partial V\left(\vec{r}_{1},...,\vec{r}_{N}\right)}{\partial \vec{r}_{K}} \label{newton} \end{equation} $$

$$ \begin{equation} F_{ij}=-\frac{\partial}{\partial r_{ij}}V_{force~field} \label{newton1} \end{equation} $$

To solve these equations numerically, we have to discretize the trajectory and use an integrator that advances the system in small time steps. Several integrators have been developed, of which the velocity-Verlet algorithm is among the most widely used (eq. \ref{VV1} & \ref{VV2}).

$$ \begin{equation} r_{i}(t_{0} + \Delta t) = r_{i}(t_{0}) + v_{i}(t_{0})\Delta t + \frac{1}{2}a_{i}(t_{0})\Delta t^{2} \label{VV1} \end{equation} $$

$$ \begin{equation} v_{i}(t_{0}+\Delta t) = v_{i}(t_{0}) + \frac{1}{2}[a_{i}(t_{0}) + a_{i}(t_{0} + \Delta t)]\Delta t \label{VV2} \end{equation} $$
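The two update rules can be illustrated with a minimal Python sketch. The one-dimensional harmonic oscillator, the unit mass and force constant, and the function name `velocity_verlet` are assumptions chosen only for illustration; they are not part of our simulation setup, which relied on the integrator built into GROMACS.

```python
import numpy as np

def velocity_verlet(r0, v0, accel, dt, n_steps):
    """Integrate a 1D trajectory with the velocity-Verlet scheme.

    accel is a function returning the acceleration a(r) = F(r)/m.
    Returns an array of shape (n_steps + 1, 2) holding (position, velocity).
    """
    r, v = float(r0), float(v0)
    a = accel(r)
    traj = [(r, v)]
    for _ in range(n_steps):
        r = r + v * dt + 0.5 * a * dt**2   # position update from r, v and a at t0
        a_new = accel(r)                   # acceleration at the new position
        v = v + 0.5 * (a + a_new) * dt     # velocity update uses old and new acceleration
        a = a_new
        traj.append((r, v))
    return np.array(traj)

# Hypothetical test system: harmonic oscillator with k = m = 1
k, m = 1.0, 1.0
traj = velocity_verlet(r0=1.0, v0=0.0, accel=lambda r: -k * r / m, dt=0.01, n_steps=1000)
energy = 0.5 * m * traj[:, 1]**2 + 0.5 * k * traj[:, 0]**2
print("largest energy drift:", np.max(np.abs(energy - energy[0])))
```

The total energy should stay close to its initial value, which is the main reason symplectic integrators of this kind are preferred for long MD runs.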

The temperature of the system is directly related to the distribution of kinetic energies. Therefore the temperature can be controlled by manipulating the atom velocities. One way to do this was proposed by Berendsen: the system is coupled to a heat bath, which results in an NVT ensemble (eq. \ref{Berendsen}). Here $T_{B}$ is the bath temperature, $T_{t}$ the instantaneous temperature of the system and $\tau_{T}$ the coupling time constant.

$$ \begin{equation} a_{i}=\frac{F_{i}}{m_{i}} + \frac{1}{2 \tau_{T}} \left( \frac{T_{B}}{T_{t}} -1 \right) v_{i} \label{Berendsen} \end{equation} $$
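As a rough illustration of equation \ref{Berendsen}, the following Python sketch evaluates the thermostatted acceleration for a small set of atoms. The function names, the GROMACS-style units (u, nm, ps, kJ/mol) and the assumption of $3N$ unconstrained degrees of freedom are choices made for this example only.

```python
import numpy as np

K_B = 0.0083144626  # Boltzmann constant in kJ/(mol K); assumes masses in u and velocities in nm/ps

def instantaneous_temperature(masses, velocities):
    """Temperature from the kinetic energy, assuming 3N unconstrained degrees of freedom."""
    kinetic = 0.5 * np.sum(masses[:, None] * velocities**2)
    n_dof = 3 * len(masses)
    return 2.0 * kinetic / (n_dof * K_B)

def berendsen_acceleration(forces, masses, velocities, t_bath, tau_t):
    """a_i = F_i/m_i + 1/(2 tau_T) * (T_B/T_t - 1) * v_i"""
    t_now = instantaneous_temperature(masses, velocities)
    coupling = (t_bath / t_now - 1.0) / (2.0 * tau_t)
    return forces / masses[:, None] + coupling * velocities
```

If the system is colder than the bath, the additional term points along the velocity and speeds the atoms up; if it is hotter, the term acts like a friction. At $T_{t} = T_{B}$ the term vanishes and plain Newtonian dynamics is recovered.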

Empirical Force Fields

Empirical force fields are the backbone of every Molecular Dynamics simulation. Typically a force field is divided into two parts, bonded and nonbonded interactions. Bonded interactions consist of chemical bond stretching, angle bending, and rotation of dihedrals and impropers. Nonbonded interactions are approximated by Coulomb interactions (ionic) and Lennard-Jones potentials. The overall CHARMm (Chemistry at HARvard Macromolecular mechanics) potential is calculated by summing up these two main potentials ( $ V_{CHARMm} = V_{bonded} + V_{nonbonded} $ ).
Equations \ref{CHARMM_bonded} and \ref{CHARMM_nonbonded} show the bonded and nonbonded potentials of the CHARMm force field. The harmonic terms consist of an equilibrium value marked with the subscript $0$ and a force constant $K$.

$$ \begin{equation} V_{bonded} = \sum_{bonds}{K_{b}(b-b_{0})^{2}} + \sum_{angles}{K_{\theta}(\theta-\theta_{0})^{2}} + \sum_{torsions}{K_{\phi}(1+\cos(n\phi-\delta))} + \\ \sum_{impropers}{K_{\psi}(\psi-\psi_{0})^{2}} + \sum_{Urey-Bradley}{K_{UB}(r_{1,3}-r_{1,3,0})^{2}} + \sum_{\phi\psi}{V_{CMAP}} \label{CHARMM_bonded} \end{equation} $$

$$ \begin{equation} V_{nonbonded}=\sum_{nonbonded}{\frac{q_{i}q_{j}}{4\pi D r_{ij}}}+ \sum_{nonbonded}{\epsilon_{ij}\left[\left(\frac{R_{min,ij}}{r_{ij}}\right)^{12}-2\left(\frac{R_{min,ij}}{r_{ij}}\right)^{6}\right] } \label{CHARMM_nonbonded} \end{equation} $$

The additional terms CMAP and Urey-Bradley are correction terms for the backbone dihedrals and for 1,3 interactions, respectively.
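To make the individual terms more tangible, the following Python sketch evaluates a harmonic term, the Lennard-Jones term in the $R_{min}$ form used above, and the Coulomb term for a single pair. The function names and the GROMACS unit system (nm, kJ/mol, elementary charges) are assumptions for this example; the snippet is not taken from any force field implementation.

```python
import numpy as np

# Electric conversion factor 1/(4 pi eps0) in kJ mol^-1 nm e^-2 (GROMACS unit system)
COULOMB_CONST = 138.935458

def harmonic(x, k, x0):
    """Harmonic term K (x - x0)^2 as used for bonds, angles, impropers and Urey-Bradley."""
    return k * (x - x0)**2

def lennard_jones(r, eps, r_min):
    """Lennard-Jones term in the R_min form: eps * [(R_min/r)^12 - 2 (R_min/r)^6]."""
    ratio6 = (r_min / r)**6
    return eps * (ratio6**2 - 2.0 * ratio6)

def coulomb(r, q_i, q_j, dielectric=1.0):
    """Coulomb term q_i q_j / (4 pi D r) with D expressed as eps0 times a relative dielectric."""
    return COULOMB_CONST * q_i * q_j / (dielectric * r)

# Hypothetical pair: the minimum of the Lennard-Jones term lies at r = R_min with depth -eps
print(lennard_jones(r=0.35, eps=0.5, r_min=0.35))
```

In this form $R_{min,ij}$ is the distance of the energy minimum and $\epsilon_{ij}$ its depth, which differs from the $\sigma$ convention used by some other force fields.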

Simulation Analysis

Because of the vast amount of data produced by Molecular Dynamics simulations, it is essential to process the data into more accessible formats. To perform this task we applied several approaches, for example the comparison of the solvent accessible surface area (SASA) over time.

Root Mean Square Deviation

The Root Mean Square Deviation (RMSD) is the square root of the mean squared distance of all $N$ selected atoms between their positions in a selected time step $\tau$ and a reference time step $r$ (eq. \ref{RMSD}). Plotted over time, it reveals fluctuations in the overall molecular configuration, and plateaus in the RMSD indicate structures of high stability.

$$ \begin{equation} RMSD_{\tau}=\sqrt{\frac{1}{N}\sum_{n=1}^{N}{\left((x_{\tau,n}-x_{r,n})^{2}+(y_{\tau,n}-y_{r,n})^{2}+(z_{\tau,n}-z_{r,n})^{2}\right)}} \label{RMSD} \end{equation} $$
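A minimal NumPy sketch of this calculation is given here for illustration. It uses the usual definition with the mean over the $N$ selected atoms and assumes that the frames have already been superimposed on the reference; the function name and array layout are assumptions, and in our analysis the Bio3D package was used instead.

```python
import numpy as np

def rmsd(frame, reference):
    """RMSD between two coordinate sets of shape (N, 3) for the same N atoms."""
    diff = np.asarray(frame, dtype=float) - np.asarray(reference, dtype=float)
    return np.sqrt(np.mean(np.sum(diff**2, axis=1)))

def rmsd_over_time(trajectory, reference):
    """RMSD of every frame of a trajectory with shape (T, N, 3) against one reference."""
    return np.array([rmsd(frame, reference) for frame in trajectory])
```

Calling `rmsd_over_time` with the starting structure as reference yields the curve that is plotted against simulation time.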

By choosing a suitable atom selection, it is possible to evaluate the behaviour of different protein subgroups. For example, if the RMSD between Cα atoms is calculated, the backbone movement can be plotted over time and configurations that differ from the starting structure can be detected. This is important when searching for different thermodynamically stable ensembles of the protein or molecule of interest.

Root Mean Square Fluctuation

Similar to the RMSD, the Root Mean Square Fluctuation (RMSF) is based on squared distances of the selected atoms. In this case the distance of each atom $n$ from its reference position (index $0$, usually the time-averaged position) is calculated for every selected configuration $\tau$ and averaged over time (eq. \ref{RMSF}). This makes it possible to spot residues with strong mobility and consequently residues that are part of fluctuating and disordered protein regions.

$$ \begin{equation} RMSF_{n}=\sqrt{\frac{1}{T}\sum_{\tau=1}^{T}{\left((x_{\tau,n}-x_{0,n})^{2}+(y_{\tau,n}-y_{0,n})^{2}+(z_{\tau,n}-z_{0,n})^{2}\right)}} \label{RMSF} \end{equation} $$
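The per-atom fluctuation can be sketched in a few lines of NumPy. This example assumes that the time-averaged position serves as the reference and that the trajectory has already been fitted to remove overall rotation and translation; the function name and array layout are illustrative assumptions.

```python
import numpy as np

def rmsf(trajectory):
    """Per-atom RMSF for a trajectory of shape (T, N, 3), using the mean position as reference."""
    traj = np.asarray(trajectory, dtype=float)
    mean_pos = traj.mean(axis=0)                      # time-averaged position of every atom
    sq_dist = np.sum((traj - mean_pos)**2, axis=2)    # squared distance per frame and atom
    return np.sqrt(sq_dist.mean(axis=0))              # average over time, one value per atom
```

The result has one entry per atom, so that plotting it against the residue number highlights flexible regions.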

DSSP

Define Secondary Structure of Proteins (DSSP) by Wolfgang Kabsch and Christian Sander is a standard program to analyse the secondary structure of proteins. The main idea for discriminating between different secondary structures is to rely on the presence of H bonds, because each H bond can be represented by a single energy value. This definition enables the algorithm to distinguish different types of α helices, β sheets, and turns.
The electrostatic interaction between two groups is calculated by assigning partial charges to each C ($+q_{1}$), O ($-q_{1}$), N ($-q_{2}$), and H ($+q_{2}$), with $q_{1} = 0.42~e$ and $ q_{2} = 0.20~e$, and with r(AB) being the distance between two atoms A and B in Ångström. An H bond is defined by $ E < -0.5~\frac{kcal}{mol}$.

$$ \begin{equation} E = q_{1} q_{2} \left( \frac{1}{r(ON)} + \frac{1}{r(CH)} - \frac{1}{r(OH)} - \frac{1}{r(CN)} \right) 332~\frac{kcal}{mol} \label{DSSP} \end{equation} $$
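The energy criterion is easy to evaluate directly. The sketch below uses the Kabsch-Sander signs, in which the O-H and C-N terms enter with a negative sign because those pairs carry opposite partial charges. The function names and the example distances are hypothetical and only serve to illustrate the formula.

```python
def dssp_hbond_energy(r_on, r_ch, r_oh, r_cn, q1=0.42, q2=0.20, factor=332.0):
    """Electrostatic H-bond energy in kcal/mol; all distances in Angstrom."""
    return q1 * q2 * (1.0 / r_on + 1.0 / r_ch - 1.0 / r_oh - 1.0 / r_cn) * factor

def is_hbond(r_on, r_ch, r_oh, r_cn, cutoff=-0.5):
    """An H bond is assigned when the energy falls below the cutoff of -0.5 kcal/mol."""
    return dssp_hbond_energy(r_on, r_ch, r_oh, r_cn) < cutoff

# Hypothetical geometries: a close C=O...H-N contact and a distant pair
print(is_hbond(r_on=2.9, r_ch=3.1, r_oh=1.9, r_cn=4.1))
print(is_hbond(r_on=6.0, r_ch=7.0, r_oh=5.5, r_cn=7.5))
```

Because only one energy value per backbone pair is needed, the secondary structure can be assigned for every frame of a trajectory at low cost.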

Binding Energy Calculations

Alchemical Free Energy Perturbation (FEP) is a method in computational biology to obtain energy differences between two system states from molecular dynamics or Metropolis Monte Carlo simulations. Here the system is slowly transformed from state $i$ to state $j$ through non-natural intermediates, which are sampled by slowly decreasing the intermolecular interactions. The core equation (eq. \ref{FEGG}) for the Helmholtz free energy difference $\Delta A_{ij}$ between states $i$ and $j$ is derived from statistical mechanics. $Q$ represents the canonical partition function, $k_B$ the Boltzmann constant, $U$ the potential energy of the system as a function of the coordinates and momenta $\vec{q}$, $T$ the temperature and $\Gamma$ the volume of accessible states of $\vec{q}$.

$$ \begin{equation} \Delta A_{ij} = -k_{B} T~ln \frac{Q_{j}}{Q_{i}} = -k_{B} T~ln \left( \frac{\int_{\Gamma_{j}}e^{-\frac{U_{j}(\vec{q})}{k_{B}T}}d \vec{q}}{\int_{\Gamma_{i}}e^{-\frac{U_{i}(\vec{q})}{k_{B}T}}d \vec{q}}\right) \label{FEGG} \end{equation} $$

To calculate free energy differences between states with low state space overlap, a thermodynamic cycle can be constructed, because the Helmholtz free energy is a thermodynamic state function. The most straightforward way to calculate the free energy change of a state transition would be to simulate the naturally occurring process. For example, the binding free energy between two proteins can be calculated by separating both proteins. This approach, however, has high computational costs, since the whole water-filled box has to be simulated, which results in large systems.
The alchemical perturbation approach relies on simulating several intermediate states over which Coulomb and Lennard-Jones interactions are slowly decreased. This is controlled via a coupling factor $\lambda$ (eq. \ref{lambda}). The resulting potential energy $U$ at state $\lambda$ is calculated as the sum of the two weighted end states $U_0$ and $U_1$ and all interactions that are not decoupled, $U_{unaffected}$.

$$ \begin{equation} U(\lambda,\vec{q}) = (1-\lambda)~U_{0}(\vec{q}) + \lambda~U_{1}(\vec{q}) + U_{unaffected}(\vec{q}) \label{lambda} \end{equation} $$

The overall free energy change $\Delta A_{0,1}$ is consequently calculated as the sum over the free energy changes between neighboring $\lambda$ states (eq. \ref{TC}).

$$ \begin{equation} \Delta A_{0,1} = \sum^{1}_{\lambda=0} {\Delta A_{\lambda,\Delta \lambda}} \label{TC} \end{equation} $$

Gibbs free energy differences between two $\lambda$ states can be calculated by equation \ref{Calc}, where the average is taken over configurations sampled in state $\lambda^{'}$.

$$ \begin{equation} \Delta G~(\lambda^{'} \rightarrow \lambda^{"}) = -k_{B}T~ln \Bigg \langle exp \left( -\frac{U(\lambda^{"})-U(\lambda^{'})}{k_{B}T} \right) \Bigg \rangle_{\lambda^{'}} \label{Calc} \end{equation} $$
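The exponential average between two neighboring $\lambda$ states can be sketched as follows. The example assumes that the energy differences $U(\lambda^{"})-U(\lambda^{'})$ have already been evaluated on frames sampled at $\lambda^{'}$; the function name is an assumption, and the logarithm of the ensemble average is written in a numerically stable form.

```python
import numpy as np

def exp_free_energy(delta_u, kT):
    """Exponential averaging: -kT * ln < exp(-delta_u / kT) > over the sampled frames."""
    x = -np.asarray(delta_u, dtype=float) / kT
    x_max = x.max()
    # log-mean-exp with the maximum factored out to avoid overflow
    return -kT * (x_max + np.log(np.mean(np.exp(x - x_max))))

def total_free_energy(delta_u_per_window, kT):
    """Sum of the free energy changes over all neighboring lambda windows."""
    return sum(exp_free_energy(du, kT) for du in delta_u_per_window)
```

Exponential averaging is dominated by rare low-energy frames, which is one reason why estimators that use samples from both states, such as BAR, are often preferred.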

Interactions between molecules are represented by Lennard-Jones and Coulomb interactions. This can lead to problems when the Lennard-Jones interactions are decoupled while the charge interactions are still active. In this situation the molecules will clash, since the charges attract each other without any opposing repulsive force. To avoid this problem, position restraints are added, which have to be accounted for in the final calculation.
In order to evaluate the simulations of the intermediate states and to calculate the free energy difference between the end states, several algorithms have been developed, e.g., Exponential Averaging (EXP) or the Bennett Acceptance Ratio (BAR). BAR computes the energy difference between two simulations that generate trajectories with $ n_i $ and $ n_j $ samples and have the corresponding potential energy functions $U_i$ and $U_j$. The free energy difference $\Delta A_{ij}$ can then be written as in equation \ref{Bennett}, where $f()$ stands for the Fermi function (eq. \ref{Fermi}) and $\beta = 1/(k_B T)$.

$$ \begin{equation} \Delta A_{ij} = k_B T ~ \left( ln \frac{\sum_j f(U_i - U_j + k_B T~ln \frac{Q_i n_j}{Q_j n_i})}{\sum_i f(U_j - U_i - k_B T~ln \frac{Q_i n_j}{Q_j n_i})} - ln \frac{n_j}{n_i} \right)+ k_B T~ln \frac{Q_i n_j}{Q_j n_i} \label{Bennett} \end{equation} $$

$$ \begin{equation} f(x) = \frac{1}{1 + e^{~\beta x}} \label{Fermi} \end{equation} $$

The term $k_B T~ln \frac{Q_i n_j}{Q_j n_i}$ has to be determined numerically, since it cannot be computed analytically. Equation \ref{Bennett2} describes the relation used by Bennett to estimate the term. Once it has been determined, the free energy difference can be calculated by equation \ref{Bennett3}.

$$ \begin{equation} \sum_j f\left(U_i - U_j + k_B T~ln \frac{Q_i n_j}{Q_j n_i}\right) = \sum_i f\left(U_j - U_i - k_B T~ln \frac{Q_i n_j}{Q_j n_i}\right) \label{Bennett2} \end{equation} $$

$$ \begin{equation} \Delta A_{ij} = - k_B T ~ ln \frac{n_j}{n_i} + k_B T~ln \frac{Q_i n_j}{Q_j n_i} \label{Bennett3} \end{equation} $$
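The self-consistent condition of equation \ref{Bennett2} can be solved with a simple bisection, because the difference of the two sums decreases monotonically with the unknown term. The following sketch is an illustration under stated assumptions and not the pymbar implementation: the function names are hypothetical, `w_forward` holds $U_j - U_i$ evaluated on the $n_i$ samples of state $i$, and `w_reverse` holds $U_i - U_j$ evaluated on the $n_j$ samples of state $j$.

```python
import numpy as np

def fermi(x, beta):
    """Fermi function 1 / (1 + exp(beta * x)), written via tanh for numerical stability."""
    return 0.5 * (1.0 - np.tanh(0.5 * beta * x))

def bar_free_energy(w_forward, w_reverse, kT, tol=1e-10, max_iter=200):
    """Solve sum_j f(U_i - U_j + C) = sum_i f(U_j - U_i - C) for C, then return Delta A_ij."""
    w_f = np.asarray(w_forward, dtype=float)   # U_j - U_i on samples from state i
    w_r = np.asarray(w_reverse, dtype=float)   # U_i - U_j on samples from state j
    beta = 1.0 / kT

    def imbalance(c):
        # Positive when C is too small, negative when C is too large
        return fermi(w_r + c, beta).sum() - fermi(w_f - c, beta).sum()

    lo = min(w_f.min(), (-w_r).min()) - 50.0 * kT
    hi = max(w_f.max(), (-w_r).max()) + 50.0 * kT
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if imbalance(mid) > 0.0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    c = 0.5 * (lo + hi)
    # Delta A_ij = -kT ln(n_j / n_i) + C
    return c - kT * np.log(len(w_r) / len(w_f))
```

For production analyses an established implementation should be preferred, since it also provides uncertainty estimates.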

Later on, Shirts et al. rederived the BAR method by applying maximum likelihood techniques.
Since common analysis methods in biochemistry often rely on titration, experiments typically yield only dissociation constants ($K_{d}$) or association constants ($K_{a}$). Therefore the dissociation constants were calculated by using equation \ref{gibbs}, where $\Delta G$ is the free energy change of the corresponding process (association or dissociation), $R$ the gas constant and $T$ the temperature.

$$ \begin{equation} K_{a/d} = e^{-\frac{\Delta G}{RT}} \label{gibbs} \end{equation} $$
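As a small worked example, the conversion between a free energy difference and an equilibrium constant can be written as below. The function name, the unit choice of kJ/mol and the default temperature of 298 K are assumptions made for this illustration.

```python
import math

R_GAS = 8.314462618e-3  # gas constant in kJ/(mol K)

def equilibrium_constant(delta_g, temperature=298.0):
    """K = exp(-delta_G / (R T)) for the process that delta_G refers to (delta_G in kJ/mol)."""
    return math.exp(-delta_g / (R_GAS * temperature))

# Hypothetical binding free energy of -50 kJ/mol for the association reaction
k_a = equilibrium_constant(-50.0)   # association constant
k_d = equilibrium_constant(+50.0)   # dissociation is the reverse process, so the sign flips
print(k_a * k_d)
```

Since dissociation is the reverse of association, $K_{d}$ is the reciprocal of $K_{a}$, which follows directly from the sign change of $\Delta G$.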

METHODS

Visualisations

Colicin E2

No 3D model of Colicin E2 was found in the Research Collaboratory for Structural Bioinformatics (RCSB) Protein Data Bank (PDB) at the Brookhaven National Laboratory or in the Protein Data Bank in Europe (PDBe) at the European Bioinformatics Institute (EMBL-EBI). Crystallographic structures were available only for the DNase subunit of Colicin E2 and its bacterial import subunit. Therefore we chose homology modelling to obtain a 3D structure of Colicin E2. For homology modelling, the Protein Homology/analogY Recognition Engine V 2.0 (PHYRE²) [] server was used in combination with the amino acid sequence obtained from the Universal Protein Resource (UniProt) Protein Knowledgebase (UniProtKB) entry P04419 [].
The obtained model was based on the known subunits of Colicin E2 and Colicin E3, a close relative. A CHARMm topology was then produced via the pdb2gmx module of GROMACS 5.1.3. The model was solvated in TIP3P water and energy minimized with the steepest descent algorithm to [value still missing] kJ/mol.

Force Field Parametrization

A 3D model of O-methyl-l-tyrosine was created using Avogadro 1.1.1 and energy minimized in vacuum using the steepest descent algorithm. The resulting model was subsequently passed to the SwissParam [] topology server. The generated topology was then used to derive parameters for the CHARMm residue force field, so that a topology could be created for any protein containing O-methyl-l-tyrosine. Since O-methyl-l-tyrosine is very similar to tyrosine, most parameters could be adopted.
CHARMm is an atom type based force field, meaning that the parameters do not depend on the residue but on the atom types assigned to the atoms taking part in an interaction. As a result we had to define several new atom types to account for the changed properties of the methyl ether, since no other benzyl ether atoms could be found in the force field parameter files.

To implement the newly derived parameters, we had to update several force field parameter files that we took from the GROMACS 5.0.4 software suite, because all parameter files had to be compatible with GROMACS. First we had to add the new amino acid to the aminoacids.rtp file, which basically serves as a register of all known residues. The amino acid entry contains a three letter code by which it is identified, all atoms with name, type and charge, all bonds between these atoms, as well as impropers and CMAPs. Since the last two entries only concern the backbone atoms, we simply copied those entries. Atom charges were copied from tyrosine for all atoms that were not involved in the ether bond. The ether methyl group was represented by a standard methyl group with CT3 carbon and HA hydrogen. These atom classes represent, for example, the Cβ atom of alanine and the Cε atom of methionine.
A new atom class named OE was introduced to represent the ether oxygen, since there are no ether oxygen parameters for amino acid residues in the CHARMm force field files. We used parameters from the generated topology file to derive parameters for all interactions of the OE atom type. To implement these parameters we updated the bondtypes, angletypes, and dihedrals in the ffbonded.itp file. Similarly, we updated the atomtypes and pairtypes sections of the ffnonbonded.itp file. Subsequently we added an OE entry to the atomtypes.atp file.
CHARMm simulates aromaticity through dummy atoms, which are calculated automatically. To achieve this for our new amino acid, we updated the aminoacids.vsd file and inserted every bond length and angle. Finally we had to assign a three letter code to represent our amino acid in the topology and structure files. We chose OMT. This resulted in the empirical force field CHARMm 27 OMT.

Molecular Dynamics Simulations

3D structures of the Colicin E2 DNase subunit (SAse) and its immunity protein were obtained from RCSB PDB entry 3U43 []. Since the main binding occurs between the DNase subunit and the immunity protein, only this Colicin E2 subunit was simulated. Furthermore, a minimized Colicin E2 was established in another part of the project, and we wanted to test its functionality.
To introduce mutations into the immunity protein, we inserted the energy minimized structure of OMT at the desired position. The positions of the backbone atoms were fitted to those of the replaced amino acid, so that the backbone integrity was preserved. This was achieved with the implementation of Kabsch's algorithm for structural alignments in the Bio3D package [] for the statistical computing language R []. Additionally we used Bio3D's PDB processing functions to separate the Colicin E2 DNase subunit and its immunity protein.
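The fitting step can be illustrated with a compact NumPy version of Kabsch's algorithm. This sketch is not the Bio3D code that was actually used; the function name and the convention that both coordinate sets contain the same atoms in the same order are assumptions for this example.

```python
import numpy as np

def kabsch_fit(mobile, target):
    """Superimpose mobile onto target (both of shape (N, 3)) with the optimal rigid-body fit."""
    mobile = np.asarray(mobile, dtype=float)
    target = np.asarray(target, dtype=float)
    mobile_center = mobile.mean(axis=0)
    target_center = target.mean(axis=0)
    p = mobile - mobile_center
    q = target - target_center
    h = p.T @ q                                   # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))        # guard against improper rotations (reflections)
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return p @ rotation.T + target_center
```

In the mutation workflow, `mobile` would correspond to the backbone atoms of the OMT model and `target` to the backbone atoms of the residue being replaced; the same rotation and translation would then be applied to the side chain atoms.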
All molecular dynamics simulations were performed with the GROningen MAchine for Chemical Simulations (GROMACS) 5.0.3 [] software suite. CHARMm 27 OMT was used as empirical force field. An explicit water model with TIP3P water was chosen and a cubic box with an edge length of [value still missing] nm was constructed. The box was subsequently filled with water and the system was neutralized by inserting chloride ions. After neutralization the system was energy minimized with the steepest descent algorithm until it converged. The exact values can be found in table 1. To generate velocities and bring the system to the target temperature, a short equilibration run of about 500 ps was performed. The target temperature was set to 298 K, and the Berendsen thermostat and the velocity-Verlet integrator with a step size of 2 fs were used. After this equilibration run an NVT ensemble was achieved. To achieve an NPT ensemble, the equilibration run was repeated with pressure coupling to 1 bar. Subsequently the final MD production run was performed. Every 5000th step was saved, resulting in a trajectory of 10001 conformations covering 100 ns of simulation time.
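The relation between step size, output frequency and the number of saved conformations can be checked with a short calculation. The function and parameter names are chosen for this example only; the numbers are the ones stated above.

```python
def production_run_summary(sim_time_ns=100.0, dt_fs=2.0, save_every=5000):
    """Return the number of MD steps, saved frames and the frame spacing in ps."""
    n_steps = round(sim_time_ns * 1e6 / dt_fs)      # 1 ns = 1e6 fs
    n_frames = n_steps // save_every + 1            # +1 for the starting conformation
    frame_spacing_ps = save_every * dt_fs / 1000.0  # 1 ps = 1000 fs
    return n_steps, n_frames, frame_spacing_ps

n_steps, n_frames, frame_spacing_ps = production_run_summary()
print(n_steps, n_frames, frame_spacing_ps)
```

The additional frame beyond 10000 is the starting structure at time zero.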

Molecular Dynamics Simulations Analysis

Simulation analysis was performed using the R [] package Bio3D []. All plots were created using the R package ggplot2 []. For visualization of protein structures and trajectories the PyMOL visualization system [] was used. To compensate for jumps caused by translations across the periodic boundaries (PBC), a GROMACS internal tool was applied (gmx trjconv -center -pbc nojump). To exclude translational and rotational movement of the simulated protein, we applied the GROMACS internal fitting algorithm gmx trjconv -fit rot+trans. These trajectory manipulations were performed because several analysis methods, such as RMSD and RMSF, rely on distance calculations between atom positions over time. In these analyses the translational and rotational movements are not of interest, since we only want to visualize the movement of atoms relative to the simulated protein in order to test for different configurations and structural flexibility. Additionally, crossing a periodic boundary would drastically increase the distance between two consecutive atom positions, since the atom is relocated to the opposite side of the simulated system.
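The idea behind removing periodic jumps can be sketched for a rectangular box as follows. This is only an illustration of the principle and not the GROMACS implementation; the function name, the array layout and the assumption that no atom moves more than half a box length between two saved frames are ours.

```python
import numpy as np

def remove_pbc_jumps(trajectory, box):
    """Unwrap a trajectory of shape (T, N, 3) recorded in a rectangular box with edge lengths box."""
    traj = np.asarray(trajectory, dtype=float)
    box = np.asarray(box, dtype=float)
    unwrapped = traj.copy()
    for t in range(1, len(traj)):
        delta = traj[t] - traj[t - 1]
        delta -= box * np.round(delta / box)   # map each displacement to its minimum image
        unwrapped[t] = unwrapped[t - 1] + delta
    return unwrapped
```

After unwrapping, consecutive positions differ only by the real displacement of the atom, so that distance-based measures such as RMSD and RMSF are no longer distorted by boundary crossings.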

Binding Energy Calculations

Binding energies were calculated using alchemical Free Energy Perturbation (FEP) in combination with the Bennett Acceptance Ratio (BAR). A thermodynamic cycle was constructed (see fig. 2) and accordingly two sets of simulations were performed. First, the immunity protein of Colicin E2 was simulated in TIP3P water with a distance of 0.6 nm between the protein and the box edge. The Lennard-Jones and Coulomb interactions were each decoupled over ten simulations, resulting in 20 simulations in total. The systems were energy minimized over approximately 300 steps, NVT equilibrated for 10 ps and NPT equilibrated for 100 ps. The production run was performed for 2 ns. Second, the binding complex consisting of Colicin E2 and its immunity protein was simulated under similar conditions, i.e. equilibration and simulation times and steps were chosen alike. In contrast to the immunity protein simulations, additional restraint decoupling steps were performed over ten simulations before the other steps. All simulations were evaluated using the BAR scripts from the pymbar Python library.

RESULTS & CONCLUSION

Molecular dynamics simulations

Three amino acid positions were chosen for the evaluation of the O-methyl-l-tyrosine (OMT) exchange: tyrosine 8 (Y8), phenylalanine 13 (F13) and phenylalanine 16 (F16). These positions were selected because the native residues differ only slightly from OMT and were therefore expected to cause the smallest structural difference. All molecular dynamics steps were performed on these mutation variants as well as on the wildtype protein for comparison. All simulations were performed over 100 ns, leading to 10001 conformations each. These conformations were evaluated using RMSD, RMSF, SASA and the amount of secondary structure over time.

Team:TU Darmstadt/Model
