We modelled several aspects of the CryptoGErM system. These
models are described and explained in detail in the following
articles:
The AI model explores
the behaviour of the people who use, or try to abuse, the
CryptoGErM system. The people are modelled in a multi-agent
simulation using genetic algorithms.
Mutation Rates
contains research of the literature on known mutation rates of
Bacillus under the special conditions and treatments that will
be applied.
In the Decoding
Fidelity section we explore how accurate the current
sequencing techniques are and how strong our decoy encryption
security layer is.
Random
Mutation uses computational modelling to calculate the
probability of random mutations causing our message in the DNA
to be corrupted.
The last model searches for an optimal strain for our
photoswitchable antibiotic experiments.
Modelling Rational Agents with a Genetic Algorithm
In the following sections we propose a computational simulation
that aims to model the behaviour of some of the agents that could
make use of the CryptoGErM system. The model combines knowledge of
different research areas such as Epistemic Logic, Machine Learning
and Multi-Agent Systems. Two agents have been modelled, a possible
Sender and a relative Criminal. The goal of the first one is to
encrypt and send a top-secret message as safely as possible, while
the criminal’s goal is to avoid this possible scenario. The agents
have been modelled according to the BDI Architecture, a formal
model that defines how rational agents act and behave [1], which
has been modified and expanded specifically for this simulation.
The individual parameters of this architecture have been estimated
through the use of a genetic algorithm in order to understand which
parts of the architecture turn out to be more significant in a
possible CryptoGErM scenario.
Introduction
Once the CryptoGErM project started, before moving our attention to
a broader perspective involving big data storage, we considered our
idea as a possible way of sending sensitive information in a very
secure way. Thanks to the combination of the Advanced Encryption
Standard algorithm and the biological security layers that make it
hard to get to the point in which it is possible to sequence the
DNA of a bacteria, CryptoGErM originally would have been the newest
method to use for sending highly reserved information. Most of the
research that has been done in the lab in order to make our system
as safe as possible, has been inspired by imagining possible
scenarios involving the potential users of CryptoGErM. "What might
happen if the agents wouldn’t have enough knowledge to successfully
make CryptoGErM work?" or "What could happen if one agent would be in
possession of enough sensitive information in order to make the
whole procedure fail?" are only two out of the multiple questions
that made it clear that in order to create an efficient product,
there was need to understand and imagine future scenarios involving
the possible users of the system. The two main agents that turned
out to be the most significant ones were a potential Sender agent,
that has as main goal hiding and sending as safely as possible a
top-secret message, and a potential Criminal, who wants to avoid
this scenario. Imagining all the possible interactions between
these two actors is of course impossible to achieve, however it is
possible to simulate part of their behaviours assuming they are
rational.
In order to do so we have been inspired by work [1] where
an Artificial Intelligence architecture based on Epistemic Logic is
proposed for modelling rational agents. This gave us the
opportunity to simulate different type of Agents and through the
use of a genetic algorithm explore the different features that
characterize their rationality. By exploring and computing the
optimal features we have been able to identify the core points that
have to be considered when building CryptoGErM in the most efficient
way as possible. The following sections explain how the
computational model has been built and trained in order to identify
the most important aspects of the A.I. architecture, and how this
knowledge has guided the whole lab work.
Architecture of the Model
The Agents have been modelled according to the BDI Architecture,
an epistemic logic based system proposed in [1] in 1991. The paper
states how important Beliefs, Desires and Intentions are in
determining the behaviour of rational agents, more in detail the
authors assert how every rational action taken from an agent is
basically based on these three components. The combination of these
features have an impact on the agent’s future, in fact they have
the ability to produce different actions and as a consequence give
rise to different future scenarios. These possible scenarios,
called Worlds, are modelled according to a temporal structure and
are defined as Time Trees, every moment in the time tree is the
result of an action based on the three previously mentioned BDI
features and is defined as a Situation. One Agent can move between
different time situations until it achieves its goal. The three BDI
components have a different impact on the agent’s behaviour and are
defined as follows:
Beliefs: are defined as those worlds that the agent
believes to be possible, technically every Belief is considered
as a complete and independent Time Tree since there are no
restrictions that can avoid an Agent to believe in something.
This feature is formalized as follows: ∀ w ∈ W. Where w
corresponds to a single time tree and W as the entire set of
beliefs.
Desires: are those worlds that correspond to the optimal
future scenario an agent wants to achieve, an agent can be
happy with multiple outcomes but usually the main Desire is a
single one and can be expressed by the following formalism: ∃!
w ∈ W.
Intentions: are considered as sets of intention-accessible
worlds. These worlds are ones that the agent has committed to
attempt to realize. Differently from the Beliefs, where all
possible worlds are considered, the Intention feature is
characterized by a strong component of realism since the worlds
that are part of this feature are only the ones that the Agent
thinks can be a possible Desire.
In addition to the three BDI features some extra parameters have
been added in order to model the agents. The reason this has been
done is related to the genetic algorithm that has been used in
order to simulate their behaviour and is explained into more detail
in the next section. The parameters that have been added to the
architecture are different according to the type of agent that has
been modelled, however the general idea is the same one for both of
them. In order to represent a potential level of Beliefs, Desires
and Intentions in a CryptoGErM scenario some extra features related to
this architecture have been added that might have an impact on all
three BDI components and as a consequence on the Agents’ actions.
Regarding a potential Sender the three additional features that
have been added are the following:
γ: which corresponds to a particular level of riskiness the
agent will take into account in his actions. The impact of this
feature is different on the three BDI components, in fact a
high level of riskiness makes the agent have a smaller set of
total Beliefs since there is no reason to think about all those
worlds that involve highly safe scenarios. On the other side it
is possible that a high level of γ leads to the optimal desired
world and as a consequence to a world the agent thinks requires
commitment.
ψ: is a variable that is related to the level of efficiency
that the agent wants to pursue in its actions. In this case the
higher this value is, the higher the value of the three BDI
components will be.
α: corresponds to the time resources the agent is willing
to invest when creating its BDI features, the impact of this
variable is similar to the previous one. The more time that is
invested by an agent, the higher the set of Beliefs will be and
the overall commitment to the possible worlds towards the
optimal scenario.
Regarding the Criminal Agent the meaning of ψ and α has been
changed. The first value corresponds to the amount of knowledge the
agent has about the different Time Trees while the second one is
related to the amount of motivation and effort he is willing to put
while interfering with the Sender’s plan.
The idea of the model was to identify which of the three BDI
components is considered as more important by a potential Sender
agent who wants to hide and send an encrypted message as safely as
possible and by a Criminal who wants to avoid this scenario. In
order to find this out a genetic algorithm has been used on
different randomly generated agents and the level of the three BDI
features has been measured once the algorithm converged to the
optimal result.
Genetic algorithms are a very common machine learning technique
that is used in different Artificial Intelligence optimization
problems [2][3][4], they are largely inspired by natural evolution
and work as follows: a random set of possible solutions is created,
each of these solutions goes through a fitness function which
measures how close these solutions perform to the optimal result
that wants to be achieved. It is important to notice that the
individual values that make it possible to converge to the optimal
result are not known beforehand, the only information that is known
is the final outcome, which can be achieved in multiple ways.
Once the agents are evaluated by the fitness function a
truncation process occurs, similarly as what happens in nature
where only the fittest animals survive, also in the simulation the
only agents that are allowed to be part and give rise to a new
generation are the ones with the higher fitness scores. How new
generations are created more into detail is presented in the next
subsection.
The Sender
An Agent is created as a string of binary bits and is
represented as a chromosome, each part of it represents the three
BDI features together with the three additional parameters
mentioned in the previous section. An example of a randomly
generated agent is the following:
Every binary pair corresponds to a decimal number that if
converted properly and summed together establishes the total
fitness value of the chromosome. For the sender Agent an optimal
value of 60 has been set, this turned out in the following fitness
function, where n is the total length of the chromosome and the
main aim is to find the optimal set of bits that satisfy it:
\[
f(x) = \sum\limits_{i=0}^{n} x[i] = 60
\]
For our Sender simulation the following parameters turned out to
be the most suitable ones while considering optimal convergence
performance and training time:
20 Simulations: a total number of twenty different
simulations have been ran in order to check if the results
where coherent between each other.
20 Pools of 100 Chromosomes: the simulation starts with 20
randomly generated chromosome pools with 100 agents each. This
is the first set of agents that is evaluated by the fitness
function and according to the fitness scores the breeding
process starts.
Truncation Method: is the approach that has been used in
order to keep the fittest agents allowed to reproduce
themselves, in order to avoid overfitting issues the top 70% of
the agents has been kept while the rest 30% has been
discarded.
One point single crossover: is the breeding method that has
been used. Once two chromosomes are chosen as parents they give
birth to two new children by exchanging 3 of their bits. This
technique is shown hereafter:
1% mutation rate: some randomness has to be added to the
newest generation in order to make the pool more varied and
avoid local minima, in our case a 1% mutation rate has been set
which means that one bit of one chromosome is randomly changed
from 0 to 1 or vice-versa.
Figures 1 and 2 show the performance of the genetic algorithm
and the analysis of the three different BDI features. These results
are explained more into detail in the Results section.
The Criminal
As already introduced previously some of the additional BDI
features have been changed while modelling the Criminal, however
this doesn’t have any influence on how the Agent is represented
which still corresponds to a string of binary bits. In order to
make the computational simulation more varied some changes have
been made to the genetic algorithm starting from the fitness
function which in this case has as optimal result a value of 65 as
shown by the following equation:
\[
f(x) = \sum\limits_{i=0}^{n} x[i] = 65
\]
Also in this case while training the Agents 20 different
simulations have been ran that started with 20 pools of 100
chromosomes each, however the truncation method kept the top 80% of
the Agents and not 70% as in the previous case and the breeding
technique that has been used was the n double crossover method. It
consists in exchanging not only one single set of bits but two of
them of 3 bits each as explained hereafter, the mutation rate has
been kept to 1%.
The aim of this section is twofold, on the one side it proves
how it is possible to model artificial agents through the use of a
genetic algorithm while on the other side it explains the relevance
of this multi-agent simulation in the CryptoGErM project and how the
relative results have been inspiring for the lab work.
Considering the first goal, as explained by Figures 1 and 3 it
is indeed possible to modify the structure of the Agents in order
to satisfy their relative fitness functions through the use of a
bio-computing technique. The structure of the string of bits
representing the Sender and the Criminal makes it possible to
modify the Agents in a relatively short training time by converging
to the optimal result already in 20 generations. These results show
how this particular machine learning technique can be used for
modelling Multi-Agent Systems and are in line with work [3].
However the most remarkable and useful results have been provided
by the final analysis of the different BDI features representing
the agents. The histograms show in fact a significant difference
that has lead the development of the whole CryptoGErM project.
Considering Figure 1, the results show how in the case of the
Sender model, the Belief feature is the one that the generations of
artificial agents discard in order to get to the optimal score. In
fact for 15 times as shown by the yellow histogram it is the BDI
component with the lowest value, 6. This is remarkable since it
shows how not too many resources should be invested in imagining
all possible scenarios when sending a secret message, but it is
better to have a strong set of Desires that nicely match with the
Intention feature and pursue a single one of those. This can nicely
be seen in the graph where an optimal value of 13 has been chosen
by the agents for the Desire and Intention features for more then
20 times. The same is not true for the Criminal model where in
order to be successful the generations of agents have to consider
as equally important all the three BDI components. This is
supported by the histograms that show how Beliefs, Desires and
Intentions occur almost equally with a value of 15.
Considering this simulation and the main goal of CryptoGErM that
sees a potential Sender communicating in the safest way as possible
a message to a Receiver and a relative Criminal with an opposite
goal, we deduced that in order to make life as hard as possible to
the Criminal we had to build the system in such a way that a
potential criminal has to consider as many Worlds as possible in
order to be successful. This lead us to the idea of adding besides
the A.E.S. method, when hiding the message into the DNA, a list of
biological inspired security layers that have to be considered if
the DNA has to be sequenced properly. As shown by the computational
simulation, the knowledge of these Worlds has to be known by the
Criminal if he wants to be successful since considering the A.I.
architecture it corresponds to the Belief feature.
Finally, it is important to mention that the computational model
doesn’t allow the agents to take real actions and maybe compete
between each other, however it still provides important insights by
considering the features that make one agent as effective as
possible by assuming it is rational as defined by the BDI
architecture.
[1] Rao, A. S., & Georgeff, M. P. (1991). Modeling rational agents within a BDI-architecture. KR, 91, 473-484.
[2] Herrera, F., & Verdegay, J. L. (1996). Genetic algorithms and soft computing. Physica-Verlag.
[3] Zhong, W., Liu, J., Xue, M., & Jiao, L. (2004). A multiagent genetic algorithm for global numerical optimization. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 34(2), 1128-1141.
[4] Goldberg, D. E., & Holland, J. H. (1988). Genetic algorithms and machine learning. Machine learning, 3(2), 95-99.
Mutation rates during the CryptoGErM process
We present an estimate of the number of bases expected to mutate
during the whole process of insertion, sporulation, sending,
treatment, germination and reading, considering the mutation rates
known to Bacillus subtillis.
Reported mutation rates of Bacillus subtilis
A mutation in the genome of B. subtilis can occur in a coding or
noncoding DNA sequence. If it occurs in a coding part, it can be an
evolutionary advantage or a disadvantage. When it is a
disadvantage, the bacteria will have less chance of passing its
mutated genes to the next generation. Thus, some reports that
measure the mutation frequency under evolutionary pressure
conditions may underestimate its real value. As our message is a
noncoding sequence, it will not be an evolutionary disadvantage (or
an advantage) if it mutates or gets lost.
Maughan et al. (2006) measured the mutation rates of B. subtilis in
an antibiotic-resistance gene under conditions that did not require
such a gene to survive [3]. They also found that the mutation rate
itself may increase as generations pass by until it reaches a
plateau. Nonetheless, the necessary number of generations to see a
difference in mutation rates is higher than those our methodology
requires. For that reason, the value of \(0.5 \cdot 10^{-8}
\frac{mutations}{bp \cdot generation}\) will serve as a good estimate.
Number of generations
The probability that our message is changed by a random mutation
also depends on the number of generations that have passed since
the insertion until the sequencing. The doubling time of
B. subtilis is estimated to be 120.1 min[1] but depending
on the growth phase this value may vary. For instance, Maughan
(2006) reports that the number of generations that passed in one
week was 500 (doubling time = 201 min)[3].
Step
Time (h)
Generations required
Message insertion – sporulation
72
36
Sporulation – sending
0
0
Sending – germination
0
1
Germination – message reading
24 – 48
12 – 24
TOTAL:
~50
DNA damage by UV radiation
We planned to use a photoswitchable antibiotic as one of our
bio-layers of security. Due to some experimental issues explained
in the respective section, we are not using that approach anymore.
However, we had considered that it might increase the mutation
frequency to a considerable extent.
First, our calculations (shown in the respective section)
indicated that the spores would survive the required UV treatment
required for antibiotic activation (325 nm, 30 min, 28.8
kJ/m2) and experimentally we did find them to survive
it.
In general, spores are 10-50 times more resistant to UV damage
than vegetative cells and spores during the germination phase
undergo a transcient period in which resistance to UV damage is
even higher than that of normal spores. Spores also have a very
interesting way of avoiding UV damage that is advantageous for our
system, having an almost free-error repair system[2].
Tanooka (1978) reports that upon irradiation of B. subtilis with
UV (254 nm, 1-100 J/m2) the mutation frequency increases
in a linearly on a log-log scale, being the spores approximately 10
times more resistant than vegetative cells[5].
Calculation
As the UV treatment is not used anymore, it is not taken into
account in this calculation.
Thus, each line of B. subtilis will have around one mutation by the
time we want to read the message. This mutation does not have to be
inside our message, and does not have to be the same bp mutated for
each cell line as explained in the next diagram.
References
[1] Burdett (1986). Growth kinetics of individual Bacillus subtilis cells and correlation with Nucleoid extension. Journal of Bacteriology, 167(1):219-230
[2] Lenhart (2012). DNA repair and genome maintenance in bacillus subtilis. Microbiology and Molecular Biology Reviews, 76(3):530-564
[3] Maughan (2006). The population genetics of phenotypic deterioration in experimental populations of Bacillus subtilis. Evolution 60(4): 686-695
[4] Moeller (2014). Resistance of Bacillus subtilis spore DNA to lethal Ionizing radiation relies primarily on spore core components and DNA repair, with minor effects of Oxygen Radical Detoxification. Applied and Environmental Microbiology, 80(1):104-109
[5] Tanooka (1978). Mutation induction with UV and X-Radiations in spores and vegetative cells of Bacillus subtilis. Mutation Research, 49:179-186
Decoding – costs and fidelity
The cost of sequencing DNA has plummeted in the last two
decades. For instance, in the early 2000’s it took 13 years and $3
billion US dollars to sequence the entire human genome. With
current technologies, we have approached the $1000 dollars mark. In
fact, since 2007 you can have your whole DNA sequenced for less than
that! However, there are still issues with the fidelity and
accuracy of the readings obtained that would prevent them from
being used for our system[5].
Existing laboratory-level DNA sequencing technologies typically
allow for a reading error of ~1%. Thanks to optimization and
finne-tunning, traditional readings using Sanger biochemistry offer
now accuracies of up to 99.999% in a read length of 1,000 bp. With
modern, second generation or cyclic sequencing, higher reading
lengths have been achieved but with a decrement in accuracy[4].
Among the bioencryption layers, CryptoGERM prevents unauthorized
parties from reading the message by having a high ratio of decoy
spores that contain a useless sequence. In addition, if the right
growing conditions are not supplied our system prevents germination
and replication of spores that do contain the message. So, what
ratio of decoy:spores do I need to prevent brute sequencing and
message retrieval from a third party?
Fox et. al. (2014) reports that there is a 50% chance of
accurately distinguishing a true subclonal variant from a
sequencing artifact in an excess of 100 wild-type DNA sequences
using standard Q30 filter reads (error rate: 10-2)[1]. In
agreement with that result, in an experiment carried out to detect
genomic variations in marine pests, Pochon (2013) was able to
detect one variant out of 150 wild-type sequences[2].
Reading DNA has become increasingly more accurate. Schmidt
(2012) developed a method called Duplex Sequencing that uses both
strands of DNA to obtain a more precise consensus sequence yielding
an theoretical error of 3x10-10[3]. That means that we
could transmit a message with a length in the order of Gigabytes
without expecting any lost! On the other hand, that also means that
they allow a more precise measurement in decoy-spores mixtures, and
a 1:150 spore:decoy ratio might be insufficient in the future. In
fact, using Duplex Sequencing they were able to identify one mutant
sequence per 10,000 wild-type molecules.
References
[1] Fox EJ, Reid-Bayliss KS, Emond MJ, Loeb LA (2014) Accuracy
of Next Generation Sequencing Platforms. Next Generat Sequenc
& Applic 1: 106. doi:10.4172/jngsa.1000106
[2] Pochon (2013). Evaluating detection limits of
next-generation sequencing for the surveillance and monitoring
of international marine pests. PLos One 8(9):e73935
[3] Schmitt MW, Kennedy SR, Salk JJ, Fox EJ, Hiatt JB, et al.
(2012) Detection of ultra-rare mutations by next-generation
sequencing. Proc Natl Acad Sci U S A 109: 14508-14513.
[4] Wang, X., Blades, N., Ding, J., Sultana, R. and Parmigiani,
G. (2012). Estimation of sequencing error rates in short reads.
BMC Bioinformatics, 13:185
[5] Zhu, X., Wang, J., Peng, B. and Shete, S. (2016). Empirical
estimation of sequencing error rates using smoothing splines.
BMC Bioinformatics, 17:177
Random Mutation
Special attention has been given during the modelling to the
issue of random mutations. This particular phenomenon is related to
the random modification of some of the ACTG components of the
genome. Even though the reasons why this happens are unknown, the
impact that these switches have on the DNA of Bacillus subtilis
are very relevant in the CryptoGErM project. This is mainly related
to the computational reasons that govern the encryption process of
the data that is saved into the bacteria, in fact if even one
single DNA component mutates it will be impossible to decrypt the
message and get the original message back. The same is also true
for the key, if one bit doesn’t correspond to the original ones,
the key is not going to work on the encrypted message. In order to
be sure that the message that we have stored into the genome
wouldn’t have been affected by a random mutation we have ran some
computational simulations that reproduce this phenomenon.
The original length of Bacillus subtilis DNA corresponds to
4.215.619 base pairs, on the other side the length of the Sherlock
Holmes quote that we have stored in the bacteria has a total length
of 556 base pairs. The computational model simulates the random
mutation in the DNA by randomly changing 1% of the ACTG components
and checks if one of these mutations affects the inserted message.
We have ran 1000 different simulations where this process has been
computed iteratively and in none of these cases the random
mutations affected the message. From this we concluded that for our
specific case the impact of the random mutations influencing our
project was extremely low and that we could have safely saved
Arthur Conan Doyle’s quote.
We have also asked ourselves how many of these quotes we could
have theoretically been able to store safely into the bacteria, in
order to do so we have modified the computational model where we
recursively increased the size of the message at every simulation
according to the following equation:
\[
f(i) = 556 \prod_{i=1}^N i
\]
556 corresponds to the length of the original message which is
recursively increased according to the value of N, the simulations
show that the first 2 random mutations affecting the message occur
when N reaches a value of 6. This corresponds to a 400.320 base
pair long message. This result provides interesting insights to
future research that aims to store always bigger amounts of data
into the DNA of bacteria, more in detail it corresponds to the
maximum amount of base pairs that can be modified in Bacillus
subtilis.
A Python implementation of the model is available at dna_analysis.py.
The script requires Python 2.7 and the random,
copy and numpy libraries. Also make sure
to download and place BacillusSubtilis.txt
in the same directory.
Choosing optimal strain for photoswitchable antibiotic
As can be read in the photoswitchable
antibiotics section, the idea with the photoswitchable antibiotic
was that it would kill the decoy B. subtilis but would not cause any
harm to the engineered strain containing the DNA sequence.
When not treated properly with the right wavelength of UV radiation
(see figure 1), spirofloxacin remains in its inactive state. Because
the sample sent to the receiver contains hundreds to thousands of times
more decoy spores, it cannot be sequenced directly by the recipient
(see also Decoding
Fidelity). Instead, the receiver must activate the antibiotic. Once
done, the spirofloxacin-resistant message-containing bacteria would
outnumber the decoys.
We explored the activity of spirofloxacin on different strains of
E. coli, since it was a species of bacteria whose
susceptibility to spirofloxacin had been previously well measured [7] (see figure 2).
At this point of the lab work, experiments trying to measure the MIC
of spirofloxacin “on”/”off” against B. subtilis had been inconsistent.
We thought of Molecular Dynamics (MD) studies as a good alternative to
measure how well the antibiotic performs on Bacillus, thus allowing us
to continue faster with the engineering of the resistant strain.
It is important for our system that the ratio of resistance to
susceptibility in engineered and wild type is optimal. That means that
in its inactive state, spirofloxacin must not have a high bactericidal
activity and when activated, it must be potent enough to kill the
wild-type cells while the engineered strain will survive.
First, we needed to find a suitable crystallographic structure of a
type-II topoisomerase (a protein-DNA complex) bound to a
fluoroquinolone. Topoisomerase IV is the main fluoroquinolone’s target
in Gram-positive bacteria, while gyrase (the other type of
topoisomerase II) is the main target in Gram-negatives [1]. The
crystallographic structure recently reported by Veselkov et. al. (2016)
offered a good alternative as it has the two fluoroquinolone molecules
bound to the protein-DNA complex.
Secondly, we needed to create and adapt an appropriate force-field
that would reproduce the behavior of the binding process at a
reasonable computational cost. A force-field is a set of parameters
that tells the software (we used GROMACS v5.0.4) how each atom behaves
and interacts with others. As reported elsewhere [1][2] the binding
process involves cation- and hydrogen-bonding interactions that can
only be reproduced in the atomistic level. However, the computational
cost of simulating a 150 kDa protein (embedded in a solvated box) is
high enough to consider using lower-resolution scales. In addition, the
crystallographic structure already is a bound topoisomerase –
antibiotic complex, so there is no need for that high level of
resolution. Coarse-grained models are refined enough to offer insights
into the affinity of protein-ligand interactions (see i.e. [4]). Some
of the molecular parameters for the spiropyran part of the molecule
(the part that gives it is photoswitchable behavior) had been studied
elsewhere [9]; nevertheless, its parameterization into atomistic and
further coarse-grained requires some extra work.
While the MARTINI force-field has been adapted and optimized for
both proteins and DNA [3][6] it cannot reproduce cation- and
hydrogen-mediated binding of ligands. So our best method is to use
umbrella sampling in which basically the protein and the ligand are
artificially placed in its “correct” orientation and dragged away from
each other, measuring the Potential of Mean Force (PMF) [5].
Before actually starting the umbrella simulations and due to the
lack of consistent experimental results, we decided to change the
photoswitchable antibiotic approach in our project. However, for the
sake of clarity in the next figure we show what we expected to obtain
from the simulations.
Figure 4. In the umbrella sampling method the
antibiotic is pulled away from the protein-DNA complex as marked by
the arrow. As a result, we expected to obtain a profile as shown in
the right side in which the spirofloxacin shows two binding
affinity’s profiles. (Molecule shown, PDB code: 3RAE)
References
[1] Aldred, K., Kerns, R. and Osheroff, N. (2014). Mechanism of Quinolone Action and Resitance. Biochemistry, 53, 1565-1574
[2] Lupala, C., Gomez-Gutierrez, P. and Perez, J. (2013). Molecular Determinants of the Bacterial Resistance to Fluoroquinolones: A Computational Study. Current Computer-Aided Drug Design, 9: 281-288
[3] Monticelli, L., Kandasamy, S., Periole, X., Larson, R., Tieleman, P., and Marrink, S. (2008). The MARTINI Coarse-Grained Force Field: Extension to Proteins. J. Chem. Theory and Comput., 4(5): 819-834
[4] Naughton, F., Kalli, A. and Sansom, M. (2016). Association of Peripheral Membrane Proteins with Membranes: Free Energy of Bingind of GRP1 PH Domain with Phosphatidylinositol Phosphate-Containing Model Membranes. J. Phys. Chem. Lett., 7(7): 1219-1224
[5] Torrie, G. and Valleau, J. (1977). Nonphysical sampling distributions in Monte Carlo free-energy estimation: Umbrella sampling. J. Comput. Phys., 23(2): 187-199
[6] Uusitalo, J., Ingólfsson, H., Akhshi, P., Tieleman, P., Marrink, S. (2015). Martini Coarse-Grained Force Field: Extension to DNA. J. Chem. Theory Comput., 11(8): 3932-3945
[7] Velema, W., Hansen, M., Lerch, M., Driessen, A., Szymanski, W. and Feringa, B. (2015). Ciprofloxacin-Photoswitch Conjugates: A Facile Strategy for Photopharmacology. Bioconjugate Chemistry, 26: 2592-2597
[8] Veselkov, D., Lapanogov, I., Pan, X., Selvarajah, J., Skamrova, G., Branstrom, A., Prasad, L., Fisher, M. and Sanderson, M. (2016). Structure of a quinolone-stabilized cleavage complex of topoisomerase IV from Klebsiella pneumoniae and comparison with a related Streptococcus pneumoniae complex. Acta Cryst. D72: 488-496
[9] Zhai, GH., Yang, P., Wu, SM., Lei, YB. and Dou, YS. (2014). A semiclassical molecular dynamics of the photochromic ring-opening reaction of spiropyran. Chinese Chemical Letters, 25: 727-731