Team:Marburg/Modeling

SynDustry Fuse. Produce. Use.

Quantitative evolutionary stability analysis of kill switches

Abstract

Well constructed kill switches enable to safely keep genetically modified organisms in fermentation setups for long times. We develop a genetic algorithm based tool to quantify a kill switch design in terms of its evolutionary stability. A design providing redundant toxin regulation is found to reduce organism escape probabilities significantly. Thus, our advice to future teams is to incorporate redundant toxin regulation. We collected all kill switches ever implemented in iGEM in an online database. Our tool can be used by future iGEM teams to study their or already existing kill switch designs. Our collaboration deals with the experimental evidence of our theoretical predictions.

Kill Switch Database

Introduction

The fact that Synthetic Biology not only opens doors to a new era of science but also threatens humankind and the ecosystem we live in is widely known. The impact of genetically modified organisms (GMOs) in nature can not be foreseen and are practically irreversible once having been in contact with nature.

Genetic constructs that kill GMOs on purpose once they escaped the lab environment unintentionally are a great way to reduce the threat through these organisms [1]. These constructs are called kill switches in analogy to their industrial equivalent (figure 1). It might be implemented into an otherwise already genetically modified organism. Once the organism escapes the lab, it will die. This way, it cannot harm the environment and applications of Synthetic Biology like our artificial endosymbiosis can be scaled safely to industrial level.

Figure 1
Figure 1. The kill switch analogy in industrial applications. It is pressed once an emergency happens so that the machines need to stop immediately.

However, one has to keep in mind that all organisms underlie natural occurring mutations. Therefore, the kill switch might be destroyed through mutations while the organism is under lab conditions. Subsequently, this problem causes that the GMO might threatens nature due to its survival after an accidental escape.

A kill switch is a genetic regulatory network. Such a network is built from biological components such as promoters and genes which are interlinked with each other. The network’s structure – also called its topology – plays a crucial role for kill switches: Weakly design kill switch topologies are prone to destruction through mutations. Therefore, a GMO's safety classification depends on its kill switch topology. This raises the need for a tool to quantify a kill switch topology's robustness against mutation. Such a tool does not only provide help for a biologist designing a GMO on how to design a specific kill switch but it can be also used to derive general kill switch design guidelines.

In the following, we will describe how the escape probability of a kill switch is computed. The escape probability is defined as the probability that a kill switch is destroyed due to mutations during lab conditions such that the organism survives in wild life conditions.

Genetic Regulatory Networks

Genetic regulatory networks (GRN) describe how genes are expressed inside an organism [2]. This is comprehensively summarized in wiring diagrams. Taking electrical circuits as comparison, the electrical current is equivalent to the flow of gene expression in GRNs. Genetic promoters represent nodes with different logical behaviors that lead to gene expression and therefore proteins being built. Subsequently, these proteins control promoters again such that complex behaviors arise.

A common way to put this into a context so that quantitative statements are possible is to use modeling of GRNs based on ordinary differential equations (ODEs) [3].

First, a vector of all involved substances is defined \[\vec{X}=\sum_{i=1}^{n} X_i\cdot\vec{e}_i\] with \(X_i\) being the concentration of the \(i\)th substance. Substances span from mRNA over proteins to intermediate complexes and finally the toxins. This vector \(\vec{X}\) depends on the granularity of the model and the precise biological processes (for instance dimerization, cooperativity or other higher order reactions).

The time derivative of the vector \(\vec{X}\) is then formulated as a continuous function that might depend on the concentrations of all involved substances \[\frac{d}{dt}\vec{X}(t)=\vec{f}(\vec{X},t,k)\] where \(t\) is a given point in time and \(k=(k_i)_i\) encapsulates all constant parameters used in the time derivative. The constant parameters \(k_i\) are of very high relevance: They are used to describe the biological processes and determine if the mathematical model in return resembles the involved biology.

Continuous Genetic Regulatory Network Modeling

In the following, the function \(\vec{f}\) is explained for different biological phenomena. It captures how the concentrations \(\vec{X}\) influences its own change over time.

In table 2, a summary of biological processes that are translated into their mathematical modeling equivalence is provided. Further reading on that topic can be found in [4]. The corresponding definitions are listed in table 1. This is used extensively in this work in order to build appropriate systems of ODEs.

Definitions:

Symbol Meaning
\(S\) Concentration of a substance.
\(k_i\) Parameter describing the process(es).

Table 1. Definitions for the used symbols.

Processes used are:

Name Process Explanation Equation
Degradation \(S \xrightarrow[]{k_{deg}} \emptyset\) Degradation of a substance. \(\frac{d}{dt} S=-k_{deg}\cdot S\)
Transcription \(S_{DNA}\xrightarrow[]{k_{transc}} S_{DNA}+S_{mRNA}\) DNA transcription: mRNA creation from DNA. \(\frac{d}{dt} S_{mRNA}=+k_{transcr}\cdot S_{DNA}\)
Translation \(S_{mRNA}\xrightarrow[]{k_{translation}} S_{mRNA}+S_{Protein}\) mRNA translation: Protein creation from mRNA. \(\frac{d}{dt} S_{Protein}=+k_{translation}\cdot S_{mRNA}\)
Dimerization \(2 S \xleftarrow[]{k_{dis}} \xrightarrow[]{k_{dim}} S_2\) One substance dimerizes. The reaction happens in both directions. \(\frac{d}{dt} S=-2\cdot k_{dim}\cdot S^2+2\cdot k_{dis}\cdot S_2\) \(\frac{d}{dt} S_2=+k_{dim}\cdot S^2-k_{dis}\cdot S_2\)
Complex formation \(S_i + S_j \xleftarrow[]{k_{dis}} \xrightarrow[]{k_{dim}} S_i S_j \) Two substances react with each other. The reaction happens in both directions. \(\frac{d}{dt} S_i=+k_{dis}\cdot S_i\cdot S_j-k_{dim}\cdot S_i\cdot S_j\) \(\frac{d}{dt} S_i=+k_{dis}\cdot S_i\cdot S_j-k_{dim}\cdot S_i\cdot S_j\) \(\frac{d}{dt} S_i S_j = -k_{dis}\cdot S_i S_j+k_{dim}\cdot S_i\cdot S_j\)
Repression \(S_{DNA}\xrightarrow[]{\perp} S_{DNA}+S_{mRNA}\) One substance is repressed by another substance. \(\frac{d}{dt} S_i=\frac{+k_{maximal}}{1+S_j/k_{regulation}}\)
Double repression \(S_{DNA}\xrightarrow[]{\perp,\perp} S_{DNA}+S_{mRNA}\) One substance is repressed by two other substances. \(\frac{d}{dt} S_i=+\frac{k_{i,maximal}}{1+S_j/k_{j,regulation}}\cdot\frac{1}{1+S_k/k_{k,regulation}}\)

Table 2. Biological processes and their modeling approaches. Including a short explanation and the mathematical equation.

These basic reaction rates can be derived rigorously from stoichiometric reaction networks [5]. The constant ODE parameters used in our equations are summarized in \(k\).

Modeled Kill Switches

(a) BNU China 2014

The kill switch of BNU China’s iGEM team in 2014 [6] was chosen to be modeled. Topologically, it is a simple cascade of repressing units. We investigated the involved biological processes further and updated the network’s topology accordingly to resemble the biological structure correctly. This includes dimerization and complex formations through higher order reactions. The network topology given by BNU China 2014 and our reworked topology based on a deeper understanding is shown in figure 2.

Figure 2
Figure 2. (a) BNU China and (b) our more detailed rework that includes biologically relevant processes.

The inducer is IPTG and controls the toxin protein MazF. We introduce the dimerization of LacI and CI such that the dimers interact with the promoters. Furthermore, we modeled the repression through IPTG by a reversible second order reaction.

With these adaptations, the ODE system of BNU China’s iGEM team 2014 is a 10 dimensional vector that requires 23 constant parameters \(k\). The parameters have been chosen to be in accordance with biological processes. Amongst others, they were taken from [7]. For the notation of concentrations, we use the letters and spared the square brackets.

This kill switch is the starting point for other network topologies that have been studied. These are:

  • Parallel (OR) regulation
  • Serial regulation
  • Parallel (AND) regulation

(b) Parallel (OR) Regulation

While the previous kill switch consists of a serial cascade, another generic topology is two parallel cascades regulating the same toxin. These two cascades regulate the promoter in a OR gate manner: The toxin’s promoter is repressed if one inducer is present.

Each cascade consists of the BNU China 2014 kill switch explained above. We take a step further from specific biological substances to abstract substances that behave like their biological equivalences: IPTG becomes I, lacI (LacI) becomes aa/ab (Aa/Ab), cI (CI) becomes ba/bb (Ba/Bb) and finally mazF (MazF) becomes c (C). This way, we focused on the topology of the kill switches. The topology can be found in figure 3.

Figure 3
Figure 3. The Parallel (OR) topology.

This topology is captured by a 19 dimensional concentration vector that requires 42 parameters \(k\). Since the overall mutation probability depends on the number of parameters, this kill switch is not comparable to the original BNU China 2014 kill switch. For that reason, we introduce a longer cascade in the following that is comparable to this Parallel (OR) regulation.

(c) Serial Regulation

This serial kill switch introduces two chained BNU China 2014 kill switches (figure 2b). That feature increases the number of parameters to 39. Therefore, it is comparable to the Parallel (OR) kill switch.

Figure 4
Figure 4. The serial topology.

(d) Parallel (AND) Regulation

The last kill switch is similar to the Parallel (OR) topology. It differs in the sense that the two branches are now connected by an AND gate: Both regulating branches must be active in order to repress the toxin’s promoter. Each branch is equivalent to the BNU China 2014 kill switch. The number of parameters is 41 and the concentration vector is 18 dimensional. Therefore, it is also comparable to the kill switches (b) and (c).

Figure 5
Figure 5. The Parallel (AND) topology.

Further details and the derivations of the ODE systems including rigorous scientific documentation is available upon request (Martin Lellep, martin.lellep@physik.uni-marburg.de).

General Genetic Algorithms

General genetic algorithms are a population-based meta-heuristic approach to optimize a complex target function [8]. The question at hand is in this case: Given a large input parameter space, how to choose a configuration that yields a maximal value in terms of a certain measure function.

Genetic algorithms utilize the simulation of biological evolution. A specific configuration from the large input parameter space is called individual and its value in terms of the chosen measure function is defined as its fitness value. An individual has a representation that can be mutated to obtain a new individual with a different fitness. A set of \(N\) individuals is called a population of size \(N\).

A population is simulated over the course of time for a number of generations \(g\). After each generation, evolutionary processes are applied. These comprise mutation of an individual, parental selection from the population – based on the individuals’ fitness values – and individual pairing. The so generated offspring replaces the previous generation and forms the next generation.

Thus, genetic algorithms resemble evolution through mutation and selection pressure in order to find the fittest individual.

Adapted Self-Developed Genetic Algorithm

We use a modified genetic algorithm with fixed population size. It computes the escape probability of individuals. These individuals implement a kill switch which might be destroyed through mutations. The kill switches are modeled as genetic regulatory networks. The major difference is that it is not used for an optimization task but rather for the evolution of the population. Below, the algorithm is comprehensively summarized in figure 6.

Figure 6
Figure 6. The self-developed, adapted genetic algorithm we use to compute the escape probability.

The processes are as follows:

  • Individual: An organism that has the kill switch of interest implemented. Since using genetic regulatory networks, the kill switch is a system of ODEs including the biologically relevant parameters \(k\). These parameters are initialized with the values from the non-mutated kill switch and lie in the biologically relevant regime.
  • Population: \(N=1000\) individuals that are simultaneously simulated.
  • Fitness: The toxin expression compared to the toxin threshold \(\theta\). An individual’s fitness is between 0 and 100. It implements a soft criterion around the toxin threshold.
  • Survivor selection: Since a hard threshold test of \(X_{\mbox{toxin}}>0\) does not resemble the biological situation sufficiently, we implement a soft criterion. The death probability \(p_{\mbox{dead}}(X_{\mbox{toxin}},\theta)\) compares the actual toxin concentration to the toxin threshold and allows for fluctuations around the toxin threshold \(\theta\).
  • Mutation: Given a mutation rate per ODE parameter \(\eta\), the overall mutation rate is computed by \(\eta_{\mbox{overall}}=\eta\cdot\left|k\right|\) with \(\left|k\right|\) as the number of parameters used in the ODE model of the kill switch of interest. With a probability of \(\eta_{\mbox{overall}}\), a mutation occurs: One ODE parameter \(k_i\) from the set of all parameters \(k\) is chosen randomly. Subsequently, the mutated parameter is computed by \(k_i'=mutate(k_i)\) with \(mutate\) modifying \(k_i\) in a way that is motivated by individual binding strengths of proteins on DNA.
  • Parental selection: The parents who replace the dead individuals are selected proportional to their fitness values [9]. The individuals are assigned intervals between 0 and 1 proportional to their fitness. Subsequently, a random process will most likely pick the fittest individuals but still does not hinder weaker individuals to be picked.
  • Transition to next time step: After the mutation process has been performed, taking the last time step’s concentrations \(\vec{X}(t_{i-1})\) and time \(t_{i-1}\) as the initial value, the integration of the ODE system is performed. The integration is executed until a steady state in all substances’ concentrations is reached: These concentrations are considered as \(\vec{X}(t_{i})\) in time step \(t_{i-1}\). This step is the computationally most expensive one. Unit tests verify a successful integration routine.

The algorithm takes various input parameters: The kill switch to simulate, the population size \(N\), the mutation rate per parameter \(\eta\) and the generations in incubation \(g\). The major result is the escape probability: The probability that an individual survives a release into wild life even though the kill switch should have killed it. Since mutations gradually change the kill switch’s parameters over the incubation time, the escape probability is a measure for how robust the kill switch’s topology is in terms of evolutionary stability.

Implementation

The computationally most expensive step is solving the differential equation for \(\vec{X}(t)\) for all substances and all \(N\) individuals at each time point. Each simulation is executed multiple times to obtain statistical relevance.

The algorithm including all numerics has been implemented in Python [10] to run on multiple CPU cores in parallel. The systems of ODEs are generally not solvable analytically. Thus, we have used a stiff ODE integrator to compute the solutions [11].

The full code including extensive documentation, further optimizations and a user-friendly framework is available upon request (Martin Lellep, martin.lellep@physik.uni-marburg.de).

Kill Switch Database

The tool we develop can be applied to arbitrary kill switches. In order to provide a good starting points for future iGEM teams to quantify kill switches with our tool, we created a database of all kill switches ever implemented in iGEM since 2004.

This database comprises all the years and teams that implemented a kill switch. For each kill switch we summarized the used parts, toxins and their antagonists. In particular, the included pictures of the kill switch topologies will safe future teams a lot of work. Find the kill switch database here:

Summary

We introduced further steps in the general genetic algorithm to adapt it to our needs. Not the individual with the highest fitness is computed but rather the escape probability of individuals after incubation time due to mutations. We accomplished this by using an integration of ODEs to update the population from one to another time step.

The results section will present the simulated kill switches' dynamics, incubation time rescaling by the mutation rates and the evolutionary stability algorithm's findings in terms of escape probabilities. We provide general statements on how a kill switch topology should be designed for future constructs.


Literature

  1. [1] Moe-Behrens, Gerd HG, Rene Davis, and Karmella A. Haynes. "Preparing synthetic biology for the world." Synthetic biology applications in industrial microbiology (2014): 77.
  2. [2] Jacob, François, and Jacques Monod. "Genetic regulatory mechanisms in the synthesis of proteins." Journal of molecular biology 3.3 (1961): 318-356.
  3. [3] Karlebach, Guy, and Ron Shamir. "Modelling and analysis of gene regulatory networks." Nature Reviews Molecular Cell Biology 9.10 (2008): 770-780.
  4. [4] Bintu, Lacramioara, et al. "Transcriptional regulation by the numbers: models." Current opinion in genetics & development 15.2 (2005): 116-124.
  5. [5] Heinrich, Reinhart, and Stefan Schuster. The regulation of cellular systems. Springer Science & Business Media, 2012.
  6. [6] https://2014.igem.org/Team:BNU-China/kill
  7. [7] Fritz, Georg. "A genetic circuit for epigenetic memory." arXiv preprint q-bio.MN/0701011.
  8. [8] Goldberg, David E. Genetic algorithms. Pearson Education India, 2006.
  9. [9] Back, Thomas. Evolutionary algorithms in theory and practice: evolution strategies, evolutionary programming, genetic algorithms. Oxford university press, 1996.
  10. [10] https://www.python.org/
  11. [11] https://computation.llnl.gov/casc/odepack/odepack_home.html