# Team:Tsinghua-A

Team:Tsinghua-A - 2016.igem.org

# Abstract

When information flows through gene-regulatory networks, noise is introduced and fidelity suffers. A cell that cannot correctly infer environmental signals from noisy inputs may struggle to respond appropriately. What factors can alter a circuit's ability to sense extrinsic information?

There are many “parallel circuits” in gene-regulatory networks where independently transcribed monomers assemble into functional complexes for downstream regulation. We wonder whether these recurring dimerization designs are fundamental to improving circuit properties. In addition, would higher affinity between monomers improve signal quality as well?

Inspired by these thoughts, we construct synthetic biological circuits using split fluorescent proteins, and we add inteins to the split fluorescent proteins to make the binding tighter. We then quantitatively measure the capacity of these information channels with the aid of information theory. Computation and wet lab work are combined to improve our understanding of such systems and to interpret the potential biological significance of recurring parallel designs in nature.

# Introductory Story

Let us tell a simple but interesting story to help you understand what our project is about, and to let you relax before you look into it. You probably know the term binaural effect: you can judge where a thing is with less deviation when you use both of your ears. Our story is similar.

Li Lei is an undergraduate from Tsinghua University and Han Meimei is an undergraduate from Peking University. They met and got to know each other at the iGEM Jamboree. After that they kept in touch through WeChat and fell in love. Li Lei often called Han Meimei to ask how she was doing and to cheer her up. But he had a problem: the telephone line was very noisy, and Han Meimei could not hear him clearly whenever he called. He knew this and was very depressed, because he worried that Han Meimei would be angry with him. One day he came up with a good idea: he bought two amplifiers and used both of them whenever he called her. Now Han Meimei can hear him clearly. The story tells us that when you use two paths rather than one to transmit your messages, the noise is noticeably reduced. With this inspiration in mind, come and see our project.

# From information theory to our project

As we can learn from Wikipedia, information theory studies the quantification, storage, and communication of information. It was originally proposed by Claude E. Shannon in 1948. The theory has developed remarkably and has found applications in many areas. It is no exaggeration to say that we see the power of information theory all the time.

Based on this, we would like to use information theory to explore something new in biological pathways.

Our project is closely connected with two terms in information theory: mutual information and channel capacity. What are they?

In probability theory and information theory, the mutual information (MI) of two random variables is a measure of the mutual dependence between the two variables. More specifically, it quantifies the "amount of information" (in units such as bits) obtained about one random variable, through the other random variable. The concept of mutual information is intricately linked to that of entropy of a random variable, a fundamental notion in information theory, that defines the "amount of information" held in a random variable. You can understand it through the figure below.

Formally, the mutual information of two discrete random variables X and Y can be defined as:$I\left( {X;Y} \right) = \sum\limits_{y \in Y} {\sum\limits_{x \in X} {p\left( {x,y} \right)\log \left( {\frac{{p\left( {x,y} \right)}}{{p\left( x \right)p\left( y \right)}}} \right)} }$

It can also be shown that:$I\left( {X;Y} \right) = H\left( X \right) - H\left( {X\left| Y \right.} \right) = H\left( Y \right) - H\left( {Y\left| X \right.} \right)$

From this formula, mutual information can be read as the reduction of uncertainty in $X$ once $Y$ is known.

Once you understand mutual information, channel capacity follows naturally.
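As a small numerical check of the two formulas above, the following Python sketch computes $I(X;Y)$ directly from a joint probability table and compares it with $H(X) - H(X|Y)$. The joint table is a made-up example (a binary input whose output is flipped with probability 0.1), and the function names are illustrative assumptions, not part of the original project.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability vector (zero entries are skipped)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def mutual_information(joint):
    """I(X;Y) in bits from a joint table with rows indexed by x and columns by y."""
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1, keepdims=True)   # marginal p(x), shape (n_x, 1)
    py = joint.sum(axis=0, keepdims=True)   # marginal p(y), shape (1, n_y)
    product = px @ py                       # p(x) p(y) for every (x, y) pair
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log2(joint[mask] / product[mask])))

def conditional_entropy_x_given_y(joint):
    """H(X|Y) = H(X,Y) - H(Y), in bits."""
    joint = np.asarray(joint, dtype=float)
    return entropy(joint.ravel()) - entropy(joint.sum(axis=0))

# Made-up example: uniform binary input, output flipped with probability 0.1.
joint = np.array([[0.45, 0.05],
                  [0.05, 0.45]])
mi = mutual_information(joint)
hx = entropy(joint.sum(axis=1))
hx_given_y = conditional_entropy_x_given_y(joint)
print(mi, hx - hx_given_y)  # the two numbers should agree
```

Both routes give the same value, which illustrates why mutual information can be read as the uncertainty about $X$ that is removed by observing $Y$.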

In electrical engineering, computer science and information theory, channel capacity is the tight upper bound on the rate at which information can be reliably transmitted over a communications channel. And by the noisy-channel coding theorem, the channel capacity of a given channel is the limiting information rate (in units of information per unit time) that can be achieved with arbitrarily small error probability. And you can also see the figure below.

The channel capacity is defined as:$C = \mathop {\sup }\limits_{{p_X}\left( x \right)} I\left( {X;Y} \right)$

where the supremum is taken over all possible choices of ${p_X}\left( x \right)$.
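To make the supremum concrete, here is a minimal Python sketch for the simplest textbook case, a binary symmetric channel. It scans input distributions $p_X = (q, 1-q)$ on a grid, keeps the one with the largest mutual information, and compares the result with the known closed form $1 - H_b(f)$, where $f$ is the flip probability. The channel, the grid resolution, and all names are assumptions chosen for illustration only.

```python
import numpy as np

def mutual_information_bits(p_x, channel):
    """I(X;Y) in bits for input distribution p_x and channel[x, y] = p(y | x)."""
    p_x = np.asarray(p_x, dtype=float)
    channel = np.asarray(channel, dtype=float)
    joint = p_x[:, None] * channel           # p(x, y)
    p_y = joint.sum(axis=0)                  # p(y)
    product = p_x[:, None] * p_y[None, :]    # p(x) p(y)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log2(joint[mask] / product[mask])))

flip = 0.1  # assumed flip probability of the binary symmetric channel
bsc = np.array([[1 - flip, flip],
                [flip, 1 - flip]])

# Brute-force search over input distributions (q, 1 - q).
grid = np.linspace(0.0, 1.0, 1001)
values = [mutual_information_bits([q, 1 - q], bsc) for q in grid]
best = int(np.argmax(values))
capacity = values[best]
best_q = grid[best]

# Known closed form for this channel: 1 - H_b(flip).
closed_form = 1 + flip * np.log2(flip) + (1 - flip) * np.log2(1 - flip)
print(capacity, best_q, closed_form)
```

A grid search only works for very small input alphabets; for channels with many input levels an iterative optimizer is the more practical choice.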

Now that you know the concepts above, you can easily follow us and find many interesting and inspiring things in our project. Congratulations!

# Protein Dimerization and Splitting Up

Dimerization is extremely common in cells. Monomers assemble into dimers for further functions all the time, some interactions strong, some weak. Function-less newborn peptides piece together and get to work, forming so-called quaternary structure; activated kinases reach each other and mutually phosphorylate; transcription factors, when forming homo- or hetero-dimers according to different stoichiometry, lead to varied downstream responses and distinct cellular fates…

Previous research has underlined important advantages of dimerization, including differential regulation, specificity, facilitated proximity and so on. Wait, this cannot be the end of the list. How does dimerization influence noise propagation? The question has hardly been touched because of the difficulty of controlling experimental variables. Synthetic biology provides powerful tools to carry out, in designed systems, experiments that would otherwise be impossible.

For synthetic biologists, constructing an AND gate is both crucial and challenging. Split a regulatory protein such as a transcription factor, express the two halves independently, and an AND gate is born.

Nonetheless, the act of splitting can bring about unexpected side effects. Gene regulatory circuits depend heavily on quantitative properties, and their complexity and nonlinearity contribute to the hard-to-predict behavior of biological systems. Once an important part of the system is chopped up, who knows what will happen next?

These are the thrilling challenges and opportunities we face in our project. We split a fluorescent protein and tune the affinity between the two halves by adding intein sequences. When the two parts bind together, EBFP2 emits its graceful blue fluorescence. These constructs mimic the effect of dimerization in living cells and are under fine control because they can be induced by the drug Dox. Dox concentration serves as the input to our system and governs the quantitative properties involved in dimerization.

# Cellular Information Inference

Sense the environment, or die. The intricate world is full of chaotic information, knowledge of which is vital for survival. But all that the cell “knows” is the concentrations of downstream products. It has to infer the right input and try to eliminate the uncertainty caused by noise.

For each input level, the cell exhibits a specific probability distribution of outputs. If these distributions overlap too much, the cell will have difficulty guessing the right answer. The circuit is then prone to error, and its ability to transmit information accurately is low.

Traditionally, the impact of noise is evaluated with variance-related statistics, such as the coefficient of variation. These quantities only describe how concentrated the output is around its mean; they cannot tell us how well one of two correlated random variables can be inferred from the other. Channel capacity is a better criterion for noise because it directly quantifies the information transmission process.

# Experiment

Experiments:

We transfect HEK-293 human cells with our plasmid constructs as described in the table. Different concentrations of Dox are applied to the cell cultures at the same time.

Transfected cells are cultured for 48 hours before flow cytometry, long enough for protein expression to reach steady state. FACS analysis measures the fluorescence intensity emitted by each cell, from which we obtain a large sample of fluorescent protein expression levels: tens of thousands of cells for each experimental group.

Data collected from flow cytometry are later analyzed computationally. We estimate the probability density function (p.d.f.) from the data using kernel density estimation, a nonparametric statistical method. Given high and low Dox concentrations as input, cells exhibit different probability distributions, as illustrated in the example below.
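The kernel density estimation step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are synthetic Gaussian samples standing in for log fluorescence at low and high Dox, the kernel is Gaussian, and the bandwidth follows the normal reference rule of thumb; the kernel and bandwidth actually used for the project data are not stated in the text.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth=None):
    """Estimate a 1-D density on `grid` from `samples` with a Gaussian kernel."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = samples.size
    if bandwidth is None:
        # Normal reference rule of thumb (an assumption, not the team's choice).
        bandwidth = 1.06 * samples.std(ddof=1) * n ** (-1 / 5)
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernel = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    return kernel.sum(axis=1) / (n * bandwidth)

# Synthetic stand-ins for log10 fluorescence of single cells (not real data).
rng = np.random.default_rng(0)
low_dox = rng.normal(2.0, 0.4, size=2000)
high_dox = rng.normal(3.5, 0.4, size=2000)

grid = np.linspace(0.0, 6.0, 601)
dx = grid[1] - grid[0]
pdf_low = gaussian_kde_1d(low_dox, grid)
pdf_high = gaussian_kde_1d(high_dox, grid)

# Each estimated density should integrate to roughly 1 over the grid.
print(pdf_low.sum() * dx, pdf_high.sum() * dx)
```

Evaluating every group on the same grid gives one discretized conditional distribution per Dox level, which is the form needed for the capacity calculation.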

What we have in hand is the conditional distribution $p\left( {Y\left| {X = x} \right.} \right)$, given a known level of input $x$. In order to calculate the mutual information $I\left( {X;Y} \right) = \iint {p\left( {x,y} \right){{\log }_2}\frac{{p\left( {x,y} \right)}}{{p\left( x \right)p\left( y \right)}}dxdy}$ and estimate the channel capacity, which is $C = \sup I\left( {X;Y} \right)$, we need to find the input distribution $p\left( X \right)$, and hence the joint distribution $p\left( {X,Y} \right)$, that maximizes the mutual information. $p\left( X \right)$, however, is not known in advance. We first pick a random stochastic vector as the initial input distribution and then use an iterative optimization algorithm to maximize $I\left( {X;Y} \right)$. The final result is the channel capacity.
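The text does not say which optimization algorithm was used. As one possible realization, the sketch below uses the Blahut-Arimoto iteration, a standard method for computing the capacity of a discrete channel, starting from a random stochastic vector as described above. The three Gaussian output distributions are hypothetical stand-ins for the discretized $p\left( {Y\left| {X = x} \right.} \right)$ at three Dox levels, and all names are illustrative.

```python
import numpy as np

def kl_rows_nats(channel, p_y):
    """D( p(y|x) || p(y) ) in nats for every input level x (one value per row)."""
    with np.errstate(divide="ignore", invalid="ignore"):
        log_ratio = np.where(channel > 0, np.log(channel / p_y), 0.0)
    return np.sum(channel * log_ratio, axis=1)

def blahut_arimoto(channel, tol=1e-10, max_iter=10000, rng=None):
    """Return (capacity in bits, maximizing input distribution).

    channel[x, y] = p(y | x); every row must sum to 1.
    """
    channel = np.asarray(channel, dtype=float)
    rng = np.random.default_rng() if rng is None else rng
    # Random stochastic vector as the initial input distribution.
    p_x = rng.random(channel.shape[0]) + 1e-3
    p_x /= p_x.sum()
    for _ in range(max_iter):
        d = kl_rows_nats(channel, p_x @ channel)
        new_p_x = p_x * np.exp(d)       # reweight inputs by their divergence
        new_p_x /= new_p_x.sum()
        done = np.max(np.abs(new_p_x - p_x)) < tol
        p_x = new_p_x
        if done:
            break
    capacity_nats = float(np.sum(p_x * kl_rows_nats(channel, p_x @ channel)))
    return capacity_nats / np.log(2), p_x

# Hypothetical discretized output distributions at three input levels.
y = np.linspace(0.0, 6.0, 301)
means = [2.0, 3.0, 4.0]   # assumed mean log10 fluorescence per Dox level
sigma = 0.4               # assumed spread
channel = np.array([np.exp(-0.5 * ((y - m) / sigma) ** 2) for m in means])
channel /= channel.sum(axis=1, keepdims=True)

capacity, p_opt = blahut_arimoto(channel, rng=np.random.default_rng(1))
print(capacity, p_opt)
```

With three input levels the capacity can never exceed $\log_2 3$ bits; the more the output distributions overlap, the further the result falls below that bound.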

# Results

Do our circuits work?

Yes, they do sense the input level of Dox concentration. The figure illustrates how the distribution of EBFP2 fluorescence intensity changes in response to a Dox gradient. With higher concentration, the distribution shifts to the right until it reaches saturation. (The TRE-EBFP2N:IntN and TRE-EBFP2C:IntC group is displayed as an example.)

The shift, however, gives only an intuitive picture. We need more accurate methods to study the quantitative properties. To do this, we plot the transfer function of each group. A transfer function describes the relation between the input level (Dox concentration) and the output level (amount of fluorescent protein). Once the function is plotted, the shape of the curve is highly informative.

The transfer functions of all seven groups are illustrated below. All values are in logarithmic space. Note that, for the convenience of plotting, the points where Dox=0 are plotted at Dox=0.01 (otherwise they would lie at negative infinity).

In the leftmost figure, EBFP2 without the intein sequence shows relatively low affinity and thus a low expression level. Nevertheless, its leakage level is low as well, and Dox induction leads to approximately fold change. As for the middle and right figures, both split EBFP2 with intein and intact EBFP2 have about fold change when induced by Dox, but split EBFP2 has a lower leakage level.

Meanwhile, if one half of EBFP2 is driven by the constitutive promoter CMV, the leakage level remains the same but the fold induction suffers. This is expected, because with one constitutively expressed part the circuit can sense the input through only one half of the split protein, and thus becomes slightly less inducible.

Normalizing the curves leads to more interesting discoveries. Even though TRE-EBFP2N + CMV-EBFP2C gives a poor fold change, its transfer curve is significantly steeper when the dimerization process is reversible, which means better switch-like behavior. In the presence of intein, the effect is weaker but still visible.

When the transfer curves are normalized to the range of 0 to 1, we find that their shapes differ. The curves representing split proteins rise later and are steeper.

If we normalize the initial EBFP2 level to 1, split EBFP2 with intein displays better properties than the other two settings. From the figure we can clearly see that it has the highest fold induction among the three, significantly higher even than that of intact EBFP2. The result shows that split proteins with high binding affinity can outperform the original undivided protein thanks to their low leakage level and high fold induction, that is, high sensitivity to inputs.
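The normalization to the range of 0 to 1 is a min-max rescaling of each transfer curve. The Python sketch below applies it to two made-up curves (the numbers are invented for illustration and are not measurements) and reads off the input level at which each normalized curve crosses 0.5, which is one simple way to express "rising later".

```python
import numpy as np

def normalize_curve(values):
    """Rescale a transfer curve linearly so that its minimum is 0 and its maximum is 1."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    return (values - lo) / (hi - lo)

# Invented example data: log10 Dox and log10 mean fluorescence (not real results).
log_dox = np.log10([0.01, 1.0, 10.0, 100.0, 1000.0])
intact = np.array([2.0, 2.6, 3.2, 3.5, 3.6])
split = np.array([1.0, 1.05, 1.4, 2.8, 3.0])

intact_norm = normalize_curve(intact)
split_norm = normalize_curve(split)

# Input level (log10 Dox) where each normalized, increasing curve reaches 0.5.
half_intact = float(np.interp(0.5, intact_norm, log_dox))
half_split = float(np.interp(0.5, split_norm, log_dox))
print(half_intact, half_split)
```

Note that `np.interp` assumes the normalized curve is increasing, which holds for these invented values but should be checked for real data.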

How well do circuits perform as evaluated by channel capacity?

Seven circuits are evaluated in our experiment. The calculated channel capacities are displayed in the figure.

Come on, this is not as dizzying as it seems at first glance. Let’s look at it step by step.

More information is transmitted when both parts of the split protein are inducible.

When both promoters are TREs, both split parts are inducible, and the channel capacity is higher than that of channels with a non-inducible CMV promoter. In the absence of intein, the two peptides find it hard to dimerize, giving rise to low channel capacity.

Upon addition of the intein sequence, the binding process becomes irreversible, since the two halves assemble into one intact protein through splicing. As a consequence, channel capacity greatly increases. The double-inducible group with two TRE promoters still wins in terms of channel capacity.

Comparing the three inducible groups leads to the conclusion that splitting decreases channel capacity, but adding intein sequences to the peptides rescues the effect, elevating the channel capacity to an even higher level.

What can the result teach us?

Inspirations for Synthetic Biology Engineering:

For synthetic biologists, constructing an AND gate is both crucial and challenging. Split a regulatory protein such as a transcription factor, express the two halves independently, and an AND gate is born.

Nonetheless, the act of splitting can bring about unexpected side effects. Gene regulatory circuits depend heavily on quantitative properties, and their complexity and nonlinearity contribute to the hard-to-predict behavior of biological systems. Once an important part of the system is chopped up, who knows what will happen next?

Our project quantitatively studies the behavior of such systems. Splitting changes the circuit’s input-output function, alleviates leakage, improves switch-like behavior, and increases the fold change upon induction. Moreover, we use channel capacity from information theory to describe how well the circuits transmit signals. We find adding an intein sequence tremendously beneficial in that it shifts the channel capacity to a higher level, thus reducing uncertainty.

When it comes to designing logic gates, our findings can lead the way. Splitting not only achieves a logic-gate effect; when intein is added, it also improves sensitivity to inputs and defends the system against the detrimental interference of noise. Future work should benefit from this fundamental investigation of basic synthetic biology building blocks.

Highlighting the biological significance of dimerization:

Dimerization is extremely common in cells. Monomers assemble into dimers for further functions all the time, some interactions strong, some weak. Function-less newborn peptides piece together and get to work, forming so-called quaternary structure; activated kinases reach each other and mutually phosphorylate; transcription factors, when forming homo- or hetero-dimers according to different stoichiometry, lead to varied downstream responses and distinct cellular fates…

Yes, we know which proteins dimerize. We understand how proteins dimerize as well, by interaction of domains like leucine zippers and so forth. But why? What is the point of dimerization?

Previous research has underlined important advantages of dimerization, including differential regulation, specificity, facilitated proximity and so on. The influence of dimerization on noise propagation has hardly been touched because of the difficulty of controlling experimental variables. Synthetic biology provides powerful tools to carry out, in designed systems, experiments that would otherwise be impossible. This is exactly what we do.

Traditionally, the impact of noise is evaluated with variance-related statistics, such as the coefficient of variation. These quantities only describe how concentrated the output is around its mean; they cannot tell us how well one of two correlated random variables can be inferred from the other. Channel capacity is a better criterion for noise because it directly quantifies the information transmission process.

# Overview

We perform deterministic and stochastic mathematical modeling alongside the wet lab experiments, and the channel capacity values calculated from the models and from the experimental data agree well with each other. The ability to transmit information precisely is crucial for biological processes, and it can be evaluated quantitatively by channel capacity, a central concept in information theory. Structural differences between gene regulatory circuits give rise to different capacities and, in turn, different levels of information certainty and reliability. Specifically, although splitting the expressed protein into two halves that can assemble into a dimer provides more flexible functions and other regulatory benefits, it curtails the information capacity compared to expressing the undivided protein. Adding intein sequences to the two parts, however, compensates for the loss of capacity by making the association process irreversible. Simulation results agree with data measured from synthetic gene regulatory circuits and lead to the same conclusions. Furthermore, our models make several predictions for conditions that we have not yet had time to test, or that are hard to achieve experimentally, and they illuminate the way for future research and design.

# Gillespie Algorithm Simulation

The Gillespie algorithm, a stochastic simulation algorithm (SSA), can accurately simulate biochemical reactions involving small numbers of molecules. Reactions in cells are full of random fluctuations because of the low numbers of molecules present. For instance, there may be only one or two copies of a gene in a cell. (See https://en.wikipedia.org/wiki/Gillespie_algorithm for more details.)

An example simulation result from our script using the Gillespie algorithm. 100 cells are generated in this figure. We can see how the state of each cell (represented by the protein level in this case) changes over time.

We use this algorithm to generate hundreds of sample trajectories at high and low inputs, each representing the dynamic changes in the amount of protein in a single cell. Afterwards, we pick a time point at which all reactions in the cells have reached steady state and record the states of the cells at that moment. The approach resembles the flow cytometry experiment: in both cases we obtain a sample of cells with dissimilar protein levels.

From the simulated data, we calculate channel capacities using the same methods as in the experimental data processing.

We use ${k_{on}}$ and ${k_{off}}$, the rate constants of dimer formation and dissociation, to characterize the process of dimerization. For split EBFP2 without intein, ${k_{on}}$ is relatively small due to low affinity, and ${k_{off}} > 0$. For split EBFP2 with intein, ${k_{on}}$ is larger and ${k_{off}} = 0$, because once the two halves interact they do not fall apart.

Under a suitable parameter set, we obtained the results shown in the table below. When ${k_{off}}$ drops to zero, channel capacity increases, in accordance with the experimental results.

We computationally scan the two parameters across a two-dimensional parameter space to explore the impact of these kinetic constants.

In the heat map, a warmer color stands for higher channel capacity and a cooler color for lower. The figure on the left shows the case where both halves are driven by inducible promoters; the one on the right shows the case where one of them is driven by a constitutive promoter. The latter resembles chemical titration, so we label it accordingly.

The lower left part of the map is warmer. In this region, ${k_{on}}$ is high and ${k_{off}}$ is low, which means that the two peptides are prone to associate. This is the region where split EBFP2 with intein lies.

The upper right part, on the contrary, is chilling to the bone. Here, ${k_{on}}$ is low and ${k_{off}}$ is high, which means that the two parts tend to dissociate. Split EBFP2 without intein falls into this region.

If we compare the two figures side by side, we can see that the values in the “titration” map are lower than those in the “dimer” map. This is in accordance with our experimental data: wet lab work shows that with a constitutively expressed part, channel capacity is lower. The model matches and explains the phenomenon.

To sum up, mathematical modeling reveals the impact of kinetic constants on the gene regulatory circuit’s ability to convey information.
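A minimal Python sketch of such a stochastic simulation is given below. It is not the team's script: the reaction set (production and degradation of two halves A and B, binding with rate constant `k_on`, unbinding with `k_off`, and degradation of the dimer D) is a simplified assumption, and every parameter value is invented. It uses the direct-method Gillespie algorithm and records each simulated cell at a single late time point, as a flow cytometry snapshot would.

```python
import numpy as np

# State = (A, B, D): free N-half, free C-half, assembled dimer.
STOICH = np.array([
    [+1, 0, 0],    # production of A
    [0, +1, 0],    # production of B
    [-1, 0, 0],    # degradation of A
    [0, -1, 0],    # degradation of B
    [-1, -1, +1],  # A + B -> D   (k_on)
    [+1, +1, -1],  # D -> A + B   (k_off)
    [0, 0, -1],    # degradation of D
])

def gillespie_cell(k_prod, d_mono, k_on, k_off, d_dimer, t_end, rng):
    """Simulate one cell up to t_end and return its final (A, B, D) counts."""
    state = np.zeros(3, dtype=int)
    t = 0.0
    while True:
        a, b, d = state
        rates = np.array([k_prod, k_prod, d_mono * a, d_mono * b,
                          k_on * a * b, k_off * d, d_dimer * d], dtype=float)
        cumulative = np.cumsum(rates)
        total = cumulative[-1]
        if total <= 0.0:
            break
        t += rng.exponential(1.0 / total)   # waiting time to the next reaction
        if t > t_end:
            break
        # Pick a reaction with probability proportional to its rate.
        reaction = int(np.searchsorted(cumulative / total, rng.random(), side="right"))
        state = state + STOICH[reaction]
    return state

def simulate_population(n_cells, seed, **params):
    """Simulate independent cells and stack their final states, one row per cell."""
    rng = np.random.default_rng(seed)
    return np.array([gillespie_cell(rng=rng, **params) for _ in range(n_cells)])

# Invented parameters: intein-like (irreversible, fast binding) vs. intein-free.
common = dict(k_prod=10.0, d_mono=1.0, d_dimer=1.0, t_end=6.0)
with_intein = simulate_population(100, seed=0, k_on=1.0, k_off=0.0, **common)
without_intein = simulate_population(100, seed=0, k_on=0.01, k_off=1.0, **common)
print(with_intein[:, 2].mean(), without_intein[:, 2].mean())
```

Repeating the population simulation at several input levels (for example by varying `k_prod`) would give the sampled output distributions from which a channel capacity can be estimated.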

# Deterministic Modeling

For the deterministic model, we use ordinary differential equations. ${x_1}$ to ${x_7}$ stand for ebfp2N, ebfp2C, EBFP2N mRNA, EBFP2C mRNA, EBFP2N, EBFP2C and EBFP2.

$\frac{{d{x_1}}}{{dt}} = {k_1}\left[ {dox} \right]\left[ {c - {x_1}} \right] - {k_{ - 1}}{x_1}$

$\frac{{d{x_2}}}{{dt}} = {k_1}\left[ {dox} \right]\left[ {c - {x_2}} \right] - {k_{ - 1}}{x_2}$

$\frac{{d{x_3}}}{{dt}} = {k_2}{x_1} - {d_2}{x_3}$

$\frac{{d{x_4}}}{{dt}} = {k_2}{x_2} - {d_2}{x_4}$

$\frac{{d{x_5}}}{{dt}} = {k_3}{x_3} - {d_3}{x_5} - {k_{on}}{x_5}{x_6} + {k_{off}}{x_7}$

$\frac{{d{x_6}}}{{dt}} = {k_3}{x_4} - {d_3}{x_6} - {k_{on}}{x_5}{x_6} + {k_{off}}{x_7}$

$\frac{{d{x_7}}}{{dt}} = {k_{on}}{x_5}{x_6} - {k_{off}}{x_7} - {d_4}{x_7}$
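The seven equations above can be integrated numerically to obtain a transfer function. The Python sketch below does this with a hand-written classical Runge-Kutta (RK4) step; the integrator, the time horizon, and every parameter value are assumptions made for illustration, since the fitted parameters are not listed in the text.

```python
import numpy as np

def rhs(x, dox, p):
    """Right-hand side of the seven ODEs; x = (x1, ..., x7)."""
    x1, x2, x3, x4, x5, x6, x7 = x
    binding = p["k_on"] * x5 * x6 - p["k_off"] * x7   # net dimer formation
    return np.array([
        p["k1"] * dox * (p["c"] - x1) - p["k_m1"] * x1,
        p["k1"] * dox * (p["c"] - x2) - p["k_m1"] * x2,
        p["k2"] * x1 - p["d2"] * x3,
        p["k2"] * x2 - p["d2"] * x4,
        p["k3"] * x3 - p["d3"] * x5 - binding,
        p["k3"] * x4 - p["d3"] * x6 - binding,
        binding - p["d4"] * x7,
    ])

def steady_state(dox, p, t_end=40.0, dt=0.01):
    """Integrate from zero initial conditions with RK4 steps until t_end."""
    x = np.zeros(7)
    for _ in range(int(round(t_end / dt))):
        s1 = rhs(x, dox, p)
        s2 = rhs(x + 0.5 * dt * s1, dox, p)
        s3 = rhs(x + 0.5 * dt * s2, dox, p)
        s4 = rhs(x + dt * s3, dox, p)
        x = x + dt / 6.0 * (s1 + 2 * s2 + 2 * s3 + s4)
    return x

# Invented parameter sets (arbitrary units), differing only in k_on and k_off.
base = dict(k1=1.0, k_m1=1.0, c=1.0, k2=10.0, d2=1.0, k3=10.0, d3=1.0, d4=1.0)
params_intein = dict(base, k_on=0.1, k_off=0.0)      # irreversible binding
params_no_intein = dict(base, k_on=0.01, k_off=1.0)  # weak, reversible binding

dox_levels = [0.01, 0.1, 1.0, 10.0, 100.0]
transfer_intein = [steady_state(d, params_intein)[6] for d in dox_levels]
transfer_no_intein = [steady_state(d, params_no_intein)[6] for d in dox_levels]
print(transfer_intein)
print(transfer_no_intein)
```

The fixed step size is chosen small enough for these parameter values; for stiffer parameter sets an adaptive or implicit solver would be the safer choice.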

We plot the transfer function in the same way as in the experimental data processing. With tuned parameters, the model is feasible.

# Outlook

All of the experiments we have done aim to improve channel capacity while reducing the disturbance caused by noise. Our final goal is to transmit more information when the environment is noisy. Our team focuses on comparing two ways of obtaining a fluorescent protein (EBFP): expressing the protein directly from the gene, or independently expressing two proteins and letting them assemble afterwards into the same protein as in the first way. Besides analyzing the experimental data by calculating the channel capacity of the two systems, we want to know whether the two circuits differ in how they amplify inputs or respond to noise. We also want to know which noise frequencies degrade our circuit the most, and which signal-to-noise ratios allow a higher channel capacity.

We have run these simulations first, and we will conduct more experiments to collect data and confirm our conclusions.

# Synthetic Biology Meets Daily Life

## Overview

Our project is about information flowing through gene-regulatory networks, and it combines information theory with biology. Our human practice fits the project very well. We showed students who had no idea what synthetic biology is how much fun it can be, and we made this interdisciplinary field known to more people. We believe our practice was meaningful and helps promote the development of interdisciplinary research, which is also an aim of iGEM. In a word, human practice gave us a chance to tell more people about our idea and helped more people appreciate the charm of synthetic biology.

## Dispelling prejudice about biotechnology in China

After hearing the community's concerns about biotechnology, we held two lectures to dispel these prejudices. Meanwhile, we printed 200 booklets to reach a wider audience.

## Background

In China, an ethically controversial incident last year, in which scientists used 60 children to test their new transgenic product, Golden Rice, left the question of the safety of transgenic technology unsettled in the public mind. The public even became scared of transgenic technology and of biotechnology as a whole. As a result, transgenic products met with resistance all over China.

## Lectures

Although many biologists came forward to defend the technology, many Chinese people still hold stereotypes about biotechnology. We hosted two lectures on a proper understanding of biotechnology for college freshmen and high school students. We introduced the current stereotypes, then presented biological applications that have made great contributions, such as biofuel, artemisinin (a drug for malaria, first extracted by the Chinese pharmacist Youyou Tu) produced by yeast, and microorganisms that can curb environmental pollution. Because our project involves knowledge of both biological processes and information theory, the lectures also covered the development of synthetic biology, an interdisciplinary field that requires us to look at a living organism from the perspective of an information system. This content attracted the attention of students from the College of Information.

## Brochures

To spread knowledge of and enthusiasm for synthetic biology, we presented key ideas of the discipline in a booklet and handed the booklets out to more high school students. Two students e-mailed us to express their interest in synthetic biology and asked us how to gain admission to Tsinghua University.

# Communication & Collaboration

## With University of Science & Technology Beijing (USTB)

We cooperated with the iGEM team from University of Science & Technology Beijing (USTB), and both teams learned a lot from each other.

We often had friendly conversations through WeChat, telling each other what our projects were about and how they were going. They told us that they had trouble assembling a plasmid they needed. They had tried many ways to assemble the plasmid but failed, and they asked us for help. They sent us a photo of their unsuccessful plasmid, and after some analysis we suggested a method called Gibson Assembly. They took our advice and tried again. Later, they told us that they had succeeded, and we were all glad to hear that.

We also invited them to our school to talk about our projects. We learned that their project is about a kind of compound named notoginseng saponin. They told us about the process and purpose of their experiment in detail, and we noticed some cases they had overlooked. We helped them optimize their experimental process and gave them a hand in analyzing their results. At the same time, we invited them to our laboratory, showed them how our experiments are carried out, and gave them some advice on operations that need care during their experiments. They also gave us many useful suggestions, and all of us had a great and meaningful day!

## With ShanghaiTech University

Computing Resource Support for Modeling: When simulating our system with the Gillespie algorithm, we fell short of computing power for scanning key parameters. ShanghaitechChina generously helped us out by running our code on their servers, which significantly boosted efficiency.

Illustrating Science: We illustrated ShanghaitechChina's project with a cartoon, visualizing the process. The image is used in [where it is used]. When art encounters science...

# Acknowledgement

The Tsinghua team helped us dry our parts for part submission with their machines.
