Abstract
When information flows through gene-regulatory networks, noise is introduced and fidelity suffers. A cell that cannot correctly infer environmental signals from noisy inputs may fail to mount the right responses. What factors can alter a circuit's ability to sense extrinsic information?
Gene-regulatory networks contain many “parallel circuits” in which independently transcribed monomers assemble into functional complexes for downstream regulation. We wondered whether these recurring dimerization designs are fundamental to improving circuit properties, and whether higher affinity between monomers would boost performance further.
Inspired by these questions, we construct synthetic biology circuits using split fluorescent proteins, and we add inteins to the split fragments to make the binding tighter. We then quantitatively measure the capacity of these information channels with the tools of information theory. Computation and wet-lab work are combined to deepen our understanding of such systems and to interpret the potential biological significance of the parallel designs that recur in nature.
Introductory Story
Let us tell a simple but entertaining story to help you understand what our project is about, and to let you relax before looking into the details. You may know of the binaural effect: you can judge where a sound comes from with smaller error when you use both ears rather than one. Our story works similarly.
Li Lei is an undergraduate at Tsinghua University and Han Meimei is an undergraduate at Peking University. They met at the iGEM Jamboree, kept in touch over WeChat, and fell in love. Li Lei often called Han Meimei to see how she was doing and to cheer her up, but he had a problem: the telephone line was so noisy that Han Meimei could not hear him clearly. This depressed him, since he worried she would get angry with him. One day he came up with a good idea: he bought two amplifiers and used both whenever he called her. Now Han Meimei can hear him clearly. The moral is that when you transmit your message over two paths instead of one, the noise can be reduced appreciably. With that inspiration in mind, come and see our project.
Description
From information theory to our project
As we can learn from Wikipedia, information theory studies the quantification, storage, and communication of information; it was originally proposed by Claude E. Shannon in 1948. The theory has developed remarkably and has found applications in many areas. It is no exaggeration to say that we can see the power of information theory everywhere.
Building on this, we set out to explore something original about biological pathways using information theory.
Our project is closely connected to two terms from information theory: mutual information and channel capacity. What are they?
In probability theory and information theory, the mutual information (MI) of two random variables is a measure of the mutual dependence between them. More specifically, it quantifies the "amount of information" (in units such as bits) obtained about one random variable through the other. The concept of mutual information is intricately linked to that of entropy, a fundamental notion in information theory that defines the "amount of information" held in a random variable. You can get an intuition for it from the figure below.
Formally, the mutual information of two discrete random variables X and Y can be defined as:\[I\left( {X;Y} \right) = \sum\limits_{y \in Y} {\sum\limits_{x \in X} {p\left( {x,y} \right)\log \left( {\frac{{p\left( {x,y} \right)}}{{p\left( x \right)p\left( y \right)}}} \right)} } \]
The following identity can also be proven:\[I\left( {X;Y} \right) = H\left( X \right) - H\left( {X\left| Y \right.} \right) = H\left( Y \right) - H\left( {Y\left| X \right.} \right)\]
The formula shows that mutual information is the reduction of uncertainty in $X$ once you know $Y$.
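As a sanity check on these definitions, here is a minimal Python sketch (the 2×2 joint distribution is made up for illustration) that computes $I(X;Y)$ directly from the double-sum formula and verifies the identity $I(X;Y) = H(X) - H(X|Y)$:

```python
import numpy as np

# Hypothetical 2x2 joint distribution p(x, y); rows index X, columns index Y.
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])

p_x = p_xy.sum(axis=1)  # marginal p(x)
p_y = p_xy.sum(axis=0)  # marginal p(y)

# I(X;Y) = sum_{x,y} p(x,y) log2( p(x,y) / (p(x) p(y)) )
mi = 0.0
for i in range(p_xy.shape[0]):
    for j in range(p_xy.shape[1]):
        if p_xy[i, j] > 0:
            mi += p_xy[i, j] * np.log2(p_xy[i, j] / (p_x[i] * p_y[j]))

def H(p):
    """Shannon entropy in bits of a discrete distribution."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Cross-check via I(X;Y) = H(X) - H(X|Y)
h_x = H(p_x)
h_x_given_y = sum(p_y[j] * H(p_xy[:, j] / p_y[j]) for j in range(2))
```

Here the two computations agree, as the identity predicts.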
Once you understand mutual information, channel capacity follows easily.
In electrical engineering, computer science and information theory, channel capacity is the tight upper bound on the rate at which information can be reliably transmitted over a communications channel. By the noisy-channel coding theorem, the channel capacity of a given channel is the limiting information rate (in units of information per unit time) that can be achieved with arbitrarily small error probability. See the figure below.
The channel capacity is defined as:\[C = \mathop {\sup }\limits_{{p_X}\left( x \right)} I\left( {X;Y} \right)\]
where the supremum is taken over all possible choices of ${p_X}\left( x \right)$.
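To make the supremum concrete, here is a toy sketch using the classic binary symmetric channel (a textbook example, not one of our biological channels): we brute-force the maximum of $I(X;Y)$ over input distributions and compare it with the known closed form $C = 1 - H_2(\varepsilon)$:

```python
import numpy as np

def binary_entropy(p):
    if p in (0.0, 1.0):
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def mutual_information(p_x, p_y_given_x):
    # p_y_given_x[i, j] = p(y=j | x=i)
    p_xy = p_x[:, None] * p_y_given_x
    p_y = p_xy.sum(axis=0)
    mask = p_xy > 0
    return np.sum(p_xy[mask] * np.log2(p_xy[mask] / (np.outer(p_x, p_y))[mask]))

eps = 0.1  # crossover probability of the binary symmetric channel
channel = np.array([[1 - eps, eps],
                    [eps, 1 - eps]])

# Brute-force the supremum over input distributions p(x) = (q, 1-q)
qs = np.linspace(0.001, 0.999, 999)
capacity = max(mutual_information(np.array([q, 1 - q]), channel) for q in qs)

# Closed form for the binary symmetric channel: C = 1 - H2(eps)
closed_form = 1 - binary_entropy(eps)
```

The brute-force maximum lands at the uniform input distribution, matching the closed form.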
Now that you know all the concepts above, we are glad to tell you that you can easily follow the rest of our project and discover many interesting and inspiring things. Congratulations!
Protein Dimerization and Splitting Up
Dimerization is extremely common in cells. Monomers assemble into dimers to carry out further functions all the time, some interactions strong, some weak. Functionless newborn peptides piece together and get to work, forming quaternary structure; activated kinases reach each other and mutually phosphorylate; transcription factors, forming homo- or hetero-dimers in different stoichiometries, lead to varied downstream responses and distinct cellular fates…
Previous research has underlined important advantages of dimerization, including differential regulation, specificity, facilitated proximity and so on. Wait, this cannot be the end of the list. What is the influence of dimerization on noise propagation? The question is hardly touched, owing to the difficulty of controlling experimental variables. Synthetic biology provides powerful tools to carry out, in designed systems, experiments that would otherwise be impossible.
For synthetic biologists, constructing an AND gate is both crucial and challenging. Split up a regulatory protein such as a transcription factor, express the two halves independently, and an AND gate is born.
Nonetheless, splitting can bring about unexpected side effects. Gene-regulatory circuits depend strongly on quantitative properties, and their complexity and nonlinearity make the behavior of biological systems hard to predict. Once an important part of the system is chopped up, who knows what will happen next?
These are the thrilling challenges and opportunities our project faces. We split a fluorescent protein and tune the affinity between the halves by adding intein sequences. When the two parts bind together, graceful blue light is emitted from EBFP2. These constructs mimic the effect of dimerization in living cells and remain under fine control, because they can be induced by the drug Dox. Dox concentration serves as the input to our system and sets the quantitative properties involved in dimerization.
Cellular Information Inference
Sense the environment, or die. The intricate world is full of chaotic information, knowledge of which is vital for survival. But all that the cell “knows” is the concentrations of downstream products. It has to infer the true input and try to eliminate the uncertainty caused by noise.
For each input level, the cell exhibits a specific output probability distribution. If the distributions overlap too heavily, the cell will have difficulty guessing the right answer; the circuit is prone to error, and its ability to transmit information accurately is low.
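As a toy illustration of why overlap hurts inference (the Gaussian outputs and their parameters are made up, not our measured distributions): suppose the output given a "low" input is Gaussian around one mean and given a "high" input Gaussian around another, with equal priors. The best the cell can do is guess "high" whenever the output exceeds the midpoint, and its error rate grows as the two distributions overlap:

```python
import math

def guess_error(mu_low, mu_high, sigma):
    """Minimum-error guessing rate for two equal-prior Gaussians."""
    # Probability that a standard normal exceeds z
    tail = lambda z: 0.5 * math.erfc(z / math.sqrt(2))
    half_gap = (mu_high - mu_low) / 2
    # By symmetry, both error terms equal the tail beyond half the gap
    return tail(half_gap / sigma)

well_separated = guess_error(1.0, 5.0, 1.0)  # little overlap -> rare mistakes
overlapping = guess_error(1.0, 2.0, 1.0)     # heavy overlap -> frequent mistakes
```

With well-separated outputs the cell guesses wrong only a few percent of the time; with heavy overlap it errs on nearly a third of its guesses.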
Traditionally, the impact of noise is evaluated with variance-related statistics such as the coefficient of variation. These quantities describe only how concentrated the output is around its mean; they cannot tell us how well one of two correlated random variables can be inferred from the other. Channel capacity is a better criterion because it describes the information-transmission process more faithfully.
Experiment & Results
Experiment
Protocols:
Experiments:
We transfect HEK-293 human cells with our plasmid constructs as described in the table. Different concentrations of Dox are applied to the cell cultures at the same time.
Transfected cells are cultured for 48 hours before flow cytometry, long enough for protein expression to reach steady state. FACS measures the fluorescence intensity emitted by each cell, giving us a large sample of fluorescent protein expression levels: tens of thousands of cells per experimental group.
Data collected from flow cytometry are then analyzed computationally. We estimate the probability density function (p.d.f.) from the data using kernel density estimation, a nonparametric statistical method. Given high or low Dox input, cells exhibit different probability distributions, as illustrated in the example below.
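A minimal sketch of this estimation step, using simulated stand-ins for the cytometry readings (the real pipeline works on FACS exports; the log-intensity means and spreads below are invented):

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Simulated stand-ins for FACS log-fluorescence readings, one sample per
# Dox condition; tens of thousands of cells per group, as in the text.
low_dox = rng.normal(loc=2.0, scale=0.5, size=10_000)
high_dox = rng.normal(loc=4.0, scale=0.7, size=10_000)

# Kernel density estimation yields a smooth p.d.f. for each condition.
pdf_low = gaussian_kde(low_dox)
pdf_high = gaussian_kde(high_dox)

grid = np.linspace(0, 7, 200)
density_low = pdf_low(grid)
density_high = pdf_high(grid)
```

Evaluating each estimate on a common grid gives the conditional output densities that feed into the capacity calculation.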
What we have in hand is the conditional distribution $p\left( {Y\left| {X = x} \right.} \right)$, given a known input level $x$. To calculate the mutual information $I\left( {X;Y} \right) = \iint {p\left( {x,y} \right){{\log }_2}\frac{{p\left( {x,y} \right)}}{{p\left( x \right)p\left( y \right)}}dxdy}$ and estimate the channel capacity $C = \sup I\left( {X;Y} \right)$, we need the input distribution $p\left( X \right)$ and joint distribution $p\left( {X,Y} \right)$ that maximize the mutual information. $p\left( X \right)$, however, is not known in advance. We therefore pick a random stochastic vector as the initial input distribution and use an optimization algorithm to iteratively maximize $I\left( {X;Y} \right)$. The final result is the channel capacity.
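One standard choice for this iterative optimization is the Blahut–Arimoto algorithm; we name it here as an illustration, since the text above specifies only an iterative maximization. A minimal sketch for a discretized channel (inputs would be Dox levels, outputs binned fluorescence):

```python
import numpy as np

def blahut_arimoto(p_y_given_x, n_iter=1000):
    """Iteratively maximize I(X;Y) over p(x) for a discrete channel.

    p_y_given_x[i, j] = p(y=j | x=i); returns the capacity in bits.
    """
    n_inputs = p_y_given_x.shape[0]
    p_x = np.full(n_inputs, 1.0 / n_inputs)  # initial input distribution
    for _ in range(n_iter):
        p_y = p_x @ p_y_given_x
        # D(p(y|x) || p(y)) for each input level x
        ratios = np.where(p_y_given_x > 0,
                          p_y_given_x / np.maximum(p_y, 1e-300), 1.0)
        d = np.sum(p_y_given_x * np.log2(ratios), axis=1)
        # Multiplicative update concentrates p(x) on informative inputs
        p_x = p_x * 2.0 ** d
        p_x /= p_x.sum()
    # At the fixed point, C = sum_x p(x) D(p(y|x) || p(y))
    p_y = p_x @ p_y_given_x
    ratios = np.where(p_y_given_x > 0,
                      p_y_given_x / np.maximum(p_y, 1e-300), 1.0)
    d = np.sum(p_y_given_x * np.log2(ratios), axis=1)
    return float(p_x @ d)
```

On a noiseless binary channel this recovers a capacity of 1 bit, and on a binary symmetric channel it matches the closed-form value, which is a convenient check before feeding in measured conditional distributions.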
Results
Do our circuits work?
Yes, they do sense the input level of Dox. The figure illustrates how the distribution of EBFP2 fluorescence intensity changes in response to a Dox gradient: with higher concentration, the distribution shifts to the right until it saturates. (The TRE-EBFP2N:IntN and TRE-EBFP2C:IntC group is displayed as an example.)
The shift, however, is only intuitive. We need more precise methods to study the quantitative properties. To this end, we plot the transfer function of each group: the relation between the input level (Dox concentration) and the output level (amount of fluorescent protein). The shape of the curve is highly informative.
The transfer functions of all seven groups are illustrated below. All values are in log space. Note that, for plotting convenience, the points where Dox = 0 are drawn at Dox = 0.01 (otherwise they would fly off to negative infinity in log space).
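For intuition about what these transfer-function shapes encode, here is a toy Hill-type curve; the leakage, span, half-maximal constant K and Hill coefficient n below are illustrative made-up numbers, not fitted values from our data:

```python
import numpy as np

def transfer(dox, leak=50.0, span=5000.0, K=100.0, n=2.0):
    """Toy Hill-type transfer function: leakage plus a saturating response."""
    return leak + span * dox**n / (K**n + dox**n)

dox = np.array([0.0, 1.0, 10.0, 100.0, 1000.0, 10000.0])
output = transfer(dox)

# Fold change = saturated output / leakage output.
fold_change = output[-1] / output[0]

# Points at Dox = 0 sit at -infinity in log space, hence the
# convention of drawing them at Dox = 0.01.
log_dox = np.log10(np.where(dox == 0, 0.01, dox))
```

In this picture, "low leakage" lowers the baseline term, "high fold change" raises the ratio of saturated output to baseline, and "switch-like" corresponds to a larger Hill coefficient.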
In the leftmost figure, EBFP2 halves without intein sequences show relatively low affinity and thus low expression. Nevertheless, their leakage level is low as well, and Dox induction leads to approximately fold change. As for the middle and right figures, both split EBFP2 with intein and intact EBFP2 show about fold change when induced by Dox, but split EBFP2 has a lower leakage level.
Meanwhile, if one half of EBFP2 is driven by the constitutive promoter CMV, the leakage level remains the same but the induced fold change suffers. This is expected: with one part constitutively expressed, the circuit can sense the input with only one half of the split protein, making it slightly less inducible.
Normalizing the curves leads to more interesting discoveries. Even though TRE-EBFP2N + CMV-EBFP2C gives a poor fold change, its transfer curve is significantly steeper when the dimerization process is reversible, which means better switch-like behavior. In the presence of intein, the effect is weaker but still visible.
When the transfer curves are normalized to the range 0 to 1, their shapes clearly differ: the curves for split proteins rise later and more steeply.
If we instead normalize the uninduced EBFP2 level to 1, split EBFP2 with intein displays better properties than the other two settings. From the figure we can clearly see that it has the highest fold change of the three, significantly higher even than that of intact EBFP2. The result shows that split proteins with high binding affinity can beat the original undivided protein, owing to their low leakage and high induced fold change, that is, high sensitivity to inputs.
How well do circuits perform as evaluated by channel capacity?
Seven circuits are evaluated in our experiment. The calculated channel capacities are displayed in the figure.
Come on, this is not as dizzying as it looks at first glance. Let's go through it step by step.
More information is transmitted when both parts of the split protein are inducible.
When both promoters are TREs, both split parts are inducible, and the channel capacity is relatively higher than that of channels with un-inducible CMV promoters. In the absence of intein, the two peptides dimerize poorly, giving rise to low channel capacity.
Upon addition of the intein sequence, the binding process becomes irreversible, since the two halves splice into one intact protein. As a consequence, channel capacity greatly increases. The double-inducible group with two TRE promoters still wins the competition in terms of channel capacity.
Comparing the three inducible groups leads to the conclusion that splitting decreases channel capacity, but adding intein sequences to the peptides rescues the effect, elevating the channel capacity to an even higher level.
What can the result teach us?
Inspirations for Synthetic Biology Engineering:
As noted above, constructing AND gates is crucial and challenging for synthetic biologists: split a regulatory protein such as a transcription factor, express the two halves independently, and an AND gate is born. Yet splitting can bring unexpected side effects, because gene-regulatory circuits depend so strongly on quantitative properties that their complex, nonlinear behavior is hard to predict once an important part is chopped up.
Our project quantitatively studies the behavior of such systems. Splitting changes the circuit's input-output function, alleviates leakage, improves switch-like behavior, and increases the fold change upon induction. Moreover, we use channel capacity from information theory to describe how well these circuits transmit signals, and we find that adding intein sequences is tremendously beneficial: it shifts the channel capacity to a higher level, thus reducing uncertainty.
When it comes to designing logic gates, our findings can lead the way. Not only can splitting achieve a logic-gate effect, but with intein added it can also improve sensitivity to inputs and defend the system against the detrimental interference of noise. Future work should benefit from this fundamental investigation of basic synthetic biology building blocks.
Highlighting the biological significance of dimerization:
As described above, dimerization is extremely common in cells: newborn peptides assemble into functional complexes, activated kinases mutually phosphorylate, and transcription factors form homo- or hetero-dimers that lead to varied downstream responses and distinct cellular fates.
Yes, we know which proteins dimerize. We also understand how proteins dimerize, through interacting domains such as leucine zippers. But why? What is the point of dimerization?
Previous research has underlined important advantages of dimerization, including differential regulation, specificity, facilitated proximity and so on. The influence of dimerization on noise propagation, however, is hardly touched, owing to the difficulty of controlling experimental variables. Synthetic biology provides powerful tools to carry out, in designed systems, experiments that would otherwise be impossible. That is exactly what we do.
And unlike traditional variance-related statistics such as the coefficient of variation, channel capacity describes not just how concentrated the output is around its mean, but how well the input can be inferred from the output, making it the better criterion for our question.
Reference
1. Jörn M. Schmiedel et al. MicroRNA control of protein expression noise. Science 348, 128 (2015). DOI: 10.1126/science.aaa1738
2. Christian M. Metallo and Victor Sourjik. Environmental sensing, information transfer, and cellular decision-making. Current Opinion in Biotechnology 28, 149–155 (2014).
3. Raymond Cheong et al. Information transduction capacity of noisy biochemical signaling networks. Science 334(6054), 354–358 (2011). DOI: 10.1126/science.1204553
4. Jangir Selimkhanov et al. Accurate information transmission through dynamic biochemical signaling networks. Science 346(6215) (2014).
5. Grigoris D. Amoutzias et al. Choose your partners: dimerization in eukaryotic transcription factors. Cell (9 April 2008). DOI: 10.1016/
6. Dacheng Ma et al. Integration and exchange of split dCas9 domains for transcriptional controls in mammalian cells. Nature Communications 7, 13056. DOI: 10.1038