Abstract
When information flows through gene-regulatory networks, noise is introduced and fidelity suffers. A cell that cannot correctly infer environmental signals from noisy inputs may struggle to respond appropriately. Yet gene-regulatory networks contain many “parallel circuits” in which independently transcribed monomers assemble into functional complexes for downstream regulation. By analogy, when two people talk to us at the same time, we cannot fully understand either of them, but if what they say is closely related, we understand much more. So would a tighter connection between monomers improve the quality of signal transmission in gene-regulatory networks?
Inspired by these thoughts, we construct synthetic biology circuits using split fluorescent proteins, and we add inteins to the split fluorescent proteins to make the connection tighter. We then quantitatively measure the capacity of these information channels. Computation and wet-lab work are combined to improve our understanding of such systems and to interpret the potential biological significance of recurring parallel designs in nature.
Description
From information theory to our project
According to Wikipedia[1], information theory studies the quantification, storage, and communication of information, and was originally proposed by Claude E. Shannon in 1948. The theory has developed remarkably since then and has found applications in many areas. It is no exaggeration to say that we see the power of information theory all the time.
Building on this, we would like to use information theory to explore something new in biological pathways.
Our project is closely connected with two terms from information theory: mutual information and channel capacity. What are they?
In probability theory and information theory, the mutual information (MI) of two random variables is a measure of the mutual dependence between the two variables. More specifically, it quantifies the "amount of information" (in units such as bits) obtained about one random variable through the other random variable. The concept of mutual information is closely linked to that of the entropy of a random variable, a fundamental notion in information theory that defines the "amount of information" held in a random variable. You can understand it through the figure below.
Formally, the mutual information of two discrete random variables X and Y can be defined as:\[I\left( {X;Y} \right) = \sum\limits_{y \in Y} {\sum\limits_{x \in X} {p\left( {x,y} \right)\log \left( {\frac{{p\left( {x,y} \right)}}{{p\left( x \right)p\left( y \right)}}} \right)} } \]
It can also be shown that:\[I\left( {X;Y} \right) = H\left( X \right) - H\left( {X\left| Y \right.} \right) = H\left( Y \right) - H\left( {Y\left| X \right.} \right)\]
From this formula, mutual information can be read as the reduction in uncertainty about $X$ once $Y$ is known.
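To make the two formulas above concrete, here is a minimal Python sketch (using NumPy) that computes $I\left( {X;Y} \right)$ in bits from a joint probability table. The function names and the example table are made up for illustration and are not taken from our experiments.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability vector (zeros are skipped)."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def mutual_information(p_xy):
    """I(X;Y) in bits from a joint table p_xy[x, y] whose entries sum to 1."""
    p_xy = np.asarray(p_xy, dtype=float)
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal of X (column vector)
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal of Y (row vector)
    mask = p_xy > 0                         # terms with p(x,y) = 0 contribute 0
    ratio = p_xy[mask] / (p_x @ p_y)[mask]
    return float(np.sum(p_xy[mask] * np.log2(ratio)))

# Hypothetical joint distribution: Y copies a fair bit X but is wrong 20% of the time.
noisy = np.array([[0.4, 0.1],
                  [0.1, 0.4]])
mi = mutual_information(noisy)
# The same value follows from entropies: H(X) + H(Y) - H(X,Y).
mi_from_entropies = entropy(noisy.sum(axis=1)) + entropy(noisy.sum(axis=0)) - entropy(noisy)
```

If $X$ and $Y$ are independent the function returns 0, and if $Y$ is an exact copy of a fair bit $X$ it returns 1 bit.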
Once you understand mutual information, channel capacity is easy to understand.
In electrical engineering, computer science and information theory, channel capacity is the tight upper bound on the rate at which information can be reliably transmitted over a communications channel. By the noisy-channel coding theorem, the channel capacity of a given channel is the limiting information rate (in units of information per unit time) that can be achieved with arbitrarily small error probability. See also the figure below.
The channel capacity is defined as:\[C = \mathop {\sup }\limits_{{p_X}\left( x \right)} I\left( {X;Y} \right)\]
where the supremum is taken over all possible choices of ${p_X}\left( x \right)$.
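As a standard textbook illustration (not taken from our experiments), consider a binary symmetric channel that flips each input bit with probability $p$. The supremum is reached by a uniform input distribution, and the capacity is:\[C = 1 - {H_b}\left( p \right),\quad {H_b}\left( p \right) =  - p{\log _2}p - \left( {1 - p} \right){\log _2}\left( {1 - p} \right)\]
For $p = 0.1$ this gives $C \approx 0.531$ bits per channel use. For $p = 0.5$ the output carries no information about the input, so $C = 0$.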
Now that you know the concepts above, you should be able to follow the rest of our project easily and find many interesting and inspiring things in it. Congratulations!
miRNA as Potential Biomarkers
Studies have confirmed that miRNA expression is highly concordant across individuals, and aberrant miRNA expression may be related to disease. Some studies have reported that specific miRNAs in tissue and plasma can be discriminatory biomarkers for detecting cancers. However, obtaining tissue or plasma means physical harm to the human body, so we turn to easily accessible saliva. Since saliva is considered to be a terminal product of blood circulation, components such as proteins and RNAs that are present in plasma are also present in saliva. In fact, both coding and non-coding RNAs, including some miRNAs, have been found in human saliva. Although mRNAs are highly degraded in saliva, miRNAs are stable and abundant there. Recently, there have been many reports on cancer-related miRNA expression in saliva. Characteristic miRNA expression has been reported in oral squamous cell carcinoma, parotid gland tumors and esophageal cancer, indicating that salivary miRNAs are potential biomarkers for detecting these diseases. According to the test done by Zijun Xie's group, three miRNAs are significantly upregulated in whole saliva from the esophageal cancer patient group in contrast to the normal control group: miR-10b, miR-144, and miR-451 (p value 0.001, 0.012 and 0.002, respectively; AUC 0.762, 0.706 and 0.756, respectively). Four miRNAs are significantly upregulated in saliva supernatants from the esophageal cancer patient group: miR-10b, miR-144, miR-21 and miR-451. Among them, miR-21 is the most frequently reported for how well its expression level tracks esophageal cancer (according to one of the reports, p < 0.05, AUC = 0.8820, sensitivity = 90.20% and specificity = 70.69%; different tests may report different results).
Considering the performance of each miRNA as a whole, we finally chose miR-144 as our biomarker for esophageal cancer. To make our detection quick and convenient, we designed synthetic gene pathways based on paper. Detection requires only some saliva, which does no harm to the human body. This technique can also be extended to the detection of many other diseases that have a specific miRNA expression pattern in saliva.
What is Toehold Switch
We chose the toehold switch as our miRNA detector. Its structure is similar to a hairpin, except that it also carries a single-stranded ‘toehold’ region. The toehold switch functions as a riboregulator through linear-linear interaction between RNAs. When the target RNA appears, it binds the toehold, invades the stem of the switch and opens the hairpin, exposing the RBS.
Toehold switch systems are composed of two RNA strands referred to as the switch and trigger. The switch RNA contains the coding sequence of the gene being regulated. Upstream of this coding sequence is a hairpin-based processing module containing both a strong RBS and a start codon that is followed by a common 21 nt linker sequence coding for low-molecular-weight amino acids added to the N terminus of the gene of interest. A single-stranded toehold sequence at the 5′ end of the hairpin module provides the initial binding site for the trigger RNA strand. This trigger molecule contains an extended single-stranded region that completes a branch migration process with the hairpin to expose the RBS and start codon, thereby initiating translation of the gene of interest.
Our Circuit to Detect miRNA-144
There are 3 parts in our circuit in total.
Part 1 includes the toehold switch for miRNA-144 and the GFP coding sequence. When miRNA-144 is present, the switch is on and mRNA for GFP is transcribed.
Part 2 contains a toehold switch for GFP mRNA and the T3 RNA polymerase coding sequence (BBa_K346000). Note that, since the maximum length of the trigger RNA for a toehold switch is about 25 nt, we analyzed the structure of GFP mRNA and chose a small piece of it that ensures binding specificity and stability.
Part 3 is simply a GFP generator (BBa_E0840) with a T3 promoter. The T3 promoter can only function when bound by T3 RNA polymerase.
So part 2 and part 3 are designed for amplification: together they form a positive feedback loop. The whole process is as follows. When miRNA-144 is present, it triggers part 1 and GFP mRNAs are transcribed. On the one hand, these mRNAs can be directly translated into GFP and show green fluorescence. On the other hand, they can trigger part 2, so that T3 RNA polymerase is transcribed and translated, which enables part 3 to work and transcribe more GFP mRNAs. These GFP mRNAs can in turn activate more of part 2.
Experiment & Results
Experiment
Protocols:
Experiments:
We transfect HEK-293 human cells with our plasmid constructs as described in the table [ref: table]. Different concentrations of Dox are applied to the cell culture at the same time.
Transfected cells are cultured for 48 hours before flow cytometry, long enough for protein expression to reach steady state. The FACS measurement records the fluorescence intensity emitted by each cell, giving us a large sample of fluorescent protein expression levels: tens of thousands of cells for each experimental group.
Data collected from flow cytometry are later analyzed on computers. We estimate the probability density function (p.d.f.) from the data using kernel density estimation, a nonparametric statistical method. Given high and low Dox concentration inputs, cells exhibit different probability distributions, as illustrated in the example below [ref: fig].
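The kernel density estimation step can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions: the function name is hypothetical, a Gaussian kernel with a normal-reference rule-of-thumb bandwidth is our choice for the sketch, and the "fluorescence" samples are synthetic random numbers, not our flow cytometry data.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth=None):
    """Estimate a 1-D probability density on `grid` with a Gaussian kernel."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = samples.size
    if bandwidth is None:
        # Normal-reference rule of thumb (an assumed default for this sketch).
        bandwidth = 1.06 * samples.std(ddof=1) * n ** (-1 / 5)
    # One Gaussian bump per cell, averaged over all cells.
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernel = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return kernel.sum(axis=1) / (n * bandwidth)

# Synthetic log10 fluorescence values standing in for low- and high-Dox groups.
rng = np.random.default_rng(0)
low_dox = rng.normal(2.0, 0.3, size=2000)
high_dox = rng.normal(3.5, 0.3, size=2000)

grid = np.linspace(0.0, 6.0, 601)
density_low = gaussian_kde_1d(low_dox, grid)
density_high = gaussian_kde_1d(high_dox, grid)
```

Each estimated density integrates to approximately 1 over the grid, and evaluating every Dox group on the same grid gives one estimate of $p\left( {Y\left| {X = x} \right.} \right)$ per input level.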
What we have in hand is the conditional distribution $p\left( {Y\left| {X = x} \right.} \right)$ , given a known level of input $x$ . To calculate the mutual information $I\left( {X;Y} \right) = \iint {p\left( {x,y} \right){{\log }_2}\frac{{p\left( {x,y} \right)}}{{p\left( x \right)p\left( y \right)}}dxdy}$ and estimate the channel capacity, $C = \mathop {\sup }\limits_{{p_X}\left( x \right)} I\left( {X;Y} \right)$ , we need to find the input distribution $p\left( X \right)$ , and hence the joint distribution $p\left( {X,Y} \right)$ , that maximizes the mutual information. $p\left( X \right)$ , however, is not known in advance. We first pick a random stochastic vector as the initial input distribution and then use an iterative optimization algorithm to maximize $I\left( {X;Y} \right)$ . The maximum reached is our estimate of the channel capacity.
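The optimization algorithm is not specified above. One standard choice for this kind of problem is the Blahut-Arimoto algorithm, sketched below in Python with NumPy as an illustration rather than as our actual implementation. The sketch assumes the conditional densities have already been discretized into a matrix with one row per input level and one column per output bin, each row summing to 1. The function name and parameters are hypothetical.

```python
import numpy as np

def blahut_arimoto(p_y_given_x, tol=1e-9, max_iter=10000, rng=None):
    """Estimate channel capacity (bits) and the maximizing input distribution.

    p_y_given_x[x, y] is the probability of output bin y given input level x.
    """
    W = np.asarray(p_y_given_x, dtype=float)
    rng = np.random.default_rng(rng)
    p_x = rng.dirichlet(np.ones(W.shape[0]))   # random stochastic starting vector
    capacity = 0.0
    for _ in range(max_iter):
        p_y = p_x @ W                          # output marginal under current p_x
        with np.errstate(divide="ignore", invalid="ignore"):
            log_ratio = np.where(W > 0, np.log2(W / p_y), 0.0)
        d = np.sum(W * log_ratio, axis=1)      # KL divergence of each row from p_y
        capacity = float(p_x @ d)              # I(X;Y) for the current p_x
        if d.max() - capacity < tol:           # d.max() is an upper bound on C
            break
        p_x = p_x * np.exp2(d)                 # multiplicative update
        p_x /= p_x.sum()
    return capacity, p_x

# Check on a channel with a known answer: a binary symmetric channel, flip rate 0.1.
bsc = np.array([[0.9, 0.1],
                [0.1, 0.9]])
c_bsc, p_opt = blahut_arimoto(bsc, rng=0)
```

For the binary symmetric channel the result can be compared with the closed form $1 - {H_b}\left( {0.1} \right) \approx 0.531$ bits, which is a convenient sanity check before applying the routine to measured distributions.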
Results
Do our circuits work?
Yes, they do sense the input Dox concentration. The figure illustrates the changing distribution of EBFP2 fluorescence intensity in response to the Dox gradient. With higher concentration, the distribution shifts to the right until it reaches saturation. (The TRE-EBFP2N:IntN and TRE-EBFP2C:IntC group is displayed as an example.)
The shift, however, is only a qualitative impression. To study the quantitative properties more accurately, we plot the transfer function of each group. A transfer function describes the relation between the input level (Dox concentration) and the output level (amount of fluorescent protein), and the shape of the curve is highly informative.
The transfer functions of all seven groups are illustrated below. All values are in logarithmic space. Note that, for convenience of plotting, the points where Dox=0 are plotted at Dox=0.01 (otherwise they would lie at negative infinity on the log axis).
In the leftmost figure, EBFP2 without intein sequence shows relatively low affinity and thus a low expression level. Nevertheless, its leakage level is low as well, and Dox induction leads to approximately fold change. As for the middle and right figures, both split EBFP2 with intein and intact EBFP2 have about fold change when induced by Dox, but split EBFP2 has a lower leakage level.
Meanwhile, if one half of EBFP2 is driven by the constitutive promoter CMV, the leakage level remains the same but the fold induction decreases. This is expected, because with one constitutively expressed part the circuit can only sense the input through one half of the split protein, and thus becomes slightly less inducible.
Normalizing the curves leads to more interesting discoveries. Even though TRE-EBFP2N + CMV-EBFP2C gives a poor fold change, its transfer curve is significantly steeper when the dimerization process is reversible, which means better switch-like properties. In the presence of intein, the effect is weaker but still visible.
When the transfer curves are normalized to the range of 0 to 1, we find that their shapes differ: the lines representing split proteins rise later and more steeply.
If we normalize the initial EBFP2 level to 1, split EBFP2 with intein displays better properties than the other two settings. From fig. we can clearly see that it has the highest fold induction among the three, even significantly higher than that of intact EBFP2. The result shows that split proteins with high binding affinity can outperform the original undivided proteins thanks to their low leakage level and high fold induction, that is, high sensitivity to inputs.
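The two normalizations used above (rescaling to the range 0 to 1 to compare curve shapes, and dividing by the uninduced level to read off fold induction) can be sketched as follows. The function names are hypothetical, and the Hill-type curve is a made-up example rather than measured data.

```python
import numpy as np

def normalize_unit_range(output):
    """Rescale a transfer curve so its minimum is 0 and its maximum is 1."""
    output = np.asarray(output, dtype=float)
    return (output - output.min()) / (output.max() - output.min())

def normalize_to_basal(output):
    """Divide by the first (uninduced) point so the curve reads as fold change."""
    output = np.asarray(output, dtype=float)
    return output / output[0]

# Made-up transfer curve: leakage of 50 a.u. plus a Hill-type induced term.
dox = np.array([0.0, 1.0, 10.0, 100.0, 1000.0])
output = 50.0 + 5000.0 * dox**2 / (100.0**2 + dox**2)

shape = normalize_unit_range(output)   # compares steepness and rise position
fold = normalize_to_basal(output)      # compares leakage-relative induction
```

The first normalization removes differences in absolute expression so that only the shape remains, while the second keeps the ratio between induced and leaky expression, which is what the fold-induction comparison relies on.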
How well do circuits perform as evaluated by channel capacity?
Seven circuits are evaluated in our experiment. The calculated channel capacities are displayed in fig.
This is not as dizzying as it seems at first glance. Let’s look at it step by step.
More information is transmitted when both parts of the split protein are inducible.
When both promoters are TREs, both split parts are inducible, and the channel capacity is higher than that of channels with a non-inducible CMV promoter. In the absence of intein, the two peptides find it hard to dimerize, giving rise to low channel capacity.
Upon addition of the intein sequence, the binding process becomes irreversible, since the two halves assemble into one intact protein through splicing. As a consequence, channel capacity greatly increases. The double-inducible group with two TRE promoters still has the highest channel capacity.
Comparing the three inducible groups leads to the conclusion that splitting decreases channel capacity, but adding the intein sequence to the peptides rescues the effect, raising the channel capacity to an even higher level.
What can the result teach us?
Inspirations for Synthetic Biology Engineering:
For synthetic biologists, constructing an AND gate is both crucial and challenging. Split a regulatory protein such as a transcription factor, express the two halves independently, and an AND gate is born.
Nonetheless, the act of splitting can bring about unexpected side effects. Gene regulatory circuits depend heavily on quantitative properties, and their complexity and nonlinearity make the behavior of biological systems hard to predict. Once an important part of the system is chopped up, who knows what will happen next?
Our project quantitatively studies the behavior of such systems. Splitting changes the circuit’s input-output function: it reduces leakage, improves switch-like behavior, and increases the fold change induced by circuit inputs. Moreover, we use channel capacity from information theory to describe how well the circuits transmit signals. We find that adding the intein sequence is highly beneficial in that it shifts the channel capacity to a higher level, thus reducing uncertainty.
When it comes to designing logic gates, our findings can serve as a guide. Splitting not only achieves the logic-gate effect but, when intein is added, also improves sensitivity to inputs and protects the system against the detrimental interference of noise. Future work can benefit from this fundamental investigation of basic synthetic biology building blocks.
Highlighting the biological significance of dimerization:
Dimerization is extremely common in cells. Monomers assemble into dimers for further functions all the time, some interactions strong, some weak. Non-functional newborn peptides piece together and get to work, forming so-called quaternary structure; activated kinases reach each other and mutually phosphorylate; transcription factors form homo- or hetero-dimers with different stoichiometry, leading to varied downstream responses and distinct cellular fates…
Yes, we know which proteins dimerize. We understand how proteins dimerize as well, by interaction of domains like leucine zippers and so forth. But why? What is the point of dimerization?
Previous research has underlined the important advantages of dimerization, including differential regulation, specificity, facilitated proximity and so on. [citation needed] The influence of dimerization on noise propagation, however, has hardly been touched, owing to the difficulty of controlling experimental variables. Synthetic biology provides powerful tools to carry out, in designed systems, experiments that would otherwise be impossible. This is exactly what we do.
Traditionally, the impact of noise is evaluated using variance-related statistics, such as the coefficient of variation. These quantities can only describe how concentrated the output is around its mean; they cannot tell us how well we can infer one of two correlated random variables from the other. Channel capacity makes a better criterion for noise because it directly describes how well information is transmitted.
Reference
1. Jörn M. Schmiedel et al. MicroRNA control of protein expression noise. Science 348, 128 (2015); DOI: 10.1126/science.aaa1738
2. Christian M Metallo and Victor Sourjik. Environmental sensing, information transfer, and cellular decision-making. Current Opinion in Biotechnology 2014, 28:149–155
3. Rouillard J M, Lee W, Truan G, et al. Gene2Oligo: oligonucleotide design for in vitro gene synthesis[J]. Nucleic acids research, 2004, 32(suppl 2): W176-W180.
4. Shimizu Y, Inoue A, Tomari Y, et al. Cell-free translation reconstituted with purified components[J]. Nature biotechnology, 2001, 19(8): 751-755.
5. Gibson D G, Young L, Chuang R Y, et al. Enzymatic assembly of DNA molecules up to several hundred kilobases[J]. Nature methods, 2009, 6(5): 343-345.
6. http://www.snapgene.com/resources/plasmid_files/your_time_is_valuable/
7. http://www.addgene.org/plasmid-protocols/gibson-assembly/
8. http://www.snapgene.com/resources/gibson_assembly/
9. Lin X, Lo H C, Wong D T, et al. Noncoding RNAs in Human Saliva as Potential Disease Biomarkers[J]. Frontiers in Genetics, 2015, 6.
10. Xie Z, Chen G, Zhang X, et al. Salivary MicroRNAs as Promising Biomarkers for Detection of Esophageal Cancer[J]. Plos One, 2013, 8(4):e57502.
11. Ye M, Ye P, Zhang W, et al. [Diagnostic values of salivary versus and plasma microRNA-21 for early esophageal cancer][J]. Journal of Southern Medical University, 2014, 34(6):885-889.
12. http://rna.tbi.univie.ac.at/cgi-bin/RNAfold.cgi
13. http://www.nupack.org
14. http://helixweb.nih.gov/dnaworks
15. http://berry.engin.umich.edu/gene2oligo
Acknowledgement
The Tsinghua team helped us dry our parts for part submission with their machines.