Introducing the E.co-Factory project
Scientists all over the world use Escherichia coli as a host for recombinant protein production. The system is popular because of its speed, low cost and well-characterized genetics. The choice of promoter strongly affects protein expression. At present, the most popular choice is the pET system, which relies on lactose/IPTG-induced T7 RNA polymerase and enables highly efficient protein production. However, it has some downsides. The system is leaky, so proteins toxic to the host cell cannot be produced, and this leakiness is very hard to control once the stable and highly active T7 RNA polymerase has been produced. Once the system is induced, the high accumulation of heterologous protein can lead to insoluble inclusion bodies. Phage T7 RNA polymerase also tends to introduce more mistakes than its cellular counterpart. Moreover, because translation in prokaryotic cells takes place on mRNAs that are still being transcribed, the slower cellular RNA polymerase can be beneficial for accurate translation of mRNAs with strong secondary structures: such structures have less chance to form before ribosomes cover the nascent mRNA molecule.
Our group, UAM_Poznan, aims to generate four sets of promoters induced by sugars (xylose, melibiose, arabinose and rhamnose) and to give the user the possibility of introducing several genetic circuits into one cell, the expression of which can be controlled independently.
We have reduced the size of the promoters: they are minimal but fully functional and tightly regulated. We have also introduced changes in the 5'UTRs (5' untranslated regions), because their secondary structures and the sequence and position of the 16S rRNA binding site relative to the AUG codon are known to strongly influence the efficiency of translation.
Our latest work aims to find out to what extent the structure of an open reading frame (ORF) can be optimized, and how the translation rate differs between, say, an optimal, a less optimal and a much less optimal reading frame. We build our experimental model on sfGFP and mRFP, two soluble fluorescent proteins. We compare the expression rates of contrasting variants of their coding sequences: variants with optimized and with the most deoptimized codon usage, codon adaptation index and codon context, and with the highest or the lowest GC content. This simple comparison should allow us to evaluate which method of ORF optimization really makes sense.
Strong, tight promoters
Our multipromoter expression system for E. coli contains four sets of pSB1C3 vectors with promoters induced by non-toxic compounds, namely sugars: arabinose (Ara promoters), rhamnose (Rha promoters), xylose (Xyl promoters) and melibiose (Mel promoters). Each promoter is induced by a specific sugar and starts the expression of the downstream gene. One can exchange the ORF of sfGFP for any other coding sequence of interest. AraC, RhaS, RhaR, MelR and XylR are transcription factors: each binds a specific sugar, which activates it, and then initiates transcription. These factors depend on the binding of the cAMP receptor protein (CRP)-cAMP complex to the promoter. In the presence of glucose, cAMP levels are low and CRP is unable to bind to the promoter, so transcription is blocked.
Our multipromoter system provides different levels of protein expression, so one can choose either a stronger promoter or one that gives lower levels of inducible expression. Since all four sets of promoters are tightly regulated, they can be useful for producing proteins toxic to E. coli cells. It is also worth taking into consideration that E. coli cells behave differently on different carbon sources, which results in contrasting bacterial growth rates and indirectly influences protein production. E.co-Factory ensures independent induction of the expression of at least two different genes and efficient blockage of all promoters by glucose. Moreover, cellular RNA polymerase is more accurate and introduces fewer mistakes into the transcribed sequence than the phage polymerase of the pET system, which is important if one's aim is to produce recombinant proteins for crystallography and medicine.
Our group modified arabinose, rhamnose, xylose and melibiose promoters, of which the AraC-pBAD arabinose promoter is commercially available. All promoter sequences were copied from the E. coli K-12 genome. The modifications include reducing the sequence size while maintaining full functionality, because shorter promoters mean smaller biobricks, which are more convenient for molecular cloning. We have substituted the original 5'UTRs of the genes located downstream of these promoters with synthetic sequences designed with regard to RNA structure, the RBS motif and its positioning. Our biggest achievement this year is an experiment in which we transplanted a 39 nt fragment of the 5'UTR of gene 10 of bacteriophage T7 into three of the four tightly regulated E. coli promoters (the melibiose-inducible promoter is still under construction). It appears that all fusion promoters (araE1, xylE1, rhaE1) are still much tighter than the lactose-induced T7 systems and are much more responsive to inducers. One of them seems to give higher expression levels upon induction than T7 expression systems. The additional ribosome binding site in the 5'UTR seems to be a universal mechanism that allows efficient polysome takeover in E. coli and possibly in other prokaryotic systems. This opens attractive new opportunities, at the very least to develop very efficient expression systems for other bacteria such as Bacillus, Lactococcus and many others.
Better and better ORFs
Degeneracy of the genetic code means that most amino acids are encoded by more than one codon, so the same protein sequence can be translated from various mRNAs. The frequency of particular codons in the gene of interest can cause expression problems when the corresponding tRNAs are rare. Variation in codon usage is considered to be a major problem in heterologous protein production: it can slow down translation and introduce errors. It is believed that codon optimization can substantially increase gene expression, and that the optimized gene will compete more effectively for cell resources and will be translated more accurately [Kane JF, 1995]. We would like to check which approach to optimizing a reading frame is the best and to what extent it can improve the expression of the optimized gene. We consider improvements of such traits as codon usage, codon adaptation index, codon context and secondary structures in coding sequences.
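The codon adaptation index mentioned above is commonly computed as the geometric mean of per-codon weights, where each weight is the count of a codon in a reference set divided by the count of the most frequent codon for the same amino acid. The Python sketch below illustrates that calculation. The function names and the toy reference sequence in the usage note are hypothetical and chosen only for illustration; they are not the software or data used in this project. Skipping codons that are absent from the reference set is a simplifying assumption.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code, built in TCAG order (stop codons are "*").
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def split_codons(orf):
    """Split an in-frame DNA sequence into codons, dropping any trailing bases."""
    orf = orf.upper()
    return [orf[i:i + 3] for i in range(0, len(orf) - len(orf) % 3, 3)]


def relative_adaptiveness(reference_orfs):
    """Weight of each codon = its count / count of the best synonymous codon.

    Assumes the reference ORFs contain only A, C, G and T.
    """
    counts = Counter(c for orf in reference_orfs for c in split_codons(orf))
    best = {}
    for codon, n in counts.items():
        aa = CODON_TABLE[codon]
        best[aa] = max(best.get(aa, 0), n)
    return {codon: n / best[CODON_TABLE[codon]] for codon, n in counts.items()}


def cai(orf, weights):
    """Geometric mean of codon weights over the ORF.

    Met, Trp and stop codons are ignored because they have no synonymous
    alternatives to choose from; codons missing from the reference set are
    skipped (an assumption of this sketch).
    """
    logs = [
        math.log(weights[c])
        for c in split_codons(orf)
        if c in weights and CODON_TABLE[c] not in "MW*"
    ]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")
```

With the toy reference `["CTGCTGCTGCTA"]`, the weight of CTG is 1 and that of CTA is 1/3, so `cai("CTGCTG", w)` is 1.0 while `cai("CTGCTA", w)` is about 0.58.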
We intentionally started our comparisons by implementing general optimization rules, the effects of which can be easily compared in simple induced expression experiments. We decided to check the difference in protein synthesis rate between two very different versions of the same reading frame, sfGFP [Pedelacq JD, 2006]: one composed of the most common codons in the E. coli orfeome (sfGFP-B, the best choice), and another, also codon-monotonous, composed of the rarest ones (sfGFP-W, the worst case). Our bioinformatic analysis identified the 7 codons that occur least frequently in the E. coli genome. They encode arginine (AGG), leucine (CTA), isoleucine (ATA), glycine (GGA), serine (TCA), proline (CCC) and threonine (ACA). These codons represent 40% of all codons of the sfGFP-W ORF. The sfGFP-B and sfGFP-W constructs differ in all codons except the first ten residues, which form a stable N-terminus carrying a 6-histidine tag of the kind often used for affinity purification of recombinant proteins.
Looking for other general ways to optimize ORFs, we are now working on contrasting ORFs that are AT-rich or GC-rich, with codon optimization based on the codon adaptation index. Further experiments examine the codon context effect, a hypothesis claiming that the translational efficiency of a given codon depends strongly on its neighboring codons. Using software specially designed for this purpose, our advisor, Melania, analyzed and chose the codon pairs that occur most often and least often in the E. coli transcriptome. We have prepared four different codon context rankings: one based on the whole orfeome and three based on different sets of coding sequences for highly abundant proteins. Using the rankings, we have maximized and minimized codon context for two ORFs, sfGFP and mRFP, to test whether single-codon and codon-pair frequencies correlate with protein abundance.
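To make the rare-codon figures above concrete, the following Python sketch counts what fraction of an ORF consists of the seven rare codons listed in the text, and swaps synonymous codons for those rare ones while leaving a chosen number of leading codons untouched. The function names, the `keep_first` parameter and the short example sequence are assumptions made for illustration; this is not the procedure actually used to design the worst-case sfGFP variant.

```python
from itertools import product

# Standard genetic code, built in TCAG order (stop codons are "*").
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# The seven rare codons named in the text, keyed by the amino acid they encode.
RARE_CODON = {
    "R": "AGG", "L": "CTA", "I": "ATA", "G": "GGA",
    "S": "TCA", "P": "CCC", "T": "ACA",
}


def split_codons(orf):
    """Split an in-frame DNA sequence into codons, dropping any trailing bases."""
    orf = orf.upper()
    return [orf[i:i + 3] for i in range(0, len(orf) - len(orf) % 3, 3)]


def translate(orf):
    """Translate an ORF (A, C, G, T only) into a one-letter protein string."""
    return "".join(CODON_TABLE[c] for c in split_codons(orf))


def rare_codon_fraction(orf):
    """Fraction of all codons in the ORF that belong to the rare set."""
    codons = split_codons(orf)
    if not codons:
        return 0.0
    rare = set(RARE_CODON.values())
    return sum(c in rare for c in codons) / len(codons)


def deoptimize(orf, keep_first=0):
    """Replace codons for R, L, I, G, S, P and T with their rare synonym.

    The first `keep_first` codons are copied unchanged (for example, a fixed
    N-terminal tag). Codons for all other amino acids are left as they are,
    so the encoded protein does not change.
    """
    out = []
    for i, codon in enumerate(split_codons(orf)):
        aa = CODON_TABLE[codon]
        out.append(codon if i < keep_first else RARE_CODON.get(aa, codon))
    return "".join(out)
```

For instance, `deoptimize("ATGCGTCTGAAATAA")` returns `"ATGAGGCTAAAATAA"`, which encodes the same peptide and has a rare-codon fraction of 0.4 (two of five codons).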
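The codon-pair rankings and GC-content comparisons described above can be illustrated with a short Python sketch. It counts adjacent codon pairs across a set of coding sequences and reports the GC content of a sequence. The function names and the toy input are hypothetical, and this is not the software written by the team's advisor.

```python
from collections import Counter


def split_codons(orf):
    """Split an in-frame DNA sequence into codons, dropping any trailing bases."""
    orf = orf.upper()
    return [orf[i:i + 3] for i in range(0, len(orf) - len(orf) % 3, 3)]


def codon_pair_counts(orfs):
    """Count every (codon, next codon) pair over a collection of ORFs."""
    pairs = Counter()
    for orf in orfs:
        codons = split_codons(orf)
        # zip pairs each codon with its immediate downstream neighbor.
        pairs.update(zip(codons, codons[1:]))
    return pairs


def gc_content(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0
```

Calling `codon_pair_counts(orfs).most_common()` lists the pairs from most to least frequent, which is one simple way to build a ranking of the kind described; a real analysis would likely normalize the counts against single-codon frequencies.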
To fulfill the criteria that each group has to meet to earn a gold medal, we improved the function of two biobricks that we delivered to the iGEM Registry in 2015: one arabinose-induced promoter, Arashort1 (araBAD; BBa_K1741000), and one xylose-induced promoter, XylS (xylA-proD5'UTR; BBa_K1741009).
This year's improved constructs are as follows:
BBa_K2014003 - pBAD-M5'UTR->sfGFP – improvement of pBAD promoter (BBa_K1741000) with M5’UTR,
BBa_K2014004 - pxylS-M5'UTR->sfGFP - improvement of XylS promoter (BBa_K1741009) with M5’UTR.
All constructs contain sfGFP under different promoters/UTRs as a marker of gene expression and protein synthesis/accumulation.
We have also tested and slightly improved BBa_K1481003, a biobrick provided to iGEM in 2014 by the Poznan_Bioinf team and named "sfGFP with R4-tag under arabinose promoter". We checked whether four rare arginine codons (namely AGG and AGA) added to the 3' end of the GFP reading frame influence its translation. Surprisingly, we found no obvious difference in expression compared with the gene without the arginine tag.
Thus, we improved the construct by adding four rare arginine codons (AGA) under an identical promoter (AraC-pBAD) with an identical 5'UTR: BBa_K2014007.
1. Kane JF. (1995) Effects of rare codon clusters on high-level expression of heterologous proteins in Escherichia coli. Current Opinion in Biotechnology, 6:494-500
2. Pédelacq JD, Cabantous S, Tran T, Terwilliger TC, Waldo GS. (2006) Engineering and characterization of a superfolder green fluorescent protein. Nat Biotechnol. Sep;24(9):1170.
3. Olins PO, Rangwala SH. A novel sequence element derived from bacteriophage T7 mRNA acts as an enhancer of translation of the lacZ gene in Escherichia coli. J Biol Chem. 1989 Oct 15;264(29):16973-6.
4. Davis J.H., Rubin A.J., Sauer R.T. Design, construction and characterization of a set of insulated bacterial promoters. Nucleic Acids Research, 2011, Vol. 39, No. 3 1131–1141
5. Haldimann A., Daniels L.L., Wanner B.L. Use of New Methods for Construction of Tightly Regulated Arabinose and Rhamnose Promoter Fusions in Studies of the Escherichia coli Phosphate Regulon. Journal of Bacteriology, Mar. 1998, p. 1277–1286
6. Holcroft C.C., Egan S.M. Roles of Cyclic AMP Receptor Protein and the Carboxyl-Terminal Domain of the α Subunit in Transcription Activation of the Escherichia coli rhaBAD Operon. Journal of Bacteriology, June 2000, p. 3529–3535
7. Giacalone M.J. et al. Toxic protein expression in Escherichia coli using a rhamnose-based tightly regulated and tunable promoter system. BioTechniques 40:355-364 (March 2006)
8. Song S., Park C. Organization and Regulation of the D-Xylose Operons in Escherichia coli K-12: XylR Acts as a Transcriptional Activator. Journal of Bacteriology, Nov. 1997, p. 7025–7032
9. Sørensen H.P., Mortensen K.K. Advanced genetic strategies for recombinant protein expression in Escherichia coli. Journal of Biotechnology, August 2004.
10. Newman J.R., Fuqua C. Broad-host-range expression vectors that carry the L-arabinose-inducible Escherichia coli araBAD promoter and the araC regulator. Gene, Nov. 1998.
11. Lehmeier B, Amann E. Tac promoter vectors incorporating the bacteriophage T7 gene 10 translational enhancer sequence for improved expression of cloned genes in Escherichia coli. Journal of Biotechnology, Apr. 1992; 23(2):153-65.