RESULTS - ORFS
In all known organisms, from bacteria to humans, the same triplets of mRNA bases code for the same amino acids, with just a few exceptions. However, the genomes of different species use these codons in different ways. This is a consequence of the degeneracy of the genetic code: most amino acids are encoded by more than one codon, so the same protein sequence can be synthesized from many different mRNAs. The choice among synonymous codons in protein-coding genes is not random; codon usage bias is correlated with imbalance in the tRNA pool. Production of recombinant proteins relies mainly on microorganisms such as Escherichia coli, and variation in codon usage is considered a major problem in heterologous protein production, because it can slow down translation and introduce errors. Codon optimization therefore adjusts the sequence of a synthetic gene by replacing rare codons with codons that occur in efficiently translated mRNAs of the host. Much of the codon-usage literature focuses on the inefficient translation of a set of rare codons in E. coli [Kane, 1995], especially the arginine codons AGG and AGA. To overcome this problem and enhance the expression of “difficult” ORFs containing rare codons, special E. coli strains are available, such as BL21 Codon Plus [Rosano and Ceccarelli, 2009], which encode additional tRNAs that recognize these codons.
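The degeneracy described above can be illustrated with a minimal Python sketch. It uses only a small, hand-picked slice of the standard genetic code, and the two toy ORFs are hypothetical examples, not sequences taken from any construct discussed here. Both encode the same short peptide: one with codons that are frequent in E. coli (CGT, CTG, GGC), the other with rare synonymous codons (AGG, CTA, GGA).

```python
# Small slice of the standard genetic code (DNA codons); just enough for the example.
CODON_TABLE = {
    "ATG": "M",
    "CGT": "R", "CGC": "R", "CGA": "R", "CGG": "R", "AGA": "R", "AGG": "R",
    "CTG": "L", "CTA": "L",
    "GGC": "G", "GGA": "G",
    "TAA": "*",
}


def translate(orf):
    """Translate a DNA ORF codon by codon, stopping at the first stop codon."""
    if len(orf) % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    protein = []
    for i in range(0, len(orf), 3):
        amino_acid = CODON_TABLE[orf[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical toy ORFs (assumption, for illustration only).
common = "ATGCGTCTGGGCTAA"  # frequent E. coli codons
rare = "ATGAGGCTAGGATAA"    # rare synonymous codons

print(translate(common), translate(rare))  # MRLG MRLG
```

Both ORFs differ at the nucleotide level but yield the same peptide, which is why codon choice can be changed without altering the protein.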
The first construct dedicated to this phenomenon was sfGFP with four rare arginine codons placed directly before its two TAA stop codons (sfGFP_4R, biobrick BBa_K1481003, provided to iGEM in 2014 by the Poznan_Bioinf team). We wanted to check to what extent four such consecutive codons could be detrimental to the biosynthesis of a stable and soluble recombinant protein in E. coli. Surprisingly, there was no clearly visible difference between the biosynthesis rate of sfGFP_4R and that of the unmodified sfGFP reading frame under an identical promoter and with an identical 5’UTR (biobrick BBa_K1481002, also provided by Poznan_Bioinf). To strengthen this motif we extended it to eight consecutive AGG/AGA codons (sfGFP_8R, Fig. 1). We also decided to compare the translation rate of sfGFP_8R with that of another construct, sfGFP_TagX. Instead of arginine residues, sfGFP_TagX carries a degradation tag directly before the stop codons (coding for the decapeptide ANDENYALAA). In the cell, this tag is added to polypeptides translated from defective mRNAs that lack a stop codon: tmRNA binds to the vacant A-site of the stalled ribosome, its translation appends the tag, and the truncated protein is directed to proteolysis [Hayes et al, 2001]. The results we obtained indicate that neither the 8-arginine tag nor the degradation tag at the C-terminus of the coding sequence is detrimental to recombinant protein production in our system (Fig. 2, 3, 4).
Fig. 1. The set of biobricks comprising different variants of the sfGFP reading frame under an identical arabinose promoter (AraC-pBAD), each starting with a stable 6-histidine tag at the N-terminus. The grey box marks the constructs built by our team in 2016.
We decided to check the difference in protein synthesis rate between two very different versions of the same reading frame: one composed of the most common codons in the E. coli orfeome (sfGFP_B, the best choice) and another, equally codon-monotonous ORF composed of the rarest codons (sfGFP_W, the worst case) (Fig. 1). Bioinformatic analysis identified the seven codons that occur least frequently in the E. coli genome. They encode arginine (AGG), leucine (CTA), isoleucine (ATA), glycine (GGA), serine (TCA), proline (CCC) and threonine (ACA). These codons account for 40% of all codons in the sfGFP_W ORF.
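The recoding idea behind the worst-case ORF, and the way a rare-codon percentage such as the 40% figure can be computed, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure actually used to design the ORF: the function names are hypothetical, the toy input is not part of sfGFP, and only the seven amino acids whose rarest codons are listed above are recoded (all other codons are left unchanged).

```python
# Rarest codon per amino acid, as listed in the text.
RAREST_CODON = {
    "R": "AGG", "L": "CTA", "I": "ATA", "G": "GGA",
    "S": "TCA", "P": "CCC", "T": "ACA",
}
RARE_CODONS = set(RAREST_CODON.values())

# Synonymous codon families (standard genetic code) for those seven amino acids.
SYNONYMS = {
    "R": ["CGT", "CGC", "CGA", "CGG", "AGA", "AGG"],
    "L": ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"],
    "I": ["ATT", "ATC", "ATA"],
    "G": ["GGT", "GGC", "GGA", "GGG"],
    "S": ["TCT", "TCC", "TCA", "TCG", "AGT", "AGC"],
    "P": ["CCT", "CCC", "CCA", "CCG"],
    "T": ["ACT", "ACC", "ACA", "ACG"],
}
CODON_TO_AA = {codon: aa for aa, codons in SYNONYMS.items() for codon in codons}


def split_codons(orf):
    """Split an in-frame DNA sequence into a list of codons."""
    if len(orf) % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return [orf[i:i + 3] for i in range(0, len(orf), 3)]


def rare_codon_fraction(orf):
    """Fraction of codons in the ORF that belong to the rare-codon set."""
    codons = split_codons(orf)
    return sum(codon in RARE_CODONS for codon in codons) / len(codons)


def recode_with_rarest(orf):
    """Replace every codon of the seven amino acids with its rarest synonym."""
    return "".join(
        RAREST_CODON.get(CODON_TO_AA.get(codon), codon)
        for codon in split_codons(orf)
    )


toy = "ATGCGTCTGATTGGC"            # hypothetical input: M R L I G
recoded = recode_with_rarest(toy)  # ATG is kept, the other four codons change
print(recoded, rare_codon_fraction(toy), rare_codon_fraction(recoded))
```

Recoding leaves the encoded peptide unchanged while raising the rare-codon fraction of the toy sequence from 0 to 4 out of 5 codons.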
The results we obtained (Fig. 2, 3, 4) suggest that in E. coli cells grown in rich media, the introduction of rare codons into a sequence coding for a well-soluble protein expressed at a moderate level (as in pBAD systems) is not sufficient to cause any significant decrease in its translation rate. The translation rate of sfGFP from the inversely optimized ORF is higher than or equal to that of sfGFP biosynthesized from the most frequent codons. This indicates either that the E. coli translational apparatus adjusts easily, or that such a gene benefits from drawing on a different tRNA pool than the one used at the same time by highly expressed proteins. In contrast to rich media, in M9 minimal medium codon optimization based on codon usage does matter: E. coli cells probably cannot synthesize minor tRNA molecules quickly enough to translate an ORF composed of the least frequent codons as efficiently as the optimized ORF.
Fig. 2. Translational rate of different variants of sfGFP ORFs during 6 h culture of E. coli DH5α in the richest medium (SB/PKB) upon induction with L-arabinose at 0 h (0.4% final concentration).
Fig. 3. Translational rate of different variants of sfGFP ORFs during 6 h culture of E. coli DH5α in the rich medium (LB) upon induction with L-arabinose at 0 h (0.4% final concentration).
Fig. 4. Translational rate of three different variants of sfGFP ORFs during 6 h culture of E. coli DH5α in M9 minimal medium upon induction with L-arabinose at 0 h (0.4% final concentration). Protein expression was induced at OD600 = 0.8.
Our results were difficult to believe, because many people who want to improve translation of a heterologous protein optimize synonymous codon usage to better match the host organism. We therefore decided to check to what extent orfeome-based codon optimization influences sfGFP biosynthesis under other inducible promoters, to be sure that what we observed does not depend on the choice of the arabinose-induced promoter. We chose two other promoters from our collection: a rather weak one, pxylF-xylA (the wild-type xylose promoter), and a strong one, pxylS-E1_5’UTR. We created four more constructs, placing sfGFP_B and sfGFP_W (composed exclusively of the most frequent or the rarest codons in the E. coli orfeome, respectively) under both the wild-type xylose-induced promoter and pxylS-E1_5’UTR. Translation rates obtained from fluorescence measurements were expressed relative to the non-optimized sfGFP coding sequence (BBa_K1741007 and BBa_K2014002). sfGFP_W, composed of the rarest codons, is efficiently translated at the low/moderate expression level (under control of the pxylF-xylA promoter) (Fig. 5): its translation rate is twice that of the optimized sfGFP_B. The opposite is observed under control of the stronger promoter (pxylS-E1_5’UTR is most likely the strongest available version of a xylose-induced promoter in E. coli; see BBa_K2014002): translation of sfGFP_W is three times lower than that of sfGFP_B (see also BBa_K2014009, BBa_K2014011).
Fig. 5. Translational rate of three different variants of sfGFP ORFs transcribed under control of the weak and the strong xylose-induced promoter during 6 h culture of E. coli DH5α in LB medium upon induction with D-xylose at 0 h (0.4% final concentration).
To check whether the same results hold for a fluorescent protein of a different sequence and structure, we designed two variants of the mRFP coding sequence: a codon-optimized one composed exclusively of the most frequent codons in the E. coli orfeome (mRFP_B), and one composed of synonymous rare codons (mRFP_W). The results show a significant decrease in the translation rate of the inversely optimized ORF (mRFP_W) compared with the optimized one (mRFP_B), not only in M9 minimal medium but also in both rich media (SB/PKB and LB), after 18 h of E. coli DH5α culture. The results are shown in the biobrick description (BBa_K2014008).
1. We found no obvious difference between translation of sfGFP encoded by optimized codons (the most frequent in the E. coli orfeome) and translation of the inversely optimized ORF composed of synonymous rare codons in E. coli cells growing in rich media (SB/PKB or LB).
2. In M9 minimal medium, codon optimization based on codon usage is more important, because the translational apparatus of E. coli cells has to adjust and synthesize the less common tRNA molecules. As a result, the sfGFP_B ORF is translated faster than the sfGFP_W ORF.
3. Since the difference in translation efficiency between the codon-optimized and the inversely optimized ORF, both based on whole-orfeome codon usage, is not large, we are continuing our investigations by testing the codon adaptation index and codon context.
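For reference, the codon adaptation index mentioned in point 3 is usually computed as the geometric mean of relative adaptiveness values, where the weight of each codon is its count in a reference gene set divided by the count of the most used synonymous codon. A minimal Python sketch of that standard definition follows. The reference counts are hypothetical placeholders rather than real E. coli data, and the function names are illustrative only.

```python
import math

# HYPOTHETICAL codon counts in a reference set of highly expressed genes,
# grouped by amino acid. Placeholder numbers, not real E. coli data.
REFERENCE_COUNTS = {
    "R": {"CGT": 80, "AGG": 20},
    "L": {"CTG": 100, "CTA": 10},
}


def relative_adaptiveness(reference_counts):
    """w(codon) = count(codon) / count(most used synonymous codon)."""
    weights = {}
    for counts in reference_counts.values():
        best = max(counts.values())
        for codon, n in counts.items():
            weights[codon] = n / best
    return weights


def cai(orf, weights):
    """Geometric mean of the weights of all scored codons in the ORF."""
    if len(orf) % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = [orf[i:i + 3] for i in range(0, len(orf), 3)]
    # Codons without a weight (here ATG, for example) are skipped.
    logs = [math.log(weights[c]) for c in codons if c in weights]
    if not logs:
        raise ValueError("no scorable codons in ORF")
    return math.exp(sum(logs) / len(logs))


WEIGHTS = relative_adaptiveness(REFERENCE_COUNTS)
print(cai("ATGCGTCTG", WEIGHTS))  # only the most used codons
print(cai("ATGAGGCTA", WEIGHTS))  # only rarely used codons, lower score
```

An ORF built only from the most used codons of the reference set scores 1, and the score falls as rarer synonyms are introduced.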
Codon optimization based on the orfeome still leaves many questions unanswered, which is why we are looking for other general ways to optimize ORFs. We are now working on contrasting ORFs that are AT-rich or GC-rich, with codon optimization based on the codon adaptation index. Further experiments will examine the codon context effect, a hypothesis claiming that the translational efficiency of a given codon depends strongly on its neighboring codons. Codon Composer, software designed by our advisor Melania Nowicka, calculates a codon pair ranking for any set of ORFs and optimizes the codon context while taking restriction enzyme sites and other aspects into account at the same time. The ranking and the ORF scores are calculated using the Codon Pair Bias measure (see Methods). Melania has prepared four rankings for different sets of coding sequences: the whole orfeome and the sets of the 300, 200 and 100 most abundant proteins. Comparing the results for these rankings could help to find out whether single-codon and codon-pair frequencies correlate with protein abundance.
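To make the idea of a codon pair ranking concrete, the following Python sketch computes codon pair scores with a commonly used log-ratio definition: the observed count of a codon pair divided by the count expected from the individual codon, amino acid and amino acid pair frequencies. This is an assumption for illustration only; the Methods section is not reproduced here, and the actual implementation in Codon Composer may differ. The codon table and the toy ORF set are hypothetical.

```python
import math
from collections import Counter

# Toy codon table (assumption): only the four codons used in the example.
CODON_TO_AA = {"AAA": "K", "AAG": "K", "GAA": "E", "GAG": "E"}


def split_codons(orf):
    """Split an in-frame DNA sequence into a list of codons."""
    if len(orf) % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return [orf[i:i + 3] for i in range(0, len(orf), 3)]


def codon_pair_scores(orfs, codon_to_aa=CODON_TO_AA):
    """Score each observed codon pair as ln(observed / expected)."""
    codon_n, aa_n = Counter(), Counter()
    pair_n, aa_pair_n = Counter(), Counter()
    for orf in orfs:
        codons = split_codons(orf)
        for c in codons:
            codon_n[c] += 1
            aa_n[codon_to_aa[c]] += 1
        for a, b in zip(codons, codons[1:]):
            pair_n[(a, b)] += 1
            aa_pair_n[(codon_to_aa[a], codon_to_aa[b])] += 1
    scores = {}
    for (a, b), observed in pair_n.items():
        x, y = codon_to_aa[a], codon_to_aa[b]
        expected = codon_n[a] * codon_n[b] / (aa_n[x] * aa_n[y]) * aa_pair_n[(x, y)]
        scores[(a, b)] = math.log(observed / expected)
    return scores


def codon_pair_bias(orf, scores):
    """Mean score over the adjacent codon pairs of one ORF."""
    codons = split_codons(orf)
    # Pairs never seen in the reference set are skipped (a simplification).
    values = [scores[p] for p in zip(codons, codons[1:]) if p in scores]
    if not values:
        raise ValueError("no scored codon pairs in ORF")
    return sum(values) / len(values)


TOY_ORFS = ["AAAGAA", "AAAGAA", "AAGGAG", "AAAGAG"]  # hypothetical reference set
SCORES = codon_pair_scores(TOY_ORFS)
ranking = sorted(SCORES, key=SCORES.get, reverse=True)  # most favoured pair first
print(ranking)
print(codon_pair_bias("AAAGAA", SCORES))
```

Computing such scores separately for the whole orfeome and for sets of highly abundant proteins would give one ranking per set, which is the kind of comparison described above.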
1. Elena C, Ravasi P, Castelli ME, Peiru S, Menzella HG. (2014) Expression of codon optimized genes in microbial systems: current industrial applications and perspectives. Front. Microbiol., 04 February 2014.
2. Hayes CS, Bose B and Sauer RT. (2001) Stop codons preceded by rare arginine codons are efficient determinants of SsrA tagging in Escherichia coli. Proc Natl Acad Sci U S A. 19; 99(6):3440-5.
3. Kane JF. (1995) Effects of rare codon clusters on high-level expression on heterologous proteins in Escherichia coli. Current Opinion in Biotechnology, 6:494-500.
4. Pédelacq JD, Cabantous S, Tran T, Terwilliger TC, Waldo GS. (2006) Engineering and characterization of a superfolder green fluorescent protein. Nat Biotechnol. 2006 Sep;24(9):1170.
5. Rosano GL and Ceccarelli EA. (2009) Rare codon content affects the solubility of recombinant proteins in a codon bias-adjusted Escherichia coli strain. Microbial Cell Factories 2009, 8:41.
6. Vondrášek J, Mason PE, Heyda J, Collins KD and Jungwirth P. (2009) The Molecular Origin of Like-Charge Arginine−Arginine Pairing in Water. J. Phys. Chem. B, 2009, 113 (27), pp 9041–9045.