Revision as of 02:05, 19 October 2016

Cyantific: UC Davis iGEM 2016

Protein Discovery

Metagenomic mining was used to discover 13 protein sequences we believed were plausible GAF domain homologs in an effort to expand the known color range of the GAF domain. To assist in future enzymatic discovery, a predictive model was developed that outputs whether an unknown sequence is likely to reflect in the blue, red, or yellow spectrum. Three of the 13 sequences were transformed into E.coli and subsequent analysis were performed to find that our model correctly predicted all three GAF sequences would reflect above 600 nm.

Process and Motivation

Since metagenomes provide diverse and largely undiscovered species, we used the Joint Genome Institute Integrated Microbiome database to mine the metagenome samples for sequences that were similar to known blue reflecting GAF domains. This process begins by BLAST searching a known GAF domain against deposited metagenome samples. This search will return an E-value, or the expected value that this alignment will occur by random. Lower E-values mean the less likely your alignment is a false positive, because of this our team only used alignments with an E-value of 1e-50 of lower. Of the remaining alignments, the surrounding protein families, top genomic isolate hits, and metagenome sample annotation was taken into consideration before choosing which homologs to attempt transformations with. After searching more than 100 metagenomes, our team narrowed it down to 13 plausible GAF domain homolog sequences which came from the following environments:

Hot spring microbial communities from Yellowstone National Park, Wyoming, USA
Groundwater microbial communities from subsurface biofilms in sulfidic aquifer in Frasassi Gorge, Italy
Estuarine microbial communities from Elkhorn Slough, Monterey Bay, California, USA
Bacteria ecosystem from Joint Genome Institute, California, USA
Hypersaline mat microbial communities from Guerrero Negro, Mexico

Although we chose to only investigate 13 sequences, there were many more we could have chosen. The amount of genomic and metagenomic data deposited in sites like JGI, NCBI and HMMER is massive and continually growing (23). Already there is 17,000 genomes deposited in NCBI(23). It’s relatively easy to find protein homologs, but it’s much harder to choose which genes to investigate. This is complicated even further if you are searching for a subgroup within one protein family, like searching for just blue reflecting GAF domains. For this reason, we developed a model to assist in choosing the GAF protein homologs based on subgroup identity.

Model development

Web deployed model

In order to refine the GAF homologs down to the subgroup level, ideally we needed a model that could identity patterns in protein sequences which affect the subgroup identity. In searching for existing programs, we discovered over twenty programs dating back to 1996 that were designed for a similar purpose. However, many were not maintained over the years or open access. Of the 12 programs our data was tested set on, most failed to correctly separate the protein family based on subgroup. Since we were unable to find a program which fit our needs, we developed our own.

Historically, consensus sequences and pro-site maps were used to identify influential patterns in sequences. While these methods are effective for highly conserved sequences and regions, they require arbitrary data cut-offs which makes their methods a bit coarse. One method which does not require these cut-offs are profile Hidden Markov Models, or profile HMMs. Profile HMM’s are statistical models that give the probability of any amino acid or DNA base at any position in the sequence. For example, if there is a lysine at position 2 in the sequence alignment this informs not only the probability of lysine at position 2, but the probability of other similar amino acids at that position. Profile HMM’s have become widely used in identifying protein families, even Pfam the protein family database used HMM’s to curate their database (24).

Specifically, the model began with about 100 previously characterized GAF protein sequences provided by Clark Lagarias’s laboratory which we then divided into blue, yellow, and red subgroups. The blue subgroup represents any GAF sequence that absorbs in the red spectra, and thus reflects and appears blue. For the purposes of this model, GAF domains absorbs from 550nm and below are considered in the yellow subgroup, from 550-600nm are considered in the red subgroup, and from from 600nm and above are considered in the blue subgroup. For every subgroup we created multiple sequence alignments that were influenced by the protein structure using Promals3D. Using the protein structure in the multiple alignment allows relatively divergent clusters to be aligned with more accuracy because it takes into account sequence-based constraints derived from the 3D structures (20) . After creating the three multiple sequence alignments, a profile HMM was created using HMMER. The model then aligns the user given sequence and compares it to all three profile HMMs.By calculating the log2 odds ratio of the observed to expected E value, a quantifiable score is produced. This model now has a web interface and is downloadable on github here.

To assess the accuracy of our model a four-fold cross validation was performed. This meaning a random quarter of the GAF sequences were removed and only the remaining 3/4ths were used for the profile HMM creation. This ensures that the model is not populated with the sequences which will be used for assessing accuracy, and thus our accuracy measurement is not biased. The reduced model constructed from just 3/4ths of the original sequences would then be queried by the randomized quarter of previously removed sequences and calculated how accurately the model could blindly predict the sequences. We repeated this three other times until all random and no overlapping quarters were used as the query.Through this validation we calculated our model has a 75% accuracy which is 42% more accurate than would occur by random chance.

Future Goals

In the future, our team would like to refine this model to have more specificity. Instead of our model predicting three subgroups, we would like to alter it to predict the wavelength range the GAF sequence will absorb at. This level of specificity would be beneficial in distinguishing more nuanced differences such as orange from yellow and blue from green.

Next: Novel GAF Expression

http://www.chicagotribune.com/business/ct-artificial-food-coloring-0626-biz-20160624-story.html
http://sensing.konicaminolta.us/products/
http://www.mars.com/global/about-us/policies-and-practices/color-policy
http://www.nbcnews.com/business/consumer/kraft-mac-cheese-says-goodbye-dye-n345006
http://blog.generalmills.com/2015/06/a-big-commitment-for-big-g-cereal/
http://www.nestleusa.com/media/pressreleases/nestl%C3%A9-usa-commits-to-removing-artificial-flavors-and-fda-certified-colors-from-all-nestl%C3%A9-chocolate-candy-by-the-end-of-20
http://www.fao.org/docrep/016/ap106e/ap106e.pdf
https://pubchem.ncbi.nlm.nih.gov/compound/Acid_Blue_9#section=Stability
http://www.fda.gov/ForIndustry/ColorAdditives/ColorAdditiveInventories/ucm115641.htm
http://link.springer.com/article/10.1007/s00217-004-1062-7
http://www.instructables.com/id/Blue-Foods-Colorful-cooking-without-artificial-dy/
A Brief History of Phytochrome Nathan C. Rockwell and J. Clark Lagarias
Hirose, Yuu, et al. "Green/red cyanobacteriochromes regulate complementary chromatic acclimation via a protochromic photocycle."Proceedings of the National Academy of Sciences 110.13 (2013): 4974-4979.
Rockwell, Nathan C., and J. Clark Lagarias. "A brief history of phytochromes." ChemPhysChem 11.6 (2010): 1172-1180.
Bussell, Adam N., and David M. Kehoe. “Control of a Four-Color Sensing Photoreceptor by a Two-Color Sensing Photoreceptor Reveals Complex Light Regulation in Cyanobacteria.” Proceedings of the National Academy of Sciences of the United States of America 110.31 (2013): 12834–12839. PMC. Web. 7 Oct. 2016.
http://www.nutritionaloutlook.com/food-beverage/coloring-spirulina-blue
Nature’s Palette: The Search for Natural Blue Colorants
https://www.novozymes.com/en
http://www.dsm.com/corporate/home.html
http://prodata.swmed.edu/promals3d/promals3d.php
https://www.neb.com/products/e6901-impact-kit
Heim R, Cubitt A, Tsien R (1995). "Improved green fluorescence" (PDF). Nature. 373 (6516): 663–4.doi:10.1038/373663b0. PMID 7854443.
https://www.ncbi.nlm.nih.gov/genome/browse/
http://pfam.xfam.org/
Red/Green Cyanobacteriochromes: Sensors of Color and Power - lagarias
https://en.wikipedia.org/wiki/Brilliant_Blue_FCF#/media/File:Blue_smarties.JPG
Harnessing phytochrome's glowing potential - clark
Stability of phycocyanin extracted from Spirulina sp.: Influence of temperature, pH and preservatives- R Chaiklahan, N Chirasuwan, B Bunnag - Process Biochemistry, 2012 - Elsevier
http://www.lightboxkit.com/Assay_dye.html
"Your Brain - A User's Guide - 100 Things You Never Knew." National Geographic Time INC. Specials 12 Feb. 2016: Print.
Sensing, Konica Minolta. "How Color Affects Your Perception of Food." Konica Minolta Color, Light, and Display Measuring Instruments. Konica Minolta Sensing Americans Inc., 2006. Web. 10 Oct. 2016.
"Center for Disease Control and Prevention - General Information Escherichia Coli." CDC 24/7: Saving Lives, Protecting People. Centers for Disease Control and Prevention, 06 Nov. 2015. Web. 10 Oct. 2016.
Olmos J, Paniagua-Michel J (2014) Bacillus subtilis A Potential Probiotic Bacterium to Formulate Functional Feeds for Aquaculture. J Microb Biochem Technol 6: 361-365. doi:10.4172/1948-5948.1000169
Zeigler, Daniel R. "Bacillus Genetic Stock Center." Bacillus Genetic Stock Center Website. The Bacillus Genetic Stock Center - Biological Sciences, Oct. 2016. Web. 10 Oct. 2016.
Vellanoweth, Robert Luis, and Jesse C. Rabinowitz. "The Influence of Ribosome binding-sites Elements on Translational Efficiency in Bacillus Subtilis and Escherichia Coli in Vivo." Molecular Microbiology 1105-1114, 15 Jan. 1992. Web. 9 Oct. 2016.
"Pveg (Plasmid #55173)." Addgene - The Nonprofit Plasmid Repository. Synthetic Biology; Bacillus BioBrick Box, Aug. 2016. Web. 10 Oct. 2016.
Gambetta, Gregory A., and Clark J. Lagarias. "Genetic Engineering of Phytochrome Biosynthesis in Bacteria." Section: Molecular and Cellular Biology. Proceedings of the National Academy of Sciences of the United States of America, 19 July 2001. Web. 9 Oct. 2016.
"Registry of Standard Biological Parts - B0015." Part: BBa_B0015. International Genetically Engineered Machine Competition (iGEM), 17 July 2003. Web. 9 Oct. 2016.
"Bacillus Subtilis - Indiamart." IndiaMART InterMESH Ltd., n.d. Web. 9 Oct. 2016.

@@ Line 16: / Line 16: @@
 	<div class="row row-centered">
 		<div class="col-xs-5" id="genomeImg">
-			<img src="pictures/UCD16_metagenome.png"  alt="">
+			<img src="https://static.igem.org/mediawiki/2016/c/c6/T--UC_Davis--UCD16_metagenome.png"  alt="">
 		</div>
 		<div class="col-xs-7 col-centered projP">
@@ Line 42: / Line 42: @@
 <hr>
 <div class="container-fluid centered">
-			<img src="pictures/UCD16_HMMpt1.png" class = "hmmImg" id = "hmm1" alt="">
+			<img src="https://static.igem.org/mediawiki/2016/d/dd/T--UC_Davis--UCD16_HMMpt1.png" class = "hmmImg" id = "hmm1" alt="">
-			<img src="pictures/UCD16_HMMpt2.png" class = "hmmImg" id = "hmm2" alt="">
+			<img src="https://static.igem.org/mediawiki/2016/4/4d/T--UC_Davis--UCD16_HMMpt2.png" class = "hmmImg" id = "hmm2" alt="">
 </div>
 <hr>
@@ Line 53: / Line 53: @@
-			<h3 class = "inline">Future Goals</h3><img src="pictures/UCD16_future.png" class = "img-rounded bactImg"alt="">
+			<h3 class = "inline">Future Goals</h3><img src="https://static.igem.org/mediawiki/2016/c/c4/T--UC_Davis--UCD16_future.png" class = "img-rounded bactImg"alt="">
 			<p>In the future, our team would like to refine this model to have more specificity. Instead of our model predicting three subgroups, we would like to alter it to predict the wavelength range the GAF sequence will absorb at. This level of specificity would be beneficial in distinguishing more nuanced differences such as orange from yellow and blue from green.</p>
 		</div>
 			<div class="col-xs-6 col-centered projLinks">
-				<a href="novel.html" class="navBtn">Next: Novel GAF Expression</a>
+				<a href="https://2016.igem.org/Team:UC_Davis/Novel" class="navBtn">Next: Novel GAF Expression</a>
 			</div>
 	</div>
@@ Line 115: / Line 115: @@
      <div class = "container-fluid footer">
 	<div class="row row-centered">
-		<div class="col-xs-8 col-centered">
+		<div class="col-xs-8 col-centered centered">
 			<p>Contact us at: ucdigem@gmail.com </p>
 		</div>

Difference between revisions of "Team:UC Davis/Discovery"