
Dalhousie iGEM 2016

Cellulolytic enzymes have become an active field of research because of their potential for producing microbial biofuels. Biofuels, such as ethanol, are fuels that microorganisms can produce from plant-derived feedstocks such as cellulose. Cellulose, however, is difficult to degrade, so little of its carbon is available to microorganisms as an energy source. By using organisms, or proteins from organisms, that are capable of degrading cellulose, we can open up many feedstocks for biofuel production. Producing biofuel from leftover vegetables, lumber-industry by-products and waste, seaweed and other readily available sources could help answer the ever-growing problem of energy production.

The softwood lumber industry is particularly important for Nova Scotia. In 2014, Nova Scotia exported roughly $668 million in softwood lumber products, including pulp and paper products, primary lumber products and fabricated wood materials. The industry also harvested roughly 29,000 hectares of forest to make these products. Alongside its useful products, the industry generates many by-products of little value. Imagine turning these by-products into a sustainable form of energy, such as ethanol biofuel, using a simple biological reactor containing E. coli and enzymes taken from cellulose-degrading bacteria. This is what we aim to do with this year's project.

Our project will characterize the microbiomes of North American porcupines found in Nova Scotia forests and attempt to "mine" these microbiomes for useful enzymes. A microbiome is the collection of microbes found in and on a larger organism. Humans have microbiomes on their skin, in their noses, in their ears and in many other places. The most important human microbiome, however, is arguably the gut microbiome: it helps us digest foods that would otherwise be indigestible and provides many nutritional by-products. Research in this area is new, but we are steadily learning how important a healthy gut microbiome is. Interestingly, the North American porcupine eats mostly bark and wood from trees found in Nova Scotia. Since the porcupine's own cells cannot digest cellulose, this woodland creature must host gut bacteria that help digest the cellulose it consumes. The gut microbiome of the North American porcupine therefore seems like a logical place to start our search for a microbiome to mine.

Our project has three stages. First, we hope to isolate and characterize the microbiomes of many animals at the Shubenacadie Wildlife Park. Second, we hope to isolate and characterize individual cellulose-degrading bacteria using a cellulose agar medium. Third, we aim to construct a metagenomic library of the microbial DNA in the porcupine microbiome, as a way to isolate and characterize genes for useful cellulose-degrading enzymes that can be expressed in E. coli. Each of these stages is broken down into smaller pieces below. We invite you to follow our entire process as we explore the vast possibilities of cellulose degraders, microbiomes and biofuel production.

Shubenacadie Wildlife Park Microbiome Survey


The purpose of this microbiome survey is to learn what kinds of organisms are present in mammal microbiomes, and so to judge whether using microbiomes as a "mine" for biological engineering is feasible. This part of our project is designed to provide evidence that a metagenomic library built from DNA extracted from gut microbiomes is useful for biological engineering; in our case, specifically for degrading softwood lumber waste and turning it into useful compounds such as bioplastics and biofuels. Using 16S taxonomy data and PICRUSt, we intend to show that these microbiomes are a gold mine for biological engineering.


Shubenacadie Wildlife Park:

The Shubenacadie Wildlife Park is home to 29 mammals and 35 birds on a 40-hectare piece of land. These animals were rescued from the wild while injured or sick and were then rehabilitated. Some animals cannot return to the wild and live out the rest of their lives at the park, in enclosures that simulate environments they are comfortable in. For example, the rescued otters have a large pool to swim in. More about the wildlife park can be found here.

The wildlife park has been an invaluable resource. The animals are fed a supplemented diet: mainly their natural diet, with added foods such as fruits and vegetables for the herbivores and protein supplements for the carnivores. Because diet affects microbiome composition, this supplemented diet introduces potential error into our use of the park. However, we considered this error to be small, because the mammals eat many of the same things they would in nature. The park was also the safest way for us to obtain fecal samples, so we accepted this error as part of the process.

DNA Extraction:

We obtained fecal samples from 21 mammals at the Shubenacadie Wildlife Park. These samples contain millions of gut microbes, which make up each animal's gut microbiome. DNA extraction lets us identify these bacteria through the bacterial DNA in each fecal sample. We extracted DNA using MoBio's PowerFECAL kit, which combines alkaline lysis with mechanical lysis. Alkaline lysis uses a detergent to disrupt the membranes of the bacterial cells. Mechanical lysis uses small ceramic beads, shaken in the tube at high speed, to physically break open the bacterial cells and release their contents. The lysed material is then passed through a column to which DNA binds, while the other cell material is left behind. The columns were washed with ethanol, and the DNA was eluted using a buffer provided by MoBio. This leaves us with clean genomic DNA from the cells in the fecal sample.

Polymerase Chain Reaction:

To prepare our samples for Illumina sequencing, we first had to make many copies of one specific region of the genomes in the extracted DNA using the polymerase chain reaction (PCR). The region we need for microbial identification is the 16S rRNA gene. This gene is widely used as a phylogenetic marker for taxonomic identification, giving us a good idea of which bacteria are present.

PCR uses short oligonucleotides called primers and a heat-stable DNA polymerase. Once a primer binds to the template, its free 3'-OH group gives the polymerase a starting point (it "primes" the reaction), and the polymerase extends the primer to copy the template strand. A machine called a thermocycler cycles the reaction temperature so that each round of copying starts and stops in turn, creating a chain reaction. Each cycle has three steps: denaturation, annealing and elongation. Denaturation melts the double-stranded DNA into single strands. Annealing allows the primers to bind to their complementary sequences on the single strands. Elongation allows the heat-stable polymerase to extend the new DNA strands. Because the products of each cycle serve as templates for the next, after many cycles the segment of DNA flanked by the primers has been amplified exponentially. For Illumina sequencing, the primers also carry an adapter region that is added onto the amplified 16S rRNA gene DNA, making it compatible with the Illumina flow cell. The adapter is important because only DNA carrying it will be sequenced, which lets us control what gets sequenced.
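To put numbers on "exponential amplification": with perfect doubling, N0 starting copies become N0 x 2^n after n cycles. The Python sketch below assumes a constant per-cycle efficiency, which is a simplification (real reactions lose efficiency and plateau in later cycles); the template count, cycle number and 90% efficiency are illustrative values, not our reaction conditions.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Expected copies of the primer-flanked region after a number of PCR cycles.

    Assumes every cycle multiplies the amount of product by (1 + efficiency):
    efficiency = 1.0 is perfect doubling. Real reactions are less efficient and
    plateau late in the run, which this simple model ignores.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    # 10 template copies after 30 cycles of perfect doubling: 10 * 2**30
    print(f"{pcr_copies(10, 30):.3e}")
    # The same reaction at an illustrative 90% efficiency per cycle
    print(f"{pcr_copies(10, 30, efficiency=0.9):.3e}")
```

Under this model, dropping from 100% to 90% efficiency leaves roughly a fifth of the ideal yield after 30 cycles, which is why small efficiency differences between templates can matter.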

Illumina Sequencing:

The environmental DNA extraction gave us clean DNA from the organisms in the Shubenacadie Wildlife Park mammal microbiomes. This mixture of genomic DNA holds a great deal of biological information, such as which organisms are present and the major functions of the microbiome. This information is stored in the DNA sequence itself. To access the sequence information, we used an Illumina MiSeq.

The Illumina sequencer works on a principle called sequencing by synthesis. When the amplified DNA from the PCR reaction is loaded into the flow cell, the adapter regions added by PCR bind to complementary oligonucleotides attached to the flow cell surface, and each bound molecule is copied in place into a cluster of identical copies. A sequencing primer then provides a free 3'-OH group that "primes" the sequencing-by-synthesis reaction. The bound PCR products are sequenced by adding fluorescently labelled deoxyribonucleotides one base at a time. Each of the four nucleotides carries a different coloured dye, so the colour that lights up at a cluster after each addition identifies the nucleotide just incorporated. The sequencer records these flashes of coloured light and converts them into a sequence. This happens for millions of clusters in a massively parallel process, sequencing all of the amplified DNA from our PCR step.
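The colour-to-base step can be sketched in a few lines of Python. This is a deliberately simplified model that assumes we already have one intensity value per dye channel for each cycle of a single cluster; the intensity numbers are made up, and real Illumina base calling also corrects for dye cross-talk and phasing and assigns a quality score to every base.

```python
from typing import Dict, List

BASES = "ACGT"


def call_bases(cycles: List[Dict[str, float]]) -> str:
    """Call one base per sequencing cycle for a single cluster.

    Each element of `cycles` maps a base to the fluorescence intensity measured
    in that base's dye channel; the brightest channel is taken as the base
    incorporated in that cycle.
    """
    read = []
    for intensities in cycles:
        brightest = max(BASES, key=lambda base: intensities.get(base, 0.0))
        read.append(brightest)
    return "".join(read)


if __name__ == "__main__":
    # Hypothetical intensities for three cycles of one cluster
    cluster_images = [
        {"A": 0.05, "C": 0.91, "G": 0.10, "T": 0.02},
        {"A": 0.88, "C": 0.07, "G": 0.03, "T": 0.06},
        {"A": 0.04, "C": 0.02, "G": 0.12, "T": 0.95},
    ]
    print(call_bases(cluster_images))  # CAT
```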

Using PCR, we can amplify all of the 16S regions in our fecal DNA extractions and then obtain their sequences with the Illumina MiSeq. The sequence information can then be used to determine which bacteria are found in each sample.

Illumina sequencing produces a massive amount of data that must then be sifted through by computer. For this task we use the QIIME analysis tools, with help from the Integrated Microbiome Resource here at Dalhousie University and Dr. Morgan Langille's Microbiome Helper tool.

QIIME Analysis

To make sense of the data that the Illumina sequencer gives us, we must use a data analysis pipeline: a set of tools, run in sequence, that turns the raw data into a clear picture of what we have collected. The Microbiome Helper tool, developed by Dr. Morgan Langille, includes the entire Quantitative Insights Into Microbial Ecology (QIIME, pronounced "chime") toolset, an open-source pipeline that wraps many other tools needed to analyze Illumina-sequenced 16S rRNA gene data. This pipeline makes it possible to analyze 16S data efficiently and accurately to determine the taxonomy of the microbial community represented by the 16S sequences.

The data that comes from the Illumina sequencer is raw data. It contains the sequences that matter to us, but also chimeric sequences, low-quality sequences, very short sequences and other junk that could distort our taxonomic assessment further down the pipeline. We therefore use the QIIME tools to clean the raw data up into something useful.

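The first step is to merge the forward and reverse reads. The sequencer reads each DNA fragment from both ends, producing a forward read and a reverse read; where the two reads overlap, they can be merged into a single, longer read covering the whole fragment. The pipeline uses PEAR (Paired-End reAd mergeR) to do this.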
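The idea behind read merging can be illustrated with a toy Python function: reverse-complement the reverse read so both reads are on the same strand, then look for the longest overlap between the end of the forward read and the start of the reverse read. This is only a sketch of the concept, not how PEAR works internally (PEAR also uses base quality scores and a statistical test to choose the overlap); `min_overlap` and `max_mismatch_frac` are hypothetical parameters.

```python
from typing import Optional

# Complement table for unambiguous bases (N stays N)
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(forward: str, reverse: str, min_overlap: int = 10,
               max_mismatch_frac: float = 0.1) -> Optional[str]:
    """Merge a forward/reverse read pair into one longer read.

    The reverse read is reverse-complemented onto the forward strand, then
    overlap lengths are tried from longest to shortest; the first overlap with
    few enough mismatches is used. Returns None if no overlap qualifies.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    rev = reverse_complement(reverse)
    for overlap in range(min(len(forward), len(rev)), min_overlap - 1, -1):
        forward_tail = forward[-overlap:]
        reverse_head = rev[:overlap]
        mismatches = sum(a != b for a, b in zip(forward_tail, reverse_head))
        if mismatches <= max_mismatch_frac * overlap:
            # Keep the forward read and append the non-overlapping part of the reverse read
            return forward + rev[overlap:]
    return None


if __name__ == "__main__":
    fragment = "GATCCAGCAGGTACGGCACA"                # toy 20 bp amplicon
    fwd_read = fragment[:12]                         # read from the left end
    rev_read = reverse_complement(fragment[8:])      # read from the right end, other strand
    print(merge_pair(fwd_read, rev_read, min_overlap=4, max_mismatch_frac=0.0))
    # GATCCAGCAGGTACGGCACA
```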

The next step uses a quality-control tool called FastQC, which reports the quality and length of the merged reads produced in the PEAR step. This report guides the filtering step, which removes short and low-quality reads from our samples. FastQC itself only reports on read quality, so the filtering is done by a separate script in the pipeline.
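FastQC summarizes read quality using Phred scores, where a score Q corresponds to an estimated error probability of 10^(-Q/10) (Q30 means about 1 error in 1,000 bases). The Python sketch below shows what a simple length-and-quality filter does with those scores. It assumes FASTQ quality strings in the standard Phred+33 encoding, and the 200 bp and mean Q30 thresholds are hypothetical, not the cutoffs used in our pipeline.

```python
from typing import List


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string (Phred+33 by default) into integer scores."""
    return [ord(symbol) - offset for symbol in quality]


def error_probability(q: int) -> float:
    """Estimated probability that a base with Phred score q was called wrongly."""
    return 10 ** (-q / 10)


def passes_filter(seq: str, quality: str, min_length: int = 200,
                  min_mean_quality: float = 30.0) -> bool:
    """Keep a read only if it is long enough and its mean quality is high enough."""
    if len(seq) != len(quality):
        raise ValueError("sequence and quality string must be the same length")
    if not seq or len(seq) < min_length:
        return False
    scores = phred_scores(quality)
    return sum(scores) / len(scores) >= min_mean_quality


if __name__ == "__main__":
    # 'I' encodes Q40 and '#' encodes Q2 in Phred+33
    print(passes_filter("ACGT", "IIII", min_length=4))  # True
    print(passes_filter("ACGT", "####", min_length=4))  # False
```

Filtering on mean quality is the simplest option; trimming tools typically cut low-quality read ends instead of discarding whole reads.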

Following filtering, chimeric reads must be removed. A chimeric read comes from a piece of DNA amplified in the PCR step that combines two separate molecules joined together as a by-product of PCR. Because the 16S genes of bacteria and archaea are conserved, different 16S sequences often share stretches of similar sequence. Occasionally, an incompletely extended 16S strand from one organism base-pairs with the 16S template of another and primes its amplification, producing a new, chimeric 16S gene that does not represent any real bacterial or archaeal species.

These reads must be removed to keep our data accurate. This is done with VSEARCH, which stands for "vectorized search". VSEARCH compares each sequence against reference 16S sequences. If one part of a sequence best matches one reference while the rest best matches a different reference, the program flags the sequence as chimeric rather than biologically distinct.
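The core idea of reference-based chimera detection can be shown with a toy Python check: a read is suspicious if a model built from two "parent" references, one to the left of a breakpoint and another to the right, explains it clearly better than any single reference. This is a simplified stand-in, not the UCHIME algorithm that VSEARCH implements: it assumes the query and references are already aligned to the same length, scores by simple identity counts, and `min_gain` is an arbitrary threshold.

```python
from itertools import permutations
from typing import Dict, Optional, Tuple


def matches(a: str, b: str) -> int:
    """Number of aligned positions at which two sequences agree."""
    return sum(x == y for x, y in zip(a, b))


def check_chimera(query: str, references: Dict[str, str],
                  min_gain: int = 3) -> Tuple[bool, Optional[Tuple[str, str, int]]]:
    """Flag `query` as chimeric if a two-parent model beats every single parent.

    Returns (is_chimera, (left_parent, right_parent, cut_position) or None).
    """
    if len(references) < 2:
        raise ValueError("need at least two reference sequences")
    best_single = max(matches(query, ref) for ref in references.values())

    best_score, best_model = -1, None
    # Try every ordered pair of parents and every cut position
    for (left_name, left_seq), (right_name, right_seq) in permutations(references.items(), 2):
        for cut in range(1, len(query)):
            score = (matches(query[:cut], left_seq[:cut])
                     + matches(query[cut:], right_seq[cut:]))
            if score > best_score:
                best_score, best_model = score, (left_name, right_name, cut)

    is_chimera = best_score - best_single >= min_gain
    return is_chimera, (best_model if is_chimera else None)


if __name__ == "__main__":
    refs = {"parent_a": "A" * 10 + "C" * 10, "parent_b": "G" * 10 + "T" * 10}
    print(check_chimera("A" * 10 + "T" * 10, refs))  # (True, ('parent_a', 'parent_b', 10))
    print(check_chimera(refs["parent_a"], refs))     # (False, None)
```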

Next, the non-chimeric, merged and filtered reads are grouped into operational taxonomic units (OTUs), which are clusters of highly similar 16S sequences, and each OTU is compared to a database to be identified taxonomically. There are three main ways to pick OTUs. De novo OTU picking clusters 16S sequences with each other, without using a reference database. Closed-reference OTU picking clusters 16S sequences against a reference database such as Greengenes and discards any sequence that does not match the reference. Open-reference OTU picking first clusters sequences against a reference database and then clusters the sequences that failed to match de novo. We used de novo OTU picking, which forms OTUs without a reference database and only afterwards compares the OTUs to a database for identification. De novo OTU picking is much slower than reference-based picking, but it provides a more inclusive dataset, because no 16S data is thrown out for failing to match a reference, as happens in closed-reference picking. The de novo workflow has seven steps, all wrapped by the pick_de_novo_otus.py script provided by QIIME:

  1. Pick OTUs based on sequence similarity within the reads (de-novo picking)
  2. Pick a representative sequence for each OTU
  3. Assign taxonomy to OTU representative sequences
  4. Align OTU representative sequences
  5. Filter the alignment
  6. Build a phylogenetic tree
  7. Make the OTU table
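Step 1, de novo OTU picking, is at heart a clustering problem. A common approach is greedy centroid clustering: unique sequences are processed from most to least abundant, and each joins the first existing OTU centroid it matches at or above an identity threshold (97% is a widely used convention), otherwise it founds a new OTU. The Python sketch below illustrates that idea under simplifying assumptions: sequences are pre-aligned and equal in length, identity is a plain position-by-position comparison, and it is not the exact algorithm used by QIIME's OTU pickers.

```python
from collections import Counter
from typing import Dict, Iterable


def identity(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length, aligned sequences agree."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes aligned, equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(reads: Iterable[str], threshold: float = 0.97) -> Dict[str, int]:
    """Cluster reads into OTUs; returns {centroid sequence: number of reads}."""
    abundances = Counter(reads)                  # dereplicate identical reads
    otus: Dict[str, int] = {}
    for seq, count in abundances.most_common():  # most abundant first
        for centroid in otus:
            if identity(seq, centroid) >= threshold:
                otus[centroid] += count          # join the first matching OTU
                break
        else:
            otus[seq] = count                    # no match: seq founds a new OTU
    return otus


if __name__ == "__main__":
    base = "ACGT" * 25                # hypothetical 100 bp sequence
    variant = "T" + base[1:]          # 1 mismatch, so 99% identical to base
    other = "GGCC" * 25               # unrelated sequence
    print(greedy_otus([base] * 5 + [variant] * 2 + [other] * 3).values())
```

Processing abundant sequences first makes the most common (and most likely error-free) sequences the OTU centroids, while rare variants that are probably sequencing errors are absorbed into them.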

Together, these steps produce the OTU table, which is used for downstream analyses, including chart creation and PICRUSt analysis.

Lastly, we remove low-confidence OTUs that are represented by less than 0.1% of the sequencing reads. This cutoff was chosen because it is the predicted rate of sample bleed-through on the Illumina MiSeq, so OTUs below this level could be bleed-through artifacts rather than real members of the community.
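The 0.1% cutoff can be expressed as a short Python filter over an OTU table. This sketch assumes the 0.1% is measured against all reads in the whole table (summed across samples) rather than per sample; the OTU IDs and counts are made-up examples.

```python
from typing import Dict, List


def filter_low_abundance(otu_table: Dict[str, List[int]],
                         min_fraction: float = 0.001) -> Dict[str, List[int]]:
    """Drop OTUs that account for less than `min_fraction` of all reads.

    `otu_table` maps each OTU ID to its read counts across samples.
    """
    total_reads = sum(sum(counts) for counts in otu_table.values())
    cutoff = min_fraction * total_reads
    return {otu: counts for otu, counts in otu_table.items() if sum(counts) >= cutoff}


if __name__ == "__main__":
    # Hypothetical table: 10,000 reads in total, so the cutoff is 10 reads
    table = {"OTU_1": [6000, 3000], "OTU_2": [500, 495], "OTU_3": [3, 2]}
    print(sorted(filter_low_abundance(table)))  # ['OTU_1', 'OTU_2']
```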

Using QIIME, we can then create alpha- and beta-diversity charts for the samples, and pie and bar charts of taxonomy at each taxonomic rank (class, family, genus, etc.). Examples of these can be seen below, and the charts from our fecal sample analysis can be found here.
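For a sense of what the diversity charts summarize, here is a Python sketch of two common metrics: the Shannon index, an alpha-diversity measure of richness and evenness within one sample, and Bray-Curtis dissimilarity, a beta-diversity measure comparing two samples. QIIME offers these among many other metrics (including UniFrac). The sketch assumes each sample is a list of OTU counts in the same OTU order; note that tools differ in the logarithm base used for the Shannon index.

```python
import math
from typing import Optional, Sequence


def shannon(counts: Sequence[int], base: Optional[float] = None) -> float:
    """Shannon diversity H = -sum(p_i * log(p_i)) over OTUs with nonzero counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    h = 0.0
    for count in counts:
        if count > 0:
            p = count / total
            h -= p * math.log(p)
    # Natural log by default; convert to another base if one is given
    return h / math.log(base) if base else h


def bray_curtis(x: Sequence[int], y: Sequence[int]) -> float:
    """Bray-Curtis dissimilarity: 0 for identical samples, 1 for no shared OTUs."""
    if len(x) != len(y):
        raise ValueError("samples must list counts for the same OTUs")
    denominator = sum(x) + sum(y)
    if denominator == 0:
        raise ValueError("both samples are empty")
    return sum(abs(a - b) for a, b in zip(x, y)) / denominator


if __name__ == "__main__":
    print(shannon([25, 25, 25, 25], base=2))  # four equally abundant OTUs: 2.0 bits
    print(bray_curtis([6, 4], [4, 6]))        # 0.2
```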

PICRUSt:

After taxonomic assignment, we used a tool called Phylogenetic Investigation of Communities by Reconstruction of Unobserved States (PICRUSt) to predict the functions of the OTUs in each animal's microbiome.

PICRUSt predicts which gene families are present in a bacterial and archaeal metagenome using only 16S rRNA gene sequencing data. It requires QIIME taxonomy assignments made with closed-reference OTU picking against the Greengenes database, because PICRUSt's gene-family predictions are precomputed for the organisms in that database.

The PICRUSt algorithm works in two steps. The first is gene content prediction, which infers the gene content of ancestral organisms from their sequenced living relatives. This is then extended to organisms in the reference database that have not yet been sequenced: their gene content is estimated from their position in the 16S phylogenetic tree relative to sequenced organisms. In this way, PICRUSt precomputes the gene content of every organism in its reference database. Because this is a computationally demanding task, it is done only once, and the precomputed output is supplied for the second step. This is why QIIME OTU picking must be done against a closed reference database: otherwise, the OTUs will not correspond to the precomputed reference and PICRUSt will be unable to complete the second step.

The second step is metagenome inference. Each OTU in the closed-reference OTU table is matched to its precomputed gene content from the first step. PICRUSt divides each OTU's abundance by its predicted 16S gene copy number, multiplies the result by that OTU's predicted gene counts, and sums these contributions across all OTUs. The output is a predicted metagenome: the inferred gene content of the entire microbiome.
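The arithmetic of PICRUSt's metagenome inference is simple once the precomputed tables exist: each OTU's abundance is divided by its predicted 16S copy number, multiplied by its predicted gene counts, and summed across OTUs. The Python sketch below shows that calculation; the OTU IDs, copy numbers and gene-family names are hypothetical placeholders, whereas the real tool reads its precomputed tables for Greengenes OTUs and reports gene families such as KEGG Orthology groups.

```python
from collections import defaultdict
from typing import Dict


def predict_metagenome(otu_counts: Dict[str, float],
                       copy_numbers: Dict[str, float],
                       gene_content: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Predict gene-family abundances for a whole community."""
    metagenome: Dict[str, float] = defaultdict(float)
    for otu, count in otu_counts.items():
        # Correct for OTUs that carry several copies of the 16S gene per genome
        genomes = count / copy_numbers[otu]
        # Each genome contributes its predicted number of copies of every gene family
        for gene_family, copies in gene_content[otu].items():
            metagenome[gene_family] += genomes * copies
    return dict(metagenome)


if __name__ == "__main__":
    otu_counts = {"OTU_1": 100, "OTU_2": 30}
    copy_numbers = {"OTU_1": 4, "OTU_2": 1}
    gene_content = {
        "OTU_1": {"gene_family_A": 2},
        "OTU_2": {"gene_family_A": 1, "gene_family_B": 3},
    }
    print(predict_metagenome(otu_counts, copy_numbers, gene_content))
    # {'gene_family_A': 80.0, 'gene_family_B': 90.0}
```

The copy-number correction matters: without it, OTU_1 would look four times more abundant than it really is, simply because each of its genomes carries four copies of the 16S gene.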

Isolation of Bacteria

Isolation:

Cellulose-degrading bacteria were isolated on a cellulose medium (see the protocol page for the recipe) containing ground Whatman paper as the cellulose source. This medium had been used by a research group in India conducting very similar experiments, who had great success isolating cellulose-degrading bacteria. After receiving fecal samples from the Shubenacadie Wildlife Park, the team plated the samples on the cellulose medium. For the initial round of porcupine samples, we made duplicates of every plate so that we could incubate under both aerobic and anaerobic conditions. After a 24-hour incubation, we observed various colony morphologies. There were three prominent morphologies that we decided to pursue more closely: white, yellow and bull's-eye colonies. These colonies were then streak-purified on LB medium.

16S rRNA Sequencing

Using this isolation method, we isolated bacteria capable of growing on the cellulose medium and prepared these isolates for 16S rRNA gene sequencing. This was a quick and dirty way to identify some of the cellulose-degrading bacteria as a starting point for our project.

The 16S rRNA gene, as mentioned in the Microbiome Survey section and here in Isolation, is well suited to taxonomic identification of bacteria, for three reasons:

  • DNA sequence provides better taxonomic identification than traditional bacteriological assays such as Gram staining.
  • It is found only in prokaryotes (bacteria and archaea; eukaryotes carry the related 18S rRNA gene instead), so eukaryotic DNA is largely excluded from our identification procedures.
  • It is a short sequence (about 1,500 base pairs), which makes it ideal for inexpensive Sanger sequencing and for high-throughput technologies like those used in the Microbiome Survey section.

The 16S rRNA gene contains nine variable regions (V1 to V9), separated by conserved regions. Primers that bind the conserved regions can be used to amplify smaller fragments containing one or more variable regions, and it is the variable regions that let us assign taxonomy. High-throughput 16S sequencing generally requires a shorter piece of DNA than the entire gene, so these conserved regions are useful for amplifying just the part we need.
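Because the primers bind conserved regions, locating the amplified region in a known sequence amounts to matching the primers, which often contain degenerate IUPAC bases (for example, Y means C or T). The Python sketch below does a simple in silico PCR: it finds the forward primer, then the reverse complement of the reverse primer downstream of it, and returns the stretch between them. The primers and template are short toy sequences, not real 16S primers, and the sketch only allows exact matches to the degenerate codes, whereas real primers can tolerate some mismatches.

```python
import re
from typing import Optional

# Regex character class for each IUPAC nucleotide code
IUPAC_CLASSES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complements of IUPAC codes (R/Y, K/M, B/V and D/H swap; S, W and N are unchanged)
IUPAC_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def primer_pattern(primer: str) -> re.Pattern:
    """Compile a primer with IUPAC codes into a regular expression."""
    return re.compile("".join(IUPAC_CLASSES[base] for base in primer.upper()))


def reverse_complement_iupac(seq: str) -> str:
    """Reverse complement that also handles degenerate IUPAC codes."""
    return seq.upper().translate(IUPAC_COMPLEMENT)[::-1]


def extract_amplicon(template: str, forward_primer: str,
                     reverse_primer: str) -> Optional[str]:
    """Return the template region from the forward primer to the reverse primer site."""
    template = template.upper()
    forward_hit = primer_pattern(forward_primer).search(template)
    if forward_hit is None:
        return None
    # The reverse primer binds the opposite strand, so on this strand we look
    # for its reverse complement downstream of the forward primer
    reverse_site = primer_pattern(reverse_complement_iupac(reverse_primer))
    reverse_hit = reverse_site.search(template, forward_hit.end())
    if reverse_hit is None:
        return None
    return template[forward_hit.start():reverse_hit.end()]


if __name__ == "__main__":
    toy_template = "TTTTACGCTGATTACATATCCGGGG"
    print(extract_amplicon(toy_template, "ACGYT", "GGRTA"))  # ACGCTGATTACATATCC
```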

Metagenomic Library

When we first began our project, we fully intended to complete a metagenomic library using the porcupine fecal sample as our environmental isolate. However, as other areas of the research began producing exciting results, the library was put on the back burner. That said, we did begin constructing the library with help from the Waterloo iGEM team, who provided the pJC8 cosmid used in our library construction. Waterloo iGEM also sent us a protocol to help with our environmental metagenomic library construction. Although we did not get far with the library, we did make some changes to the protocol to make the construction procedure more efficient.

We were originally interested in developing the metagenomic library because it would let us assess and exploit the whole microbial genomic content of our environmental sample. Compared with phylogenetic surveys, which look only at the diversity of one gene (e.g., the 16S rRNA gene), metagenomics can provide insight into enzymatic pathways, evolutionary profiles and microbial function.

How to make a metagenomic library (in short):

  1. Isolation of the DNA. It is important that the sample collected represents the entirety of the environment being examined. We used chloroform extraction to isolate large pieces of DNA from the porcupine fecal sample.
  2. Manipulation of the DNA. To be used in further downstream processes, the large environmental DNA pieces were cut with restriction enzymes, which cut DNA at specific sequences called restriction sites.
  3. Incorporation into a vector. A vector is a DNA molecule that can carry foreign genetic material into another cell, where the cell's macromolecular synthesis machinery can produce the proteins encoded by that foreign material. Vectors also usually contain a selectable marker, such as an antibiotic resistance gene. Introducing the vector into a model organism gives that organism a selectable growth advantage, which helps identify the organisms that carry the vector.
  4. Introduction of the vector (and foreign environmental DNA) into a model organism. The vector is often introduced into the model organism Escherichia coli through a process called transformation, which allows the DNA to enter the cell and the cell to then stably produce the encoded proteins. Transformation can be done by chemical, electrical or biological methods. The transformed cells are then grown on selective media, and only cells that contain the vector survive (because of the antibiotic resistance gene on the vector). Each colony that grows can be used to study a specific fragment of DNA from the environmental sample.
  5. Examination of the DNA in the metagenomic library. The library can provide a wide range of information about the sample being examined, such as its chemical and physical properties, so many different methods of analysis can be applied to study it.
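A practical question when building any clone library is how many clones are needed to be reasonably confident that a given gene is included. The Clarke and Carbon formula, N = ln(1 - P) / ln(1 - f), where P is the desired probability and f is the insert size divided by the genome size, gives an estimate for a single genome. The Python sketch below applies it; the 40 kb insert (typical of cosmids, but not a measured value for our pJC8 library) and the 5 Mb genome are assumptions. In a metagenome the effective "genome size" is the combined size of every genome in the sample, weighted by abundance, so the number needed for a real environmental library is far larger.

```python
import math


def clones_needed(insert_size_bp: float, genome_size_bp: float,
                  probability: float = 0.99) -> int:
    """Clarke-Carbon estimate of the number of clones needed so that any given
    sequence is present in the library with the requested probability."""
    fraction = insert_size_bp / genome_size_bp
    if not 0 < fraction < 1:
        raise ValueError("insert must be smaller than the genome")
    if not 0 < probability < 1:
        raise ValueError("probability must be between 0 and 1")
    return math.ceil(math.log(1 - probability) / math.log(1 - fraction))


if __name__ == "__main__":
    # Hypothetical: 40 kb cosmid inserts, one 5 Mb bacterial genome, 99% probability
    print(clones_needed(40_000, 5_000_000))  # 574
```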

Metagenomics makes it possible to search through the incredible biodiversity found in the environment without first having to grow the organisms using standard laboratory techniques. Once we finish constructing the library, we will use it to search for enzymatic and chemical pathways involved in cellulose degradation.

References:

Comeau, A. M., Li, W. K. W., Tremblay, J.-É., Carmack, E. C., & Lovejoy, C. (2011). Arctic Ocean Microbial Community Structure before and after the 2007 Record Sea Ice Minimum. PLoS ONE, 6(11), e27492.

Zhang, J., Kobert, K., Flouri, T., & Stamatakis, A. (2014). PEAR: A fast and accurate Illumina paired-end reAd mergeR. Bioinformatics, 30(5), 614-620.

Rideout, J. R., He, Y., Navas-Molina, J. A., Walters, W. A., Ursell, L. K., Gibbons, S. M., et al. (2014). Subsampled open-reference clustering creates consistent, comprehensive OTU definitions and scales to billions of sequences. PeerJ, 2, e545.

McDonald, D., Price, M. N., Goodrich, J., Nawrocki, E. P., DeSantis, T. Z., Probst, A., et al. (2012). An improved Greengenes taxonomy with explicit ranks for ecological and evolutionary analyses of bacteria and archaea. The ISME Journal, 6(3), 610-618.

Kopylova, E., Noé, L., & Touzet, H. (2012). SortMeRNA: fast and accurate filtering of ribosomal RNAs in metatranscriptomic data. Bioinformatics, 28(24), 3211-3217.

Lozupone, C., & Knight, R. (2005). UniFrac: a New Phylogenetic Method for Comparing Microbial Communities. Applied and Environmental Microbiology, 71(12), 8228-8235.

Bolger, A. M., Lohse, M., & Usadel, B. (2014). Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics, 30(15), 2114-2120.

Truong, D. T., Franzosa, E. A., Tickle, T. L., Scholz, M., Weingart, G., Pasolli, E., et al. (2015). MetaPhlAn2 for enhanced metagenomics taxonomic profiling. Nature Methods, 12, 902-903.

Kanehisa, M., & Goto, S. (2000). KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Research, 28(1), 27-30.

Buchfink, B., Xie, C., & Huson, D. H. (2015). Fast and sensitive protein alignment using DIAMOND. Nature Methods, 12, 59-60.

Abubucker, S., Segata, N., Goll, J., Schubert, A. M., Izard, J., Cantarel, B. L., et al. (2012). HUMAnN: The HMP unified metabolic analysis network. PLoS Computational Biology, 8(6), e1002358.

Parks, D. H., Tyson, G. W., Hugenholtz, P., & Beiko, R. G. (2014). STAMP: statistical analysis of taxonomic and functional profiles. Bioinformatics, 30(21), 3123-3124.

Langille, M. G. I., Zaneveld, J., Caporaso, J. G., McDonald, D., Knights, D., Reyes, J. A., et al. (2013). Predictive functional profiling of microbial communities using 16S rRNA marker gene sequences. Nature Biotechnology, 31(9), 814-821.