DESCRIPTION
A “Spike” in Biofuel Production: Mining the Porcupine Microbiome to Engineer a Softwood Feedstock Platform
Dwindling fuel resources and rising environmental concerns have catalyzed the development of biofuel production in microorganisms. In Nova Scotia, softwood waste from the lumber industry is an untapped source of low-cost biofuel feedstock; however, this waste cannot be used by traditional biofuel processes because it contains toxic compounds such as turpentines and carbon sources, such as cellulose, that those processes cannot access. The porcupine microbiome provides a unique solution, as it is capable of digesting bark and toxic products. Working with Shubenacadie Wildlife Park, we aim not only to identify cellulose- and/or turpentine-degrading bacteria in the porcupine microbiome, but also to characterize the microbial communities found within the Park’s mammal population. To achieve these goals, we are using fecal samples to construct a DNA library of the porcupine microbiome and to analyze each mammal’s microbial 16S rRNA genes. Future experiments include introducing identified cellulose- and/or turpentine-degrading pathways into E. coli to produce an economically viable and sustainable biofuel-generating organism.
Shubenacadie Wildlife Park Microbiome Survey
The purpose of this microbiome survey is to determine what kinds of organisms are present in mammal microbiomes, and so to assess whether using microbiomes as a "mine" for biological engineering is feasible. This part of our project is designed to provide evidence that a metagenomic library constructed from DNA extracted from gut microbiomes is useful for biological engineering, in our case specifically for degrading softwood lumber waste and turning it into useful compounds such as bioplastics and biofuels. Using 16S taxonomy data and PICRUSt, we intend to show that these microbiomes are a gold mine for biological engineering.
Shubenacadie Wildlife Park:
The Shubenacadie Wildlife Park is home to 29 mammals and 35 birds on a 40-hectare piece of land. These animals were rescued from the wild, injured or sick, and were rehabilitated. Some animals will be unable to return to the wild and will live out the rest of their lives at the park. The animals live in enclosures that simulate environments that are comfortable to them. For example, the rescued otters have a large pool in which they can swim. More about the wildlife park can be found here.
The wildlife park has been an invaluable resource. The animals are fed a supplemented diet, which consists mainly of their natural diet with added foods such as fruits and vegetables for the herbivores and protein supplements for the carnivores. Because they eat a supplemented diet, and diet does affect microbiome content, there is potential error associated with the use of the park. However, we consider this error to be small, as the mammals are eating many of the same things they would in nature. The park was also the safest way for us to obtain fecal samples, so we accepted the associated error as part of this process.
DNA Extraction:
We obtained fecal samples from 21 mammals at the Shubenacadie Wildlife Park. Each fecal sample contains millions of the animal’s gut microbes, which make up its gut microbiome. The DNA extraction step allows us to identify these bacteria through the bacterial DNA found in each fecal sample. DNA extraction was done using MoBio’s PowerFECAL kit, which uses both alkaline lysis and mechanical lysis. Alkaline lysis works through a detergent, which disrupts the membranes of the bacterial cells. Mechanical lysis works through small ceramic beads that are shaken in the tube at high speed to break up the bacterial membranes and release the cell contents. The lysate is then run through a column, where DNA binds while all other cell material passes through. The columns were washed with ethanol and the DNA was eluted using a buffer provided by MoBio. We are left with clean genomic DNA from all the cells in the fecal sample.
Polymerase Chain Reaction:
To prepare our samples for Illumina sequencing, we first had to create many copies of a specific region of the genomes found in the extracted DNA by polymerase chain reaction (PCR). The specific region we require for microbial identification is the 16S rRNA gene. This gene is widely used as a taxonomic marker in molecular phylogeny, giving us a good idea of what bacteria are present.
PCR works by using short oligonucleotides called primers and a heat-stable DNA polymerase. The primers provide a free 3’-OH group, essentially priming the reaction, which the heat-stable polymerase extends to copy DNA from a template strand. A machine called a thermocycler cycles the temperature of the reaction so that the copying repeats as a chain reaction. Each cycle has three steps: denaturation, annealing and elongation. The denaturation step melts the double-stranded DNA molecule into single-stranded molecules. The annealing step allows the primers to bind to the single-stranded molecules at their complementary sequences. The elongation step allows the heat-stable polymerase to extend the DNA sequence. After many cycles, we have exponential amplification of the segment of DNA that is flanked by the primers. For Illumina sequencing, the primers also contain an adapter region that is added onto the amplified 16S rRNA gene DNA, making the DNA compatible with the Illumina sequencer flow cell. The adapter addition is important because it makes sequencing more specific: only DNA containing the adapter will be sequenced.
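The idea that only the stretch flanked by the two primers is amplified, and that it doubles each cycle, can be illustrated with a small sketch. The template and primer sequences used in the usage note are made up for illustration and are not the 16S primers used in this project; exact matching stands in for annealing, and perfect doubling is an idealization.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(complement[base] for base in reversed(seq))


def in_silico_pcr(template, forward_primer, reverse_primer):
    """Return the region of `template` flanked by the two primers, or None.

    The forward primer matches the top strand directly. The reverse primer
    anneals to the top strand, so its reverse complement appears in it.
    """
    start = template.find(forward_primer)
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse_primer)
    end = template.find(reverse_site, start + len(forward_primer))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


def copies_after(cycles, starting_copies=1):
    """Idealized PCR: the target doubles every cycle."""
    return starting_copies * 2 ** cycles
```

For example, `in_silico_pcr("TTTTACGTACGGGGCCCCTTGACAAAAA", "ACGTAC", "TGTCAA")` returns only the primer-flanked segment, and `copies_after(30)` shows why roughly thirty cycles are enough to turn a single template into about a billion copies.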
Illumina Sequencing:
The environmental DNA extraction provided us with clean DNA from the organisms in the Shubenacadie Wildlife Park mammal microbiomes. This genomic DNA mixture holds a lot of biological information, such as the major functions of the microbiomes and the organisms present within them. This information is stored within the DNA sequence itself. To get access to this sequence information we used the Illumina MiSeq.
The Illumina sequencer works on a principle called sequencing-by-synthesis. When the amplified DNA from the PCR reaction is placed into the flow cell, the adapter region added by PCR binds to complementary oligonucleotides attached to the surface of the flow cell. These oligonucleotides provide a free 3’-OH group to “prime” the sequencing-by-synthesis reaction. The bound PCR products are sequenced through the addition of fluorescently labelled deoxyribonucleotides that glow when they are incorporated into the DNA molecule. Each of the four nucleotides glows a different colour, which identifies the nucleotide being incorporated. The Illumina sequencer records each of these flashes of coloured light and interprets this information to provide a sequence. This happens hundreds of thousands of times in a massively parallel process to sequence all of the amplified DNA from our PCR amplification step.
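As a toy illustration of this principle, the sketch below incorporates one labelled nucleotide per cycle and converts the recorded colours back into bases. The colour assigned to each base is an arbitrary placeholder chosen for this example, not the instrument's real dye scheme, and the template is assumed to be given in the order in which the polymerase reads it.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Arbitrary colour labels for illustration only.
COLOUR_OF_BASE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
BASE_OF_COLOUR = {colour: base for base, colour in COLOUR_OF_BASE.items()}


def sequence_by_synthesis(template, cycles):
    """Simulate `cycles` rounds of incorporation along `template`.

    Each cycle incorporates the nucleotide complementary to the next
    template base and records the colour of its label. Returns the list
    of recorded colours and the read decoded from them.
    """
    colours = []
    for template_base in template[:cycles]:
        incorporated = COMPLEMENT[template_base]
        colours.append(COLOUR_OF_BASE[incorporated])
    read = "".join(BASE_OF_COLOUR[colour] for colour in colours)
    return colours, read
```

Calling `sequence_by_synthesis("ACGT", 4)` gives one colour per cycle and a read that is the complement of the template, which is the information the sequencer reports for every cluster on the flow cell.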
Using PCR we can amplify all of the 16S regions found in our fecal sample DNA extractions and then, using the Illumina MiSeq machine, obtain the sequence information. The sequence information can then be used to determine what bacteria are found in the sample.
Illumina sequencing provides a massive amount of data that must then be sifted through using a computer. For this task we use QIIME analysis tools with the help of the Integrated Microbiome Resource here at Dalhousie University and Dr. Morgan Langille’s Microbiome Helper tool.
QIIME Analysis
To fully understand the data that the Illumina sequencer gives us, we must use a data analysis pipeline. A pipeline is simply a set of tools used in the workflow of data analysis to provide us with a clear picture of the data we have collected. The Microbiome Helper tool, developed by Dr. Morgan Langille, contains the entire Quantitative Insights Into Microbial Ecology (QIIME, pronounced “chime”) set of tools, which collectively are an open-source pipeline that wraps many other tools necessary for the analysis of Illumina-sequenced 16S rRNA gene data. This pipeline makes it possible to analyze 16S data efficiently and accurately to determine the taxonomy of the microbial community represented by the 16S sequences.
The data that comes from the Illumina sequencer is raw data. It contains sequences that are important to us, but it also contains chimeric sequences, poor-quality sequences, very short sequences and other junk data that could distort our taxonomic assessment further down the pipeline. So we must use the QIIME tools to clean up the raw data into something useful.
The first step is to align the forward and reverse reads, which come out of the sequencer as separate single-strand reads, and merge each pair into one full-length read. The pipeline uses PEAR (Paired-End reAd mergeR) to do this.
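The merging idea can be sketched as follows. This is a simplified illustration that looks for the longest exact overlap between the forward read and the reverse complement of the reverse read; it is not PEAR's actual algorithm, which scores overlaps statistically and uses base qualities. The `min_overlap` default is an assumption for the example.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(complement[base] for base in reversed(seq))


def merge_pair(forward, reverse, min_overlap=10):
    """Merge a read pair at the longest exact overlap, or return None.

    The reverse read is first reverse-complemented so that both reads
    are on the same strand; the end of the forward read is then matched
    against the start of that sequence.
    """
    reverse_rc = reverse_complement(reverse)
    max_overlap = min(len(forward), len(reverse_rc))
    for overlap in range(max_overlap, min_overlap - 1, -1):
        if forward[-overlap:] == reverse_rc[:overlap]:
            return forward + reverse_rc[overlap:]
    return None
```

Pairs that return `None` have no sufficiently long overlap; in this sketch they would simply be left out of the merged set.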
The next step is to use a quality-control tool called FastQC, which provides us with information on the quality and length of the merged reads created in the PEAR step. This information allows us to filter the reads and remove short and poor-quality reads from our samples. FastQC also has scripts for this process.
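A length and quality filter of this kind can be sketched in a few lines. The thresholds below (`min_length=400`, `min_mean_quality=30`) are illustrative assumptions, not the values used in our analysis, and the quality string is assumed to use the common Phred+33 FASTQ encoding.

```python
def mean_quality(quality_string, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 by default)."""
    if not quality_string:
        return 0.0
    scores = [ord(char) - offset for char in quality_string]
    return sum(scores) / len(scores)


def passes_filter(sequence, quality_string, min_length=400, min_mean_quality=30):
    """Keep a read only if it is long enough and of high enough mean quality."""
    if len(sequence) < min_length:
        return False
    return mean_quality(quality_string) >= min_mean_quality
```

In Phred+33 encoding the character `I` corresponds to a score of 40 and `#` to a score of 2, so a read whose quality string is mostly `#` would be dropped by this filter.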
Following filtering of the reads, chimeric reads must be removed. A chimeric read is a piece of amplified DNA from the PCR step that is a combination of two separate molecules that became joined as a byproduct of PCR. Because the 16S region of bacterial and archaeal genomes is conserved, 16S genes from different organisms share some level of sequence similarity. This occasionally causes one incompletely extended single-stranded 16S DNA molecule to base-pair to, and prime, the PCR amplification of another, leading to a new, chimeric 16S gene that does not represent any bacterial or archaeal species.
These reads must be removed to make sure that our data is accurate. This is done by VSEARCH, which stands for vectorized search. VSEARCH attempts to align the sequences to reference sequences. If one half of a sequence matches one reference while the other half matches a different reference, the program concludes that the sequence is chimeric and not a biologically distinct sequence.
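The two-halves idea can be sketched as below. This is a deliberately simplified illustration, not VSEARCH's algorithm: it assumes the read and all reference sequences are already aligned and of equal length, compares them position by position, and only tests a breakpoint at the midpoint. The reference names in the usage note are hypothetical.

```python
def identity(a, b):
    """Fraction of positions in `a` that match `b` (no gaps considered)."""
    if not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def best_reference(segment, start, references):
    """Name of the reference whose region at `start` best matches `segment`."""
    end = start + len(segment)
    return max(references, key=lambda name: identity(segment, references[name][start:end]))


def looks_chimeric(read, references):
    """Flag a read whose two halves are best explained by different parents."""
    mid = len(read) // 2
    left_parent = best_reference(read[:mid], 0, references)
    right_parent = best_reference(read[mid:], mid, references)
    if left_parent == right_parent:
        return False
    # Only call it a chimera if the two-parent model beats every single parent.
    best_single = max(identity(read, ref) for ref in references.values())
    two_parent_model = references[left_parent][:mid] + references[right_parent][mid:]
    return identity(read, two_parent_model) > best_single
```

With `references = {"ref_1": "ACGTACGTACGTACGT", "ref_2": "TTGGCCAATTGGCCAA"}`, a read built from the first half of `ref_1` and the second half of `ref_2` is flagged, while either reference on its own is not.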
Next, the non-chimeric, merged and filtered reads are grouped into operational taxonomic units (OTUs), which are clusters of highly similar 16S gene sequences, and the OTUs are compared to a database to be taxonomically identified. There are three primary ways to pick OTUs. De novo OTU picking clusters 16S sequences into OTUs without the use of any reference database. Closed-reference OTU picking matches 16S sequences to a reference database such as Greengenes and discards any sequence that does not match. Open-reference OTU picking first matches sequences to a reference database and then clusters the sequences that did not match de novo. The method we used is de novo OTU picking, after which the OTUs are compared to a database for identification. De novo OTU picking is much slower than reference-based picking, but it provides a more inclusive dataset, because no 16S sequence is thrown out for failing to match a reference database, as happens in closed-reference picking. The taxonomic assignment occurs in 7 steps, which are all wrapped by the pick_de_novo_otus.py script provided by QIIME:
- Pick OTUs based on sequence similarity within the reads (de-novo picking)
- Pick a representative sequence for each OTU
- Assign taxonomy to OTU representative sequences
- Align OTU representative sequences
- Filter the alignment
- Build a phylogenetic tree
- Make the OTU table
Each step here gets us to the OTU table, which can be used for downstream analysis including chart creation and PICRUSt analysis.
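To make the first, second and last of the steps listed above more concrete, here is a minimal sketch of greedy de novo clustering followed by construction of an OTU table. It is an illustration under simplifying assumptions, not the method QIIME wraps: sequences are compared position by position and must be of equal length, the most abundant unique sequence seeds each OTU and serves as its representative, and the 97% threshold is the conventional default rather than a value stated in our protocol.

```python
from collections import Counter, defaultdict


def identity(a, b):
    """Ungapped identity between two equal-length sequences."""
    if not a or len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def pick_otus(reads, threshold=0.97):
    """Cluster reads into OTUs without a reference database.

    `reads` is a list of (sample_name, sequence) pairs. Returns the list
    of representative sequences (the list index is the OTU id) and an
    OTU table of the form {otu_id: {sample_name: read_count}}.
    """
    abundance = Counter(sequence for _, sequence in reads)
    representatives = []
    assignment = {}
    # Visit unique sequences from most to least abundant.
    for sequence, _ in abundance.most_common():
        for otu_id, representative in enumerate(representatives):
            if identity(sequence, representative) >= threshold:
                assignment[sequence] = otu_id
                break
        else:
            # No existing OTU is similar enough, so start a new one.
            assignment[sequence] = len(representatives)
            representatives.append(sequence)
    otu_table = defaultdict(Counter)
    for sample, sequence in reads:
        otu_table[assignment[sequence]][sample] += 1
    return representatives, {otu: dict(counts) for otu, counts in otu_table.items()}
```

Taxonomy assignment would then be run only on the representative sequences, which is why de novo picking can still be followed by a database comparison.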
Lastly, we remove low-confidence OTUs that are represented by fewer than 0.1% of the sequencing reads. This number is chosen because it is the predicted sample bleed-through on the Illumina MiSeq machine, making it a natural cutoff for the removal of low-confidence OTUs.
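The cutoff itself is a simple proportion test. The sketch below assumes a single table of total read counts per OTU; whether the 0.1% is taken over the whole run or per sample is a detail this example does not settle, and the OTU names are placeholders.

```python
def filter_low_abundance(otu_counts, min_fraction=0.001):
    """Drop OTUs that make up less than `min_fraction` of all reads.

    `otu_counts` maps an OTU name to its total read count.
    """
    total = sum(otu_counts.values())
    if total == 0:
        return {}
    return {otu: count for otu, count in otu_counts.items()
            if count / total >= min_fraction}
```

For a run of 10,000 reads, an OTU supported by 9 reads (0.09%) would be removed, while one supported by 9,990 reads would be kept.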
Using QIIME we can then create alpha- and beta-diversity charts for the samples, and pie and bar charts of the taxonomy at each taxonomic rank (class, family, genus, etc.). Examples of these can be seen below, and these charts from our fecal sample analysis can be found here
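As an illustration of what these diversity measures compute, the sketch below implements one common alpha-diversity metric (the Shannon index, within a sample) and one common beta-diversity metric (Bray-Curtis dissimilarity, between two samples). QIIME offers many metrics, and these two are chosen here only as examples; they are not necessarily the ones behind our charts. Both functions assume each sample contains at least one read.

```python
import math


def shannon_index(counts):
    """Shannon diversity (natural log) of one sample's OTU counts."""
    total = sum(counts)
    proportions = [count / total for count in counts if count > 0]
    return -sum(p * math.log(p) for p in proportions)


def bray_curtis(counts_a, counts_b):
    """Bray-Curtis dissimilarity between two samples.

    Each argument maps an OTU name to its read count in that sample.
    0 means identical composition, 1 means no shared OTUs.
    """
    otus = set(counts_a) | set(counts_b)
    shared = sum(min(counts_a.get(otu, 0), counts_b.get(otu, 0)) for otu in otus)
    total = sum(counts_a.values()) + sum(counts_b.values())
    return 1 - 2 * shared / total
```

A sample dominated by one OTU has a Shannon index near zero, while two samples with no OTUs in common have a Bray-Curtis dissimilarity of 1.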
PICRUSt:
After taxonomic assignment, a tool called Phylogenetic Investigation of Communities by Reconstruction of Unobserved States (PICRUSt) was used to predict the functions of the OTUs within the microbiome of each animal.
PICRUSt works by predicting the gene families contained in a bacterial or archaeal metagenome that has been characterized by 16S sequencing. It uses OTUs that were picked in QIIME using closed-reference OTU picking. In PICRUSt’s case, the OTU picking is done against the Greengenes database, and the sequenced genomes associated with that database are then used to predict the gene families of the metagenome.
The PICRUSt algorithm works in two steps. The first is gene content prediction, which involves predicting the gene content of ancestral organisms from their living relatives. This prediction is then extended to living organisms that are in the reference database but whose genomes have not been sequenced. Their gene content is estimated from the similarity of their 16S sequences to those of studied and sequenced organisms. In this way, the PICRUSt algorithm precomputes the gene content of every organism in the reference database. Because this is a daunting task for even a powerful computer, this step is only done once and the output is provided for the next step. This is why OTU picking in QIIME must be done against a closed reference database. Otherwise, the OTUs will not correspond to the precomputed reference database and PICRUSt will be unable to complete the second step.
The second step is metagenome inference. PICRUSt infers the gene content of the organisms present in the OTU picking output from their sequenced evolutionary relatives. It does this by matching each 16S rRNA gene sequence to known sequences to determine its closest evolutionary relative. PICRUSt can then output a predicted gene content for each OTU picked in the closed-reference OTU picking step and combine these into inferred gene content for the entire microbiome.
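The arithmetic behind metagenome inference can be sketched as follows: each OTU's read count is divided by its predicted 16S copy number, so that organisms with many 16S copies are not over-counted, and the result is multiplied by that OTU's precomputed gene content and summed over all OTUs. The function, OTU names, gene names and numbers here are hypothetical and only illustrate the calculation; they are not PICRUSt's own code or data.

```python
def predict_metagenome(otu_counts, copy_number, gene_content):
    """Predict gene family abundances for one sample.

    otu_counts:   {otu: read_count} from closed-reference OTU picking
    copy_number:  {otu: predicted 16S copies per genome}
    gene_content: {otu: {gene_family: predicted copies per genome}}
    """
    predicted = {}
    for otu, count in otu_counts.items():
        # Convert 16S read counts into an estimate of organism abundance.
        organism_abundance = count / copy_number[otu]
        for gene, copies in gene_content[otu].items():
            predicted[gene] = predicted.get(gene, 0.0) + organism_abundance * copies
    return predicted
```

For example, an OTU with 20 reads and 2 copies of the 16S gene counts as 10 organisms, so a gene family present once per genome in that OTU contributes 10 to the sample's predicted total.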
This will be a description of the second part of our project (isolation of cellulose and tree sap degrading bacteria from fecal samples)
This will be a description of the third part of our project (metagenomic library based on cells that degrade cellulose and sap)