Team:Dalhousie Halifax NS/Description

Dalhousie iGEM 2016

Cellulolytic enzymes have become a very interesting area of research because of their potential role in producing microbial biofuels. Biofuels, such as ethanol, are fuels that microorganisms can produce from feedstocks like cellulose. Cellulose is difficult to degrade, so although it is abundant, little of its carbon is readily available to organisms as an energy source. However, by using organisms, or proteins from organisms, that are capable of degrading cellulose, we can make many feedstocks available for biofuel production. Producing biofuel from materials like leftover vegetables, lumber industry by-products and waste, seaweed and other easily obtainable sources could help address the ever-growing problem of energy production.

The softwood lumber industry is particularly pertinent for Nova Scotia. In 2014, Nova Scotia exported roughly $668 million in softwood lumber products, including pulp and paper products, primary lumber products and fabricated wood materials. The softwood lumber industry in Nova Scotia also harvested roughly 29,000 hectares of forest to make these products. This industry produces many useful products, but also many by-products with little value. Imagine taking these by-products and turning them into a sustainable form of energy, such as ethanol biofuel, through a simple biological reactor using E. coli and enzymes harvested from cellulose-degrading bacteria. This is what we are aiming to do with this year's project.

Our project will characterize the microbiomes of North American porcupines found in Nova Scotia forests and attempt to "mine" these microbiomes for useful enzymes. A microbiome is the collection of microbes found in and on a larger organism. Humans have microbiomes on their skin, in their noses, in their ears and in many other places. The most important human microbiome, however, is the gut microbiome: it helps us digest many otherwise indigestible foods and provides us with many nutritional by-products. Research in this area is new, but we are learning more and more about the importance of a healthy gut microbiome. Interestingly, the North American porcupine eats mostly bark and wood from trees found in Nova Scotia. Since the porcupine's own cells cannot digest cellulose, this woodland creature must harbour gut bacteria that help digest all the cellulose it consumes. The gut microbiome of the North American porcupine is therefore a logical starting point in our search for a microbiome to mine.

Our project is designed in three stages. First, we hope to isolate and characterize the microbiomes of many animals found at the Shubenacadie Wildlife Park. Second, we hope to isolate and characterize particular cellulose-degrading bacteria using a cellulose agar medium. Third, we aim to construct a metagenomic library of the microbial DNA found in the porcupine microbiome as a way to isolate and characterize genes for useful cellulose-degrading enzymes that can be expressed in E. coli. Each of these experimental stages is broken down into smaller pieces below. We invite you to check out our entire process and walk with us as we explore the vast possibilities of cellulose degraders, microbiomes and biofuel production.

Shubenacadie Wildlife Park Microbiome Survey

The purpose of this microbiome survey is to get a handle on what kinds of organisms are present in mammal microbiomes, to see whether using microbiomes as a "mine" for biological engineering is feasible. This part of our project is designed to determine whether a metagenomic library constructed from DNA extracted from gut microbiomes is indeed useful for biological engineering. In our investigation, we are specifically seeking genes that encode enzymes capable of degrading softwood lumber waste and turning it into useful compounds like bioplastics and biofuels. Using 16S rRNA gene taxonomy data and PICRUSt analysis, we intend to show that these microbiomes are a gold mine for biological engineering.

Shubenacadie Wildlife Park:

The Shubenacadie Wildlife Park is home to 29 mammals and 35 birds on a 40-hectare piece of land. These animals were rescued from the wild when they were injured or sick, and were rehabilitated. Some animals are unable to return to the wild and live out the rest of their lives at the Park. These animals live in enclosures that simulate environments that are comfortable for them; for example, the rescued otters have a large pool in which they can swim.

The Wildlife Park has been an invaluable resource. The animals are fed a supplemented diet, which consists mainly of their natural diet with added foods such as fruits and vegetables for the herbivores and protein supplements for the carnivores. Because diet affects microbiome composition, this supplemented diet introduces potential error into the analysis of samples obtained from the Park. However, we considered this error to be minimal, as the mammals eat many of the same things they would in nature. The Park was also the safest way for us to obtain fecal samples, so we accepted the associated error as part of this process.

DNA Extraction:

We obtained fecal samples from 21 mammals at the Shubenacadie Wildlife Park. These fecal samples contain millions of each animal's gut microbes, which make up its gut microbiome. DNA extraction allows us to identify these bacteria through the bacterial DNA found in each fecal sample. DNA was extracted using MoBio's PowerFECAL kit, which uses both chemical and mechanical lysis. Chemical lysis relies on a detergent, which disrupts the membranes of the bacterial cells. Mechanical lysis relies on small ceramic beads shaken in the tube at high speed, which physically break open the bacterial cells and release their contents. The cell debris was discarded and the supernatant was passed through a column, where DNA binds but all other cell material flows through. The columns were washed with ethanol and the DNA was eluted using a buffer provided by MoBio. This left us with clean genomic DNA from the cells in the fecal sample.

Polymerase Chain Reaction:

To prepare our genomic DNA samples for Illumina sequencing, we first used the polymerase chain reaction (PCR) to create many copies of a specific region of the genomes. The region we require for microbial identification is the 16S rRNA gene. This gene is widely used as a phylogenetic marker for taxonomic identification, so analysis of the resulting data gives us a good idea of which bacteria are present in each sample.

PCR uses short oligonucleotides called primers and a heat-stable DNA polymerase. The primers anneal to a template and provide a free 3’-OH group, priming the DNA synthesis reaction; the heat-stable polymerase then synthesizes new DNA from that template strand. A machine called a thermocycler cycles the temperature at which the samples are incubated. Each cycle has three steps: denaturation, annealing and elongation. The denaturation step melts the double-stranded DNA molecule into single-stranded molecules. The annealing step allows the oligonucleotide primers to bind to the single-stranded molecules at their complementary sequences. The elongation step allows the heat-stable polymerase to extend the new DNA strand. After many cycles, the segment of DNA flanked by the primers has been amplified exponentially. For Illumina sequencing, the primers also contain an adapter region that is added onto the ends of the amplified 16S rRNA gene DNA, making the DNA compatible with the oligonucleotides linked to the Illumina sequencer flow cell. The adapter addition is important because it introduces specificity: only DNA containing the adapter will be sequenced.
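The exponential amplification described above can be put into numbers. This is a minimal sketch, assuming an idealized reaction in which each cycle multiplies the target by (1 + efficiency); real reactions slow down and plateau as primers and nucleotides run out, and the starting copy number and efficiency below are hypothetical.

```python
def pcr_copies(start_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized number of amplicon copies after a given number of PCR cycles.

    efficiency = 1.0 means every template is copied in every cycle (perfect
    doubling); 0.9 means 90% of templates are copied per cycle.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return start_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    # Hypothetical: 1,000 starting 16S templates and 30 cycles.
    print(f"{pcr_copies(1_000, 30):.3e}")       # perfect doubling: 1.074e+12
    print(f"{pcr_copies(1_000, 30, 0.9):.3e}")  # 90% efficiency per cycle
```

Even a modest drop in per-cycle efficiency compounds over 30 cycles, which is one reason amplicon yields vary between samples.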

Illumina Sequencing:

The environmental DNA extraction provides us with DNA from the organisms in the microbiomes of each of the Shubenacadie Wildlife Park mammals. Analysis of the genomic DNA mixture provides us with biological information such as the identities of the organisms present and the major functions of the microbiomes. This information is stored within the DNA sequence itself. To get access to this sequence information we used Illumina MiSeq.

The Illumina sequencer works on a principle called sequencing by synthesis. Using PCR, we amplified all of the 16S regions found in our fecal sample DNA extractions. When the PCR-amplified DNA is placed into the flow cell, the adapter region added by PCR binds to complementary oligonucleotides attached to the surface of the flow cell. These oligonucleotides provide a free 3’-OH group to “prime” the sequencing-by-synthesis reaction. The bound PCR products are sequenced through the addition of fluorescently labelled deoxyribonucleotides, which are detected as they are incorporated into the growing DNA strand. Each of these nucleotides glows a different colour, which identifies the nucleotide being incorporated. The Illumina sequencer records each of these flashes of coloured light and interprets this information to produce a DNA sequence. This happens millions of times in a massively parallel process, sequencing all of the amplified DNA from our PCR step. The sequence information can then be used to determine which bacteria are found in the sample.
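The idea of turning coloured flashes into a sequence can be illustrated with a toy base caller. This is a minimal sketch, not Illumina's actual base-calling software: it assumes four imaging channels with a hypothetical colour-to-base mapping and simply calls, for each cycle, the base whose channel is brightest for one cluster.

```python
# Hypothetical channel-to-base mapping, for illustration only; real
# instruments use their own dye chemistry, calibration and error models.
CHANNEL_TO_BASE = {"red": "A", "green": "C", "blue": "G", "yellow": "T"}


def call_bases(cycles: list[dict[str, float]]) -> str:
    """Call one base per sequencing cycle from one cluster's channel intensities.

    Each element of `cycles` maps a channel name to the fluorescence measured
    in that channel during one cycle; the brightest channel is taken as the
    nucleotide incorporated in that cycle.
    """
    read = []
    for intensities in cycles:
        brightest = max(intensities, key=intensities.get)
        read.append(CHANNEL_TO_BASE[brightest])
    return "".join(read)


if __name__ == "__main__":
    cluster = [
        {"red": 0.1, "green": 0.2, "blue": 0.9, "yellow": 0.1},
        {"red": 0.8, "green": 0.1, "blue": 0.2, "yellow": 0.1},
        {"red": 0.2, "green": 0.1, "blue": 0.1, "yellow": 0.7},
    ]
    print(call_bases(cluster))  # GAT
```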

Illumina sequencing provides a massive amount of data that must then be sifted through with the aid of computer programs. For this task we use QIIME analysis tools with the help of the Integrated Microbiome Resource here at Dalhousie University and Dr. Morgan Langille’s Microbiome Helper tool.

QIIME Analysis:

To fully understand the data that the Illumina sequencer gives us, we must use a data analysis pipeline. A pipeline is simply a set of tools used one after another to give us a clear picture of the data we have collected. The Microbiome Helper tool, developed by Dr. Morgan Langille, contains the entire Quantitative Insights Into Microbial Ecology (QIIME, pronounced “chime”) set of tools, which together form an open-source pipeline that wraps many other tools needed to analyze Illumina-sequenced 16S rRNA gene data. This pipeline makes it possible to analyze 16S data efficiently and accurately to determine the taxonomy of the microbial community represented by the 16S sequences.

The raw reads produced by the sequencer contain not only sequences that are important to us, but also chimeric sequences, poor-quality sequences, very short sequences and other “junk data” that could interfere with our taxonomic assessment further down the pipeline. We must use the QIIME tools to clean up the raw data and retain what is useful.

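The first step is to merge the forward and reverse reads of each DNA fragment. The sequencer reads each fragment from both ends, producing a pair of single-stranded reads that overlap in the middle; these are aligned and combined into a single, longer read covering the whole amplicon. The Microbiome Helper pipeline uses PEAR (Paired-End reAd mergeR) to do this.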
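The core idea of read merging can be shown with a simplified version. This sketch is not PEAR's algorithm (PEAR scores overlaps statistically and uses base qualities to resolve mismatches); it assumes error-free reads and merges on the longest exact overlap, and `min_overlap` is a hypothetical parameter.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A<->T, C<->G)."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(forward: str, reverse: str, min_overlap: int = 10) -> Optional[str]:
    """Merge one paired-end read into a single sequence spanning the amplicon.

    The reverse read comes from the opposite strand, so it is reverse-
    complemented first. The longest exact overlap between the end of the
    forward read and the start of the reverse-complemented read is then used
    to join them. Returns None if no overlap of at least min_overlap is found.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    rc = reverse_complement(reverse)
    for overlap in range(min(len(forward), len(rc)), min_overlap - 1, -1):
        if forward[-overlap:] == rc[:overlap]:
            return forward + rc[overlap:]
    return None


if __name__ == "__main__":
    amplicon = "GATTACAGCTTGCAACGTTAGCCATGAC"  # hypothetical 28 bp amplicon
    fwd = amplicon[:20]                         # read from the left end
    rev = reverse_complement(amplicon[8:])      # read from the right end
    print(merge_pair(fwd, rev) == amplicon)     # True (12 bp overlap)
```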

The next step uses a quality-control tool called FastQC, which reports the quality and length of the merged reads created in the PEAR step. This information guides the filtering step, which removes short and poor-quality reads from our samples; the filtering itself is done with separate scripts in the Microbiome Helper pipeline.
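The kind of length and quality filtering described above can be sketched as follows. This is a hypothetical filter, not the Microbiome Helper scripts: it assumes FASTQ quality strings in the Phred+33 encoding used by current Illumina output, and the thresholds (200 bases, mean quality 30) are illustrative rather than the values we used.

```python
def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string (Phred+33) into integer quality scores."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q: int) -> float:
    """A Phred score Q corresponds to an error probability of 10^(-Q/10)."""
    return 10 ** (-q / 10)


def passes_filter(seq: str, quality: str,
                  min_length: int = 200, min_mean_q: float = 30.0) -> bool:
    """Keep a read only if it is long enough and its mean quality is high enough.

    Both thresholds are illustrative assumptions.
    """
    if len(seq) < min_length or not quality:
        return False
    scores = phred_scores(quality)
    return sum(scores) / len(scores) >= min_mean_q


if __name__ == "__main__":
    print(phred_scores("I#"))          # [40, 2]
    print(error_probability(30))       # 0.001, i.e. 1 error in 1,000 bases
    print(passes_filter("A" * 250, "I" * 250))  # True
    print(passes_filter("A" * 100, "I" * 100))  # False (too short)
```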

Following filtering, chimeric reads must be removed. A chimeric read comes from a DNA molecule produced during the PCR step that combines two separate template molecules joined as a by-product of PCR. Because the 16S rRNA gene is conserved across bacterial and archaeal genomes, different 16S genes often share some sequence similarity. This occasionally allows one partially extended single-stranded 16S DNA molecule to base-pair to, and prime the amplification of, another 16S template, producing a new, chimeric 16S sequence that does not represent any real bacterial or archaeal species.

These reads must be removed to make sure that our data are accurate. This is done by VSEARCH, which stands for vectorized search. VSEARCH aligns each sequence to other 16S sequences. If one part of a sequence closely matches the 16S gene of one organism while the other part matches the 16S gene of a different organism, and this two-parent combination explains the sequence better than any single match, the program flags the sequence as chimeric rather than as a biologically distinct sequence.
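The two-parent reasoning above can be illustrated with a deliberately simplified check. This is not VSEARCH's algorithm (VSEARCH implements UCHIME-style scoring on real alignments); the sketch assumes the query and all reference sequences are already aligned to the same length, and splits the query at its midpoint rather than searching for the best breakpoint.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length, aligned sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def looks_chimeric(query: str, references: dict[str, str]) -> bool:
    """Flag a query whose halves best match different references.

    The query is called chimeric if the best match for its left half and the
    best match for its right half are different references, and stitching
    those two parents together explains the query better than any single
    reference does.
    """
    mid = len(query) // 2
    left, right = query[:mid], query[mid:]
    best_left = max(references, key=lambda n: identity(left, references[n][:mid]))
    best_right = max(references, key=lambda n: identity(right, references[n][mid:]))
    two_parent = (identity(left, references[best_left][:mid]) * len(left)
                  + identity(right, references[best_right][mid:]) * len(right)) / len(query)
    best_single = max(identity(query, ref) for ref in references.values())
    return best_left != best_right and two_parent > best_single


if __name__ == "__main__":
    refs = {"taxonA": "AAAAAAAAAACCCCCCCCCC", "taxonB": "GGGGGGGGGGTTTTTTTTTT"}
    print(looks_chimeric("AAAAAAAAAATTTTTTTTTT", refs))  # True: A-left, B-right
    print(looks_chimeric(refs["taxonA"], refs))          # False
```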

Next, the non-chimeric, merged and filtered reads are grouped into operational taxonomic units (OTUs), each represented by a distinct 16S gene sequence, and the OTUs are compared to a database for taxonomic identification. There are three primary ways to pick OTUs. De novo OTU picking clusters 16S sequences into OTUs by their similarity to each other, without using any reference database. Closed-reference OTU picking matches 16S sequences against a reference database such as Greengenes and discards sequences that do not match. Open-reference OTU picking first matches sequences against a reference database and then clusters the sequences that fail to match de novo. We used de novo OTU picking, which does not use a reference database to form OTUs, and then compared the resulting OTUs to a database for identification. De novo OTU picking is much slower than closed-reference picking, but it provides a more inclusive dataset: closed-reference picking throws out any 16S sequence that does not match the reference database, whereas de novo picking keeps all of the 16S data. OTU picking and taxonomic assignment occur in 7 steps, all wrapped by the pick_de_novo_otus.py script provided by QIIME:

  1. Pick OTUs based on sequence similarity within the reads (de-novo picking)
  2. Pick a representative sequence for each OTU
  3. Assign taxonomy to OTU representative sequences
  4. Align OTU representative sequences
  5. Filter the alignment
  6. Build a phylogenetic tree
  7. Make the OTU table

Together, these steps produce the OTU table, which can be used for downstream analysis, including chart creation and PICRUSt analysis.
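Steps 1 and 7 (clustering reads into OTUs and counting them per sample) can be sketched in a few lines. This is a simplified greedy clustering, not QIIME's implementation: it assumes all reads have equal length so identity can be computed position by position, uses the conventional 97% identity threshold, and treats each OTU's centroid sequence as its representative (step 2).

```python
from collections import Counter


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs: list[str], threshold: float = 0.97) -> list[str]:
    """Assign each sequence to an OTU centroid by greedy de novo clustering.

    Unique sequences are processed from most to least abundant; each joins
    the first existing centroid it matches at >= threshold identity and
    otherwise becomes a new centroid. No reference database is used.
    """
    centroids: list[str] = []
    assignment: dict[str, str] = {}
    for seq, _count in Counter(seqs).most_common():
        for centroid in centroids:
            if identity(seq, centroid) >= threshold:
                assignment[seq] = centroid
                break
        else:
            centroids.append(seq)
            assignment[seq] = seq
    return [assignment[s] for s in seqs]


def otu_table(samples: dict[str, list[str]], threshold: float = 0.97) -> dict[str, Counter]:
    """Cluster reads from all samples together, then count OTUs per sample."""
    all_reads = [read for reads in samples.values() for read in reads]
    labels = greedy_cluster(all_reads, threshold)
    table: dict[str, Counter] = {}
    start = 0
    for sample, reads in samples.items():
        table[sample] = Counter(labels[start:start + len(reads)])
        start += len(reads)
    return table
```

Clustering all samples together before counting ensures that the same OTU gets the same label in every sample, which is what makes the table comparable across animals.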

Lastly, we remove low-confidence OTUs that are represented by fewer than 0.1% of the sequencing reads. This cutoff was chosen because 0.1% is the expected level of sample bleed-through (reads from one sample appearing in another) on the Illumina MiSeq, so OTUs below it cannot be distinguished from bleed-through.
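The 0.1% cutoff amounts to a simple relative-abundance filter. A minimal sketch, assuming the cutoff is applied to the total read count of the OTU table passed in (the text does not say whether it was applied per sample or across all samples); the OTU names and counts are hypothetical.

```python
def filter_low_abundance(otu_counts: dict[str, int],
                         min_fraction: float = 0.001) -> dict[str, int]:
    """Drop OTUs represented by fewer than min_fraction of all reads (0.001 = 0.1%)."""
    total = sum(otu_counts.values())
    cutoff = total * min_fraction
    return {otu: n for otu, n in otu_counts.items() if n >= cutoff}


if __name__ == "__main__":
    counts = {"OTU_1": 99_950, "OTU_2": 40, "OTU_3": 10}  # 100,000 reads total
    print(filter_low_abundance(counts))  # {'OTU_1': 99950}; cutoff is 100 reads
```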

Using QIIME, we can then create alpha- and beta-diversity charts for the samples, as well as pie and bar charts of taxonomy at each taxonomic rank (class, family, genus, etc.). Examples of these can be seen below, and these charts from our fecal sample analysis can be found here.
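Alpha diversity summarizes how many OTUs a sample contains and how evenly reads are spread among them. As one example, the Shannon index can be computed directly from a sample's OTU counts. This is a minimal sketch; QIIME offers many alpha-diversity metrics, and tools differ in the logarithm base they use, so the base is left as a parameter here.

```python
import math


def shannon_index(counts: list[int], base: float = math.e) -> float:
    """Shannon diversity H = -sum(p_i * log(p_i)) over OTU relative abundances p_i.

    OTUs with zero reads are skipped, since they contribute nothing to H.
    """
    total = sum(counts)
    return -sum((c / total) * math.log(c / total, base) for c in counts if c > 0)


if __name__ == "__main__":
    print(shannon_index([42]))                  # one OTU: no diversity
    print(shannon_index([10, 10]))              # two equal OTUs: ln 2
    print(shannon_index([5, 5, 5, 5], base=2))  # four equal OTUs: 2 bits
```

A sample dominated by one OTU scores lower than a sample with the same OTUs in equal proportions, which is why the index captures evenness as well as richness.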


After taxonomic assignment, a tool called Phylogenetic Investigation of Communities by Reconstruction of Unobserved States (PICRUSt) was used to predict the functions of the OTUs within the microbiome of each animal.

PICRUSt works by predicting the gene families present in a bacterial and archaeal metagenome from 16S rRNA gene sequencing data. It requires QIIME taxonomy assignments made by closed-reference OTU picking. For PICRUSt, OTU picking is done against the Greengenes database, whose reference sequences PICRUSt has already linked to predicted gene families, so each picked OTU can be assigned a predicted set of gene families.

The PICRUSt algorithm works in two steps. The first is gene content prediction, which involves inferring the gene content of ancestral organisms from the genomes of their sequenced living relatives. This prediction is then extended to living organisms that have not yet been studied or sequenced: their gene content is estimated from their 16S sequence similarity to studied and sequenced organisms. In this way, PICRUSt precomputes the gene content of every organism in a reference database. Because this is computationally demanding even for a powerful computer, this step is done only once, and its output is provided for the second step. This is why OTU picking for PICRUSt must be done against a closed reference database: otherwise, the OTUs will not correspond to the precomputed reference database and PICRUSt will be unable to complete the second step.

The second step is metagenome inference. PICRUSt infers the gene content of the organisms in the OTU picking output from the precomputed gene content of their sequenced evolutionary relatives. Because each OTU was matched to a known reference 16S sequence, PICRUSt can look up the predicted gene content for that OTU. It can then output predicted gene content for each OTU picked in the closed-reference OTU picking step and combine these predictions into an inferred metagenome for the entire microbiome.
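The combination step can be sketched as a weighted sum. This is a minimal sketch, assuming the precomputed tables from the first step are already available; it follows the scheme described by Langille et al. (2013), in which OTU read counts are first divided by each OTU's predicted 16S copy number (so organisms with many 16S copies are not over-counted) and then multiplied by each OTU's predicted gene content. The OTU names, gene family labels and numbers are hypothetical, and this is not PICRUSt's own code.

```python
from collections import defaultdict


def predict_metagenome(otu_counts: dict[str, float],
                       copy_numbers: dict[str, float],
                       gene_content: dict[str, dict[str, float]]) -> dict[str, float]:
    """Infer gene family abundances for a sample from its OTU table.

    otu_counts:   OTU id -> observed 16S read count in the sample
    copy_numbers: OTU id -> predicted 16S rRNA gene copies per genome
    gene_content: OTU id -> {gene family -> predicted copies per genome}
    """
    metagenome: dict[str, float] = defaultdict(float)
    for otu, count in otu_counts.items():
        organisms = count / copy_numbers[otu]  # normalize by 16S copy number
        for gene, copies in gene_content[otu].items():
            metagenome[gene] += organisms * copies
    return dict(metagenome)


if __name__ == "__main__":
    counts = {"otu1": 40, "otu2": 10}
    copies = {"otu1": 4, "otu2": 1}
    genes = {"otu1": {"endoglucanase": 2},
             "otu2": {"endoglucanase": 1, "beta_glucosidase": 3}}
    print(predict_metagenome(counts, copies, genes))
    # {'endoglucanase': 30.0, 'beta_glucosidase': 30.0}
```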

Metagenomic Library and Isolation

The fecal samples contained tons of bacteria, and we wanted a quick and easy way to confirm that some of them could degrade cellulose. To test this, we plated the samples on cellulose-only medium. The metagenomic library is a future goal for this project, which will enable us to screen our fecal sample bacteria for enzymes useful in biofuel production. We did not get to the metagenomic library this year, but a short description of it is given here.


Cellulose-degrading bacteria were isolated on cellulose medium (see the protocol page for the medium recipe), which contained ground Whatman paper as the cellulose source. The medium had been used by a research group in India conducting very similar experiments, who had great success isolating cellulose-degrading bacteria. After receiving fecal samples from the Shubenacadie Wildlife Park, the team plated the samples on the cellulose medium. For the initial round of porcupine fecal samples, we made duplicates of every plate so that we could incubate them under both aerobic and anaerobic conditions. Following a 24-hour incubation, we observed various colony morphologies. We chose bacteria with the three most prominent colony morphologies to study more closely: white, yellow, and bull's-eye colonies on cellulose medium. These colonies were then streak-purified on LB medium.

16S rRNA Gene Sequencing

We managed to isolate bacteria capable of growing on the cellulose medium. Identifying them was a quick and dirty way to find some of the cellulose-degrading bacteria as a starting place for our project. A discrete portion of the 16S rRNA gene from each of the three isolates was amplified using PCR, and the amplicons were sent away for DNA sequencing. The resulting sequences were used to identify the bacterial colonies to the genus level.

The 16S rRNA gene, as mentioned in the Microbiome Survey section and here in Isolation, is well suited to taxonomic identification of bacteria, for three reasons:

  • DNA provides better taxonomic identification than traditional bacteriological assays like Gram stains.
  • It is found only in prokaryotes (bacteria and archaea; eukaryotes carry the related 18S rRNA gene instead), so eukaryotic nuclear DNA is excluded from our identification procedures.
  • It is a short sequence, which makes it ideal for inexpensive Sanger DNA sequencing and for high-throughput technologies like those used in the Microbiome Survey section.

The 16S rRNA gene contains nine variable regions separated by conserved regions. Primers that bind the conserved regions can be used to amplify smaller fragments containing one or more variable regions, and it is the variable regions that allow us to assign taxonomy. High-throughput sequencing generally requires a piece of DNA smaller than the entire gene, so these conserved regions are useful for amplifying a fragment of suitable size.
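How a primer finds a conserved region can be illustrated by matching a degenerate primer against a template. The sketch uses the widely published 515F primer (GTGCCAGCMGCCGCGGTAA), which binds a conserved region upstream of the V4 variable region; it is shown only as an example, since the text does not say which primers were used. Degenerate positions use standard IUPAC codes (M means A or C), matching is exact, and the template is hypothetical.

```python
# Standard IUPAC nucleotide codes used in degenerate primers.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# 515F primer, used here purely as an illustration.
PRIMER_515F = "GTGCCAGCMGCCGCGGTAA"


def find_primer_sites(template: str, primer: str) -> list[int]:
    """Start positions where a (possibly degenerate) primer matches the template exactly."""
    hits = []
    for start in range(len(template) - len(primer) + 1):
        window = template[start:start + len(primer)]
        if all(base in IUPAC[code] for base, code in zip(window, primer)):
            hits.append(start)
    return hits


if __name__ == "__main__":
    template = "TTTT" + "GTGCCAGCAGCCGCGGTAA" + "CCCC"  # hypothetical
    print(find_primer_sites(template, PRIMER_515F))    # [4]
```

Because the primer site is conserved, a single degenerate primer can bind the 16S genes of many different bacteria, while the variable region it flanks differs between them.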

Metagenomic Library

When we first began our project, we had every intention of constructing a metagenomic library using the porcupine fecal sample as our environmental isolate. However, as other areas of the research began to produce exciting results, the library was put on the back burner. That being said, we did begin constructing the library with help from the Waterloo iGEM team, who provided the pJC8 cosmid to be used as the vector for our library. Waterloo iGEM also sent us a protocol to help with our environmental metagenomic library construction. Although we didn't get very far with the library, we did make some changes to the protocol to make the construction procedure more efficient.

We were originally interested in developing the metagenomic library because it would let us assess and exploit the whole microbial genomic composition of our environmental sample. Unlike phylogenetic surveys, which look only at the diversity of a single gene (e.g. the 16S rRNA gene), metagenomics can provide insight into enzymatic pathways, evolutionary profiles and microbial function.

Metagenomics provides the chance to search through the incredible biodiversity found in the environment without having to adapt the sample to fit standard laboratory techniques such as culturing. When we finish constructing the library, we will use it to search for enzymatic and chemical pathways involved in cellulose degradation.

How to make a metagenomic library (in short):

  1. Isolation of the DNA. It is important that the sample collected represents the entirety of the environment being examined. We used chloroform extraction to isolate large pieces of DNA from the porcupine fecal sample.
  2. Manipulation of the DNA. To be used in further downstream processes, the large environmental DNA pieces were cut by restriction enzymes, which cut at specific sequences called restriction sites.
  3. Incorporation into a vector. A vector is a DNA molecule that can carry foreign genetic material into another cell, where the host cell's macromolecular synthesis machinery can produce the proteins encoded by the foreign genetic material. Vectors usually contain a selectable marker, such as an antibiotic resistance gene. Introducing the vector into a model organism gives that organism a selectable growth advantage, which helps identify the cells that contain the vector.
  4. Introduction of the vector (and foreign environmental DNA) into a model organism. The vector is often introduced into the model organism Escherichia coli through a process called transformation, which can be achieved by chemical, electrical or biological methods. This allows the DNA to enter the cell, where it is maintained and the vector-encoded proteins are stably produced. The transformed cells are then grown on selective medium, and only cells that contain the vector survive and form colonies (because of the antibiotic resistance gene in the vector). Each colony that grows can be used to study a specific fragment of DNA from the environmental sample.
  5. Examination of the DNA in the metagenomic library. The DNA inserted into the model organism will contain genes, some of which code for enzymes. We can design assays that test for particular enzyme functions within the library and find the cells that contain the DNA of interest. Using these assays, we can identify and characterize the enzymes we are interested in.
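A practical question when building such a library is how many clones are needed. A common estimate is the Clarke and Carbon formula, N = ln(1 - P) / ln(1 - f), where P is the desired probability that a given DNA sequence is represented and f is the insert size as a fraction of the total DNA size. This is a minimal sketch under stated assumptions: fragments are random and equal-sized, the roughly 40 kb insert is an assumed typical cosmid insert, and the 4 Mb "genome" size is hypothetical; for a whole microbial community the total DNA size is only a rough guess.

```python
import math


def clones_needed(insert_size_bp: float, genome_size_bp: float,
                  probability: float = 0.99) -> int:
    """Clarke and Carbon estimate of clones needed for a library.

    Returns the number of random clones needed so that any given sequence is
    present in the library with the requested probability.
    """
    fraction = insert_size_bp / genome_size_bp
    return math.ceil(math.log(1 - probability) / math.log(1 - fraction))


if __name__ == "__main__":
    # Hypothetical: ~40 kb cosmid inserts, a 4 Mb bacterial genome, 99% coverage.
    print(clones_needed(40_000, 4_000_000, 0.99))  # 459
```

For a metagenome, the total DNA size is the sum over all community members, so the required number of clones grows quickly with community diversity.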

Chemical Analysis

The chemical analysis part of our project allowed us to check whether using E. coli in the softwood lumber industry would be feasible. It might not be, because trees make many antimicrobial compounds that could inhibit E. coli's growth under these conditions. Our chemical analysis allowed us to investigate this and head off problems further down the road.

Why do this?

Bacteria have already established their role in industry, and their applicability in the industrial sector has been growing rapidly. Our work is inspired by this, but if we want to generate a strain that operates anywhere near pine tree resin, there is a caveat that must first be investigated.

Pine tree resin contains (amongst many other things) a mixture of terpenes. Terpenes are a class of compounds biosynthesized from isoprene units. The terpenes often found in pine tree resin are known to have antimicrobial properties, which could be an issue if we try to use bacteria to break down softwood lumber, for example.
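"Built from isoprene units" has a direct arithmetic consequence: terpene hydrocarbons follow the general formula (C5H8)n, and the number of isoprene units n names the class. For example, pinene and limonene, both monoterpenes, are C10H16. A small sketch; oxygenated terpenoids (alcohols, ketones and so on) deviate from this formula.

```python
# Terpene class by number of isoprene (C5H8) units.
TERPENE_CLASSES = {
    1: "hemiterpene",
    2: "monoterpene",
    3: "sesquiterpene",
    4: "diterpene",
    5: "sesterterpene",
    6: "triterpene",
    8: "tetraterpene",
}


def terpene_hydrocarbon_formula(isoprene_units: int) -> str:
    """Molecular formula (C5H8)n of a terpene hydrocarbon with n isoprene units."""
    return f"C{5 * isoprene_units}H{8 * isoprene_units}"


if __name__ == "__main__":
    for n in (2, 3):
        print(TERPENE_CLASSES[n], terpene_hydrocarbon_formula(n))
    # monoterpene C10H16
    # sesquiterpene C15H24
```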

We first investigated this issue by simply growing our bacteria (E. coli) in the presence of tree resin. We did not see any inhibition, but we didn't think we were safe quite yet. We used steam distillation (and conventional distillation too) to isolate terpenes from pine tree resin, and analyzed the distillate using GC/MS. We also performed experiments to determine the minimum inhibitory concentration of the distillate (which was essentially homemade turpentine), as well as of other terpenes. We carried out inhibition studies on both solid and liquid media and can confidently claim that our bacteria, E. coli, would be able to survive terpene-heavy conditions.
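A minimum inhibitory concentration (MIC) is usually read from a serial dilution series as the lowest concentration at which no growth is seen. This is a minimal sketch of that reading in a liquid assay; the concentrations, the optical density (OD) values and the growth threshold of 0.1 are hypothetical, and it assumes growth is inhibited at every concentration above the MIC.

```python
from typing import Optional


def twofold_series(highest: float, steps: int) -> list[float]:
    """Concentrations of a twofold serial dilution, highest first."""
    return [highest / 2 ** i for i in range(steps)]


def minimum_inhibitory_concentration(od_by_conc: dict[float, float],
                                     growth_threshold: float = 0.1) -> Optional[float]:
    """Lowest tested concentration whose culture did not grow (OD below threshold).

    Returns None if every culture grew, i.e. no inhibition in the tested range.
    """
    inhibited = [conc for conc, od in od_by_conc.items() if od < growth_threshold]
    return min(inhibited) if inhibited else None


if __name__ == "__main__":
    concs = twofold_series(4.0, 5)  # [4.0, 2.0, 1.0, 0.5, 0.25] (e.g. % v/v)
    ods = dict(zip(concs, [0.04, 0.05, 0.40, 0.75, 0.80]))  # hypothetical readings
    print(minimum_inhibitory_concentration(ods))  # 2.0
```

A result of None means the organism grew at every tested concentration, which is the outcome one would expect for a strain tolerant of terpene-heavy conditions.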


The terpenes in pine tree resin have high boiling points (150 to 200 °C), and simply distilling them would risk decomposing the terpenes into other compounds. This is problematic because we would be left with the following question: are we testing the inhibition of pine tree resin terpenes, or of the products of terpene decomposition?

Although we did use conventional distillation and obtained a terpene extract, we also applied a "softer" method, steam distillation, to avoid potential decomposition. Terpenes are insoluble in water, and a mixture of the two (with far more water than terpenes) will boil at a temperature slightly lower than 100 °C. This can be rationalized using Dalton's law of partial pressures: each of the two immiscible liquids contributes its own vapour pressure, and the mixture boils when the sum of these pressures reaches atmospheric pressure. Put crudely, "the gaseous water molecules are pulling out with them gaseous molecules of terpenes as well". After collecting the distillate from this procedure (which is actually a mixture of clear water with a thin film of terpene extract floating on top), the organic layer is separated and used for analysis.
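Dalton's law argument can be turned into a quick estimate of the steam distillation temperature. This is a minimal sketch under stated assumptions: water's vapour pressure comes from the standard Antoine equation (valid from about 1 to 100 °C, in mmHg), while the terpene's vapour pressure is a Clausius-Clapeyron estimate using an assumed boiling point of 156 °C and an assumed heat of vaporization of 40 kJ/mol, illustrative values for a pinene-like monoterpene rather than measured data.

```python
import math

R = 8.314  # gas constant, J/(mol*K)


def water_vapor_pressure(t_c: float) -> float:
    """Vapour pressure of water in mmHg (Antoine equation, valid ~1 to 100 C)."""
    return 10 ** (8.07131 - 1730.63 / (233.426 + t_c))


def terpene_vapor_pressure(t_c: float, bp_c: float = 156.0,
                           dh_vap: float = 40_000.0) -> float:
    """Clausius-Clapeyron estimate in mmHg, anchored at 760 mmHg at the boiling point.

    bp_c (C) and dh_vap (J/mol) are ASSUMED values for a pinene-like monoterpene.
    """
    t_k, bp_k = t_c + 273.15, bp_c + 273.15
    return 760.0 * math.exp(-dh_vap / R * (1 / t_k - 1 / bp_k))


def steam_distillation_temperature(p_atm: float = 760.0,
                                   lo: float = 50.0, hi: float = 100.0) -> float:
    """Temperature (C) at which water and an immiscible terpene boil together.

    Each immiscible liquid exerts its own full vapour pressure, so the mixture
    boils when their sum reaches atmospheric pressure (Dalton's law). Solved
    by bisection, assuming the root lies between lo and hi.
    """
    def excess(t_c: float) -> float:
        return water_vapor_pressure(t_c) + terpene_vapor_pressure(t_c) - p_atm

    for _ in range(60):
        mid = (lo + hi) / 2
        if excess(mid) > 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    print(f"water alone at 100 C: {water_vapor_pressure(100.0):.0f} mmHg")
    print(f"mixture boils near {steam_distillation_temperature():.1f} C")
```

Under these assumptions the mixture boils a few degrees below 100 °C, far below the terpene's own boiling point, which is why steam distillation avoids the decomposition risk.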


Abubucker, S., Segata, N., Goll, J., Schubert, A. M., Izard, J., Cantarel, B. L., et al. (2012). HUMAnN: The HMP unified metabolic analysis network. PLoS Comput. Biol., 8(6), e1002358.

Bolger, A. M., Lohse, M. and Usadel, B. (2014). Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics, 30(15), 2114-2120.

Buchfink, B., Xie, C. and Huson, D. H. (2015). Fast and sensitive protein alignment using DIAMOND. Nature Methods, 12, 59-60.

Comeau, A. M., Li, W. K. W., Tremblay, J.-É., Carmack, E. C. and Lovejoy, C. (2011). Arctic Ocean Microbial Community Structure before and after the 2007 Record Sea Ice Minimum. PLoS ONE, 6(11), e27492.

Kanehisa, M. and Goto, S. (2000). KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res., 28(1), 27-30.

Kopylova, E., Noé, L. and Touzet, H. (2012). SortMeRNA: fast and accurate filtering of ribosomal RNAs in metatranscriptomic data. Bioinformatics, 28(24), 3211-3217.

Langille, M. G. I., Zaneveld, J., Caporaso, J. G., McDonald, D., Knights, D., Reyes, J. A., et al. (2013). Predictive functional profiling of microbial communities using 16S rRNA marker gene sequences. Nature Biotechnology, 31(9), 814-821.

Lozupone, C. and Knight, R. (2005). UniFrac: a New Phylogenetic Method for Comparing Microbial Communities. Appl. and Env. Microbiol., 71(12), 8228-8235.

McDonald, D., Price, M. N., Goodrich, J., Nawrocki, E. P., DeSantis, T. Z., Probst, A., et al. (2012). An improved Greengenes taxonomy with explicit ranks for ecological and evolutionary analyses of bacteria and archaea. The ISME Journal, 6(3), 610-618.

Parks, D. H., Tyson, G. W., Hugenholtz, P., Beiko, R. G. (2014). STAMP: statistical analysis of taxonomic and functional profiles. Bioinformatics, 30(21), 3123-3124.

Rideout, J. R., He, Y., Navas-Molina, J. A., Walters, W. A., Ursell, L. K., Gibbons, S. M., et al. (2014). Subsampled open-reference clustering creates consistent, comprehensive OTU definitions and scales to billions of sequences. PeerJ, 2, e545.

Truong, D. T., Franzosa, E. A., Tickle, T. L., Scholz, M., Weingart, G., Pasolli, E., et al. (2015). MetaPhlAn2 for enhanced metagenomics taxonomic profiling. Nature Methods, 12, 902-903.

Zhang, J., Kobert, K., Flouri, T. and Stamatakis, A. (2014). PEAR: A fast and accurate Illumina Paired-End reAd mergeR. Bioinformatics, 30(5), 614-620.