Team:Wageningen UR/Notebook/ToxinScanner

Wageningen UR iGEM 2016

 

March

Week 1 (14th)

Started literature research on Cry proteins, Varroa destructor, and machine learning.
Compared Acari species to find Bt toxins that might serve as a backup for killing Varroa. Found a mite that is related to V. destructor and is targeted by a Bt toxin.

Week 2 (21st)

Viewed the WEKA tutorial and checked out Bt Toxin Scanner, RapidMiner, and other tools that might be of use.
Found the Varroa genome: http://bmcgenomics.biomedcentral.com/articles/10.1186/1471-2164-11-602
The GenBank record of the Varroa genome was completely unannotated. Will now run GeneMark on it to get an .faa file, and maybe BLAST all of it and take the best match?
Also found the cytostudio tool, which might be interesting.
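
A minimal sketch of the "BLAST everything, keep the best match" idea noted above. It assumes the BLAST results are saved in the tab-separated tabular format (outfmt 6) with the default twelve columns, where the bit score is the last column. The function name and the example IDs are made up for illustration and are not results from this project.

```python
import csv
import io


def best_hits(handle):
    """Return {query_id: (subject_id, bitscore)}, keeping the highest bit score per query."""
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and comment lines (present in some tabular variants).
        if not row or row[0].startswith("#"):
            continue
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


# Hypothetical example rows: two hits for query q1, one hit for query q2.
EXAMPLE = (
    "q1\tsubjA\t35.2\t200\t120\t5\t1\t200\t10\t205\t1e-20\t95.1\n"
    "q1\tsubjB\t48.0\t210\t100\t3\t1\t210\t5\t212\t1e-45\t170.2\n"
    "q2\tsubjC\t29.9\t150\t100\t4\t3\t150\t8\t155\t1e-05\t44.3\n"
)

hits = best_hits(io.StringIO(EXAMPLE))
for query, (subject, score) in hits.items():
    print(query, subject, score)
```

Choosing by bit score rather than by file order avoids relying on how the hits happen to be sorted.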

April

Week 3 (4th)

Continued literature research on Cry proteins, Varroa destructor, the cytostudio tool, and machine learning. Found Cello, a DNA programming language, and received some information about RDF / Blazegraph.

Week 4 (11th)

Tried out Cello, but it keeps crashing; gave up after 2 attempts at putting in Marijn’s killswitch system. Continued learning how RDF works.
Downloaded Weka and BioWeka.
Currently running Augustus on the Varroa genome (using a mosquito “aedes” model) to predict genes, looking for something we could target.

Week 5 (18th)

The Varroa genome has 21190 gene IDs; created a FASTA file of the predicted protein sequences. Examined RDF, (Bio)Weka, the Varroa genome, and Bt Toxin Scanner some more.
Went to BioSB!
Found a nice Cry toxin list with over 800 sequences. Bt Toxin Scanner only seems to have used 606.
Searching NCBI Protein for “endotoxin OR crystal AND Bacillus thuringiensis[Organism]” gives 2242 hits, so I can't really get my data from there; it's too much to curate manually. Instead used getCry.sh to get as many proteins as possible.
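
A quick sanity check for FASTA files like the predicted-protein file or the collected Cry sequences is to count the header lines. The sketch below is only an illustration: the function name and the file name in the comment are hypothetical, and the records are made up.

```python
def count_fasta_records(lines):
    """Count FASTA records by counting lines that start with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


# Made-up example with three records.
example = [
    ">seq1",
    "MKTAYIAK",
    ">seq2",
    "MNNVLNSG",
    "RTTICDAY",
    ">seq3",
    "MDNNPNIN",
]
print(count_fasta_records(example))

# On a real file (hypothetical name):
# with open("varroa_proteins.faa") as handle:
#     print(count_fasta_records(handle))
```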

Tools in Research Proposal:

Other Tools:

Week 6 (25th)

Checked out Galaxy from SSB, which contains Prodigal, assembly tools of all kinds, and gene annotation tools. Now to check out the data!
Worked on GeneMark, which produced files with two '>' FASTA markers in the header. Regular expressions to the rescue!
Pretty much figured out the SAPP pipeline and got GeneMark to work. Now making a Clustal Omega (clustalo) alignment of 6 Cry proteins as a test for the pipeline.
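
The header clean-up mentioned this week (two '>' markers in one FASTA header) could look roughly like the sketch below. The exact header layout that GeneMark produced is not recorded here, so the example header is an assumption, and `clean_header` is a hypothetical helper name. The idea is to keep the leading '>' and turn any later '>' into a single space.

```python
import re


def clean_header(line):
    """Keep the leading '>' of a FASTA header and replace any later '>' with a space."""
    line = line.rstrip("\n")
    if not line.startswith(">"):
        return line  # sequence lines are left untouched
    # Replace every further '>' (plus surrounding whitespace) with one space.
    return ">" + re.sub(r"\s*>\s*", " ", line[1:]).strip()


# Assumed example of a header with a second '>' marker.
records = [
    ">gene_1|312_aa\t>contig_1",
    "MKTAYIAKQR",
]
for record in records:
    print(clean_header(record))
```

Many tools treat every '>' at the start of a line as a new record, and some choke on a second '>' inside the header, so normalising the header before feeding the file to the next pipeline step is the safer choice.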