March

Week 1 (14th)

Started literature research on Cry proteins, Varroa destructor, and machine learning.
Compared Acari species to find Bt toxins that might serve as a backup for killing Varroa. Found a mite that is related to V. destructor and has a known Bt toxin targeting it.

Week 2 (21st)

Viewed a WEKA tutorial; checked out BtToxin_Scanner, RapidMiner, and other tools that might be of use.
Found the Varroa genome: http://bmcgenomics.biomedcentral.com/articles/10.1186/1471-2164-11-602
The GenBank record of the Varroa genome was completely unannotated. Next step: run GeneMark on it to produce an .faa file, then maybe BLAST all predicted proteins and keep the best match?
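A minimal sketch of the "keep the best match" idea, assuming BLAST was run with tabular output (12 tab-separated columns, with query ID in column 1, subject ID in column 2, and bit score in column 12). The function name and the sample IDs are hypothetical, not from the notebook.

```python
def best_hits(tabular_text):
    """Return {query_id: (subject_id, bitscore)} keeping the top bit score per query.

    Assumes standard 12-column tabular BLAST output; comment lines
    starting with '#' and blank lines are skipped.
    """
    best = {}
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        bitscore = float(fields[11])
        # Keep this hit only if it beats the best one seen so far for the query.
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


# Hypothetical example rows (IDs and scores are made up for illustration).
example = (
    "g1\tsubjA\t45.0\t100\t55\t0\t1\t100\t1\t100\t1e-20\t80.5\n"
    "g1\tsubjB\t60.0\t100\t40\t0\t1\t100\t1\t100\t1e-40\t150.0\n"
    "g2\tsubjC\t30.0\t50\t35\t0\t1\t50\t1\t50\t1e-05\t40.2\n"
)
hits = best_hits(example)
```

Ranking by bit score rather than E-value is a design choice here; both are present in the tabular output, so the comparison can be swapped if needed.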
Also found the cytostudio tool, which might be interesting.

April

Week 3 (4th)

Continued literature research on Cry proteins, Varroa destructor, the cytostudio tool, and machine learning. Found Cello, a DNA programming language, and received some information about RDF / Blazegraph.

Week 4 (11th)

Tried out Cello, but it keeps crashing; gave up after two attempts at putting in Marijn's kill-switch system. Continued learning how RDF works.
Downloaded WEKA and BioWeka.
Currently running Augustus on the Varroa genome (using a mosquito, Aedes, model) to predict genes, looking for something we could target.

Week 5 (18th)

The Varroa gene prediction has 21190 gene IDs; created a FASTA file of the predicted protein sequences. Examined RDF, (Bio)Weka, the Varroa genome, and BtToxin_Scanner some more.
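A rough sketch of how the predicted proteins could be pulled into a FASTA file, assuming the Augustus GFF output carries each protein in comment lines of the form `# protein sequence = [...]` between `# start gene gN` and `# end gene gN`. The function names and the sample text are hypothetical; if a gene has several transcripts, this sketch keeps only the last one.

```python
def augustus_proteins(gff_text):
    """Return {gene_id: protein_sequence} from Augustus-style GFF text (assumed layout)."""
    proteins = {}
    gene_id = None
    chunks = None  # pieces of the sequence while inside a protein block
    for line in gff_text.splitlines():
        line = line.strip()
        if line.startswith("# start gene"):
            gene_id = line.split()[-1]
            continue
        if line.startswith("# protein sequence = ["):
            chunks = [line.split("[", 1)[1]]
        elif chunks is not None and line.startswith("#"):
            # Continuation line of a multi-line protein block.
            chunks.append(line.lstrip("#").strip())
        else:
            continue
        if chunks[-1].endswith("]"):
            # Closing bracket reached: store the joined sequence.
            chunks[-1] = chunks[-1][:-1]
            proteins[gene_id] = "".join(chunks)
            chunks = None
    return proteins


def to_fasta(proteins):
    """Format the {gene_id: sequence} mapping as FASTA text."""
    return "".join(f">{gid}\n{seq}\n" for gid, seq in proteins.items())


# Hypothetical two-gene example.
sample = (
    "# start gene g1\n"
    "scaf1\tAUGUSTUS\tgene\t1\t90\t0.5\t+\t.\tg1\n"
    "# protein sequence = [MSK\n"
    "# LLV]\n"
    "# end gene g1\n"
    "# start gene g2\n"
    "# protein sequence = [MAA]\n"
    "# end gene g2\n"
)
proteins = augustus_proteins(sample)
```

Counting the keys of the returned mapping gives the number of gene IDs with a predicted protein.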
Went to BioSB!
Found a nice Cry toxin list with over 800 sequences. BtToxin_Scanner only seems to have used 606.
Searching NCBI Protein for "endotoxin OR crystal AND Bacillus thuringiensis[Organism]" gives 2242 hits, so I can't really get my data from there; it's too much to curate manually. Instead used getCry.sh to get as many proteins as possible.

Tools in Research Proposal:

IDBA_UD: in the SAPP pipeline?
GENEMARK: http://exon.gatech.edu/GeneMark/genemarks.cgi
GENEMARK(euk): ~/tools
PRODIGAL: in the SAPP pipeline?
WEKA: ~/tools
RAPIDMINER: not going to use this.
R/PYTHON: everywhere on server.
BtToxin_Scanner: ~/tools
Interproscan: /home/thom/interproscan-5.15-54.0/ (Bash script gives a huge error)
SMART: http://smart.embl-heidelberg.de/
CD-SEARCH: http://www.ncbi.nlm.nih.gov/Structure/bwrpsb/bwrpsb.cgi
MEME/MAST: http://meme-suite.org/ || /scratch2/maarten/motif_prediction/meme_4.10.0/
EXTREME: https://github.com/uci-cbcl/EXTREME
Bio-Prodict 3DM: https://www.bio-prodict.nl/
YASARA: www.yasara.org || installed on my laptop
Core model: http://systemsbiology.ucsd.edu/Downloads/EcoliCore
Other models: https://www.ebi.ac.uk/biomodels-main/

Other Tools:

BLASTALL: /usr/bin/blastall
HMMSCAN: /usr/bin/hmmscan
Clustalo: /usr/local/bin/clustalo
Semantic web: http://ssb4.wurnet.nl:9999/blazegraph/#query

Week 6 (25th)

Checked out Galaxy from SSB, which contains Prodigal, assembly tools of all kinds, and gene annotation tools. Now to check out the data!
Worked on GeneMark, which produced files with two '>' FASTA markers in the header. Regular expressions to the rescue!
Figured out the SAPP pipeline pretty much, got GeneMark to work, and am making a Clustal Omega alignment of 6 Cry proteins as a test for the pipeline.
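The header cleanup can be sketched roughly as follows. The exact GeneMark header layout is an assumption here (a second '>' preceded by whitespace before the contig name), and `fix_header` / `fix_fasta` are hypothetical names, not the script actually used.

```python
import re


def fix_header(line):
    """Collapse any extra '>' inside a FASTA header line into a single space."""
    if not line.startswith(">"):
        return line  # sequence lines are left untouched
    # Drop every '>' after the first one, together with surrounding whitespace.
    return ">" + re.sub(r"\s*>\s*", " ", line[1:]).strip()


def fix_fasta(text):
    """Apply fix_header to every line of a FASTA text."""
    return "\n".join(fix_header(line) for line in text.splitlines())


# Hypothetical header with a second '>' marker, as described above.
broken = ">gene_1|GeneMark.hmm|295_aa|+|1|888\t>scaffold_1\nMKVLAA\n"
fixed = fix_fasta(broken)
```

Only header lines are rewritten, so the sequence data passes through unchanged.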

Team:Wageningen UR/Notebook/ToxinScanner