March
Week 1 (14th)
Started literature research on Cry proteins, Varroa destructor, and machine learning.
Compared Acari species to find Bt toxins that might serve as a backup for killing Varroa. Found a mite that is related to V. destructor and has a Bt toxin targeting it.
Week 2 (21st)
Watched the WEKA tutorial; checked out BtToxin_Scanner, RapidMiner and other tools that might be of use.
Found the Varroa genome: http://bmcgenomics.biomedcentral.com/articles/10.1186/1471-2164-11-602
The GenBank record of the Varroa genome is completely unannotated. Next step: run GeneMark on it to get a .faa file, then maybe BLAST all predicted proteins and keep the best match for each?
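"Keep the best match" could look like the sketch below. It assumes the BLAST run writes the standard 12-column tabular output (`blastall -m 8`, or `-outfmt 6` in BLAST+) and ranks hits by bit score; the database to search against is not decided here, and the example rows are made up.

```python
"""Keep the best BLAST hit per query from 12-column tabular output."""
import csv
import io

# Column order of the tabular BLAST format (blastall -m 8 / blastp -outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(handle):
    """Return {query_id: row_dict}, keeping the hit with the highest bit score."""
    best = {}
    for fields in csv.reader(handle, delimiter="\t"):
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and comment lines
        row = dict(zip(COLUMNS, fields))
        row["bitscore"] = float(row["bitscore"])
        row["evalue"] = float(row["evalue"])
        current = best.get(row["qseqid"])
        if current is None or row["bitscore"] > current["bitscore"]:
            best[row["qseqid"]] = row
    return best


if __name__ == "__main__":
    # Made-up rows for illustration only, not real results.
    example = (
        "g1\thitA\t35.2\t120\t70\t3\t1\t120\t5\t124\t1e-10\t55.1\n"
        "g1\thitB\t30.0\t100\t65\t2\t1\t100\t1\t100\t1e-05\t40.3\n"
        "g2\thitC\t28.4\t90\t60\t1\t10\t99\t3\t92\t0.001\t33.0\n"
    )
    for query, hit in best_hits(io.StringIO(example)).items():
        print(query, hit["sseqid"], hit["bitscore"])
```

Bit score is used rather than e-value because it does not depend on database size, which keeps the ranking stable if the target database changes.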
Also found the cytostudio tool, which might be interesting.
April
Week 3 (4th)
Continued literature research on Cry proteins, Varroa destructor, the cytostudio tool and machine learning. Found Cello, a DNA programming language. Also received some information about RDF / Blazegraph.
Week 4 (11th)
Tried out Cello, but it keeps crashing; after two attempts at entering Marijn's kill-switch system I gave up. Continued learning how RDF works.
Downloaded WEKA and BioWeka.
Currently running AUGUSTUS on the Varroa genome (using the mosquito model "aedes") to predict genes, looking for something we could target.
Week 5 (18th)
The Varroa genome has 21190 predicted gene IDs; created a FASTA file of the predicted protein sequences. Examined RDF, (Bio)Weka, the Varroa genome and BtToxin_Scanner some more.
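One way to get from the AUGUSTUS output to a protein FASTA is sketched below. It assumes the default GFF output, where each gene is opened by a `# start gene <id>` comment and its translated sequence is written as `# protein sequence = [...]` comment lines ending in `]`. It also assumes one transcript per gene (no alternative transcripts), since sequences are keyed by gene ID; the sample lines are invented.

```python
"""Extract predicted protein sequences from AUGUSTUS GFF comments into FASTA."""
import re

GENE_START = re.compile(r"^# start gene (\S+)")
PROT_START = "# protein sequence = ["


def augustus_proteins(lines):
    """Return {gene_id: protein_sequence} parsed from AUGUSTUS comment lines.

    Assumes one transcript per gene; a later transcript would overwrite an earlier one.
    """
    proteins = {}
    gene = None
    parts = None  # sequence chunks collected while inside a protein block
    for line in lines:
        line = line.rstrip("\n")
        match = GENE_START.match(line)
        if match:
            gene = match.group(1)
            continue
        if line.startswith(PROT_START):
            parts = [line[len(PROT_START):]]
        elif parts is not None and line.startswith("# "):
            parts.append(line[2:])  # continuation line of the sequence
        else:
            continue
        if parts[-1].endswith("]"):  # the closing bracket ends the sequence
            parts[-1] = parts[-1][:-1]
            proteins[gene] = "".join(parts)
            parts = None
    return proteins


def to_fasta(proteins, width=60):
    """Format {name: sequence} as FASTA text with fixed-width sequence lines."""
    out = []
    for name, seq in proteins.items():
        out.append(">" + name)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = [
        "# start gene g1\n",
        "scaf1\tAUGUSTUS\tgene\t1\t90\t0.9\t+\t.\tg1\n",
        "# protein sequence = [MKTAYIAK\n",
        "# QRQISFVK]\n",
        "# end gene g1\n",
    ]
    print(to_fasta(augustus_proteins(sample)), end="")
```

On a real run, `len(augustus_proteins(...))` gives the number of genes with a translated sequence, which can be checked against the gene count above.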
Went to BioSB!
Found a good Cry toxin list with over 800 sequences. BtToxin_Scanner only seems to use 606.
Searching NCBI Protein for "endotoxin OR crystal AND Bacillus thuringiensis[Organism]" returns 2242 hits, which is too many to curate manually, so I can't really get my data from there. Instead I used getCry.sh to collect as many proteins as possible.
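The hit count for a query like the one above can be retrieved without the web interface through the NCBI E-utilities `esearch` service, as in the sketch below. The parentheses in the term make the intended grouping explicit. `retmax=0` asks only for the count, not the IDs. The count today will differ from the 2242 noted here. `fetch_count` needs network access and is not called automatically.

```python
"""Count NCBI Protein hits for a query via the E-utilities esearch endpoint."""
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
QUERY = "(endotoxin OR crystal) AND Bacillus thuringiensis[Organism]"


def esearch_url(term, db="protein", retmax=0):
    """Build an esearch URL; retmax=0 returns only the total count."""
    return ESEARCH + "?" + urlencode({"db": db, "term": term, "retmax": retmax})


def hit_count(xml_text):
    """Read the top-level <Count> element of an esearch XML response."""
    return int(ET.fromstring(xml_text).findtext("Count"))


def fetch_count(term, db="protein"):
    """Query NCBI over the network and return the number of matching records."""
    with urlopen(esearch_url(term, db)) as response:
        return hit_count(response.read())
```

Usage: `fetch_count(QUERY)`. Only the direct child `<Count>` of the root is read, because the response also nests per-term counts inside its translation stack.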
Tools in Research Proposal:
- IDBA_UD: in the SAPP pipeline?
- GENEMARK: http://exon.gatech.edu/GeneMark/genemarks.cgi
- GENEMARK(euk): ~/tools
- PRODIGAL: in the SAPP pipeline?
- WEKA: ~/tools
- RAPIDMINER: not going to use this.
- R/PYTHON: available everywhere on the server.
- BtToxin_Scanner: ~/tools
- Interproscan: /home/thom/interproscan-5.15-54.0/ (Bash script gives a huge error)
- SMART: http://smart.embl-heidelberg.de/
- CD-SEARCH: http://www.ncbi.nlm.nih.gov/Structure/bwrpsb/bwrpsb.cgi
- MEME/MAST: http://meme-suite.org/ || /scratch2/maarten/motif_prediction/meme_4.10.0/
- EXTREME: https://github.com/uci-cbcl/EXTREME
- Bio-Prodict 3DM: https://www.bio-prodict.nl/
- YASARA: www.yasara.org || installed on my laptop
- Core model: http://systemsbiology.ucsd.edu/Downloads/EcoliCore
- Other models: https://www.ebi.ac.uk/biomodels-main/
Other Tools:
- BLASTALL: /usr/bin/blastall
- HMMSCAN: /usr/bin/hmmscan
- Clustalo: /usr/local/bin/clustalo
- Semantic web: http://ssb4.wurnet.nl:9999/blazegraph/#query
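To query the Blazegraph store from a script instead of the web page, a standard SPARQL 1.1 protocol request (HTTP GET with a `query` parameter) should work. The endpoint path below is an assumption: Blazegraph usually serves SPARQL at `.../blazegraph/sparql` next to the `#query` UI, but this has not been checked on ssb4. The query itself is a generic placeholder.

```python
"""Send a SPARQL SELECT query to a Blazegraph endpoint and flatten the JSON results."""
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# ASSUMPTION: SPARQL endpoint path inferred from the web UI URL, not verified.
ENDPOINT = "http://ssb4.wurnet.nl:9999/blazegraph/sparql"

QUERY = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 10
"""


def build_request(query, endpoint=ENDPOINT):
    """Build a GET request asking for SPARQL JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def rows(results_json):
    """Turn SPARQL JSON results into a list of {variable: value} dicts."""
    data = json.loads(results_json)
    names = data["head"]["vars"]
    return [{v: b[v]["value"] for v in names if v in b}
            for b in data["results"]["bindings"]]


def run(query, endpoint=ENDPOINT):
    """Execute the query over the network (not called automatically)."""
    with urlopen(build_request(query, endpoint)) as response:
        return rows(response.read())
```

Usage: `run(QUERY)`. Unbound variables are simply left out of each row dict, matching how SPARQL JSON omits them.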
Week 6 (25th)
Checked out the Galaxy instance from SSB, which contains Prodigal, all kinds of assembly tools and gene annotation tools. Now to check out the data!
Worked on GeneMark, which produced files with two '>' FASTA markers in the header. Regular expressions to the rescue!
Mostly figured out the SAPP pipeline and got GeneMark to work; now making a Clustal Omega (clustalo) alignment of 6 Cry proteins as a test for the pipeline.
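A regex fix for the doubled markers might look like this. It assumes the problem is extra `>` characters on header lines, either doubled at the start (`>>gene_1`) or repeated later in the header (`>gene_1 >contig_5`). It keeps the first `>` and turns every further marker into a single space. Sequence lines are passed through untouched.

```python
"""Collapse extra '>' markers in FASTA header lines, keeping only the leading one."""
import re

# Any run of '>' characters plus surrounding whitespace inside a header.
EXTRA_MARKERS = re.compile(r"\s*>+\s*")


def fix_header(line):
    """Return the line with at most one '>' (at the start) if it is a header."""
    if not line.startswith(">"):
        return line  # sequence line: leave as-is
    newline = "\n" if line.endswith("\n") else ""
    body = EXTRA_MARKERS.sub(" ", line[1:].rstrip("\n")).strip()
    return ">" + body + newline


def fix_fasta(lines):
    """Apply fix_header to every line of a FASTA file."""
    return [fix_header(line) for line in lines]
```

Usage on a file: `open("fixed.faa", "w").writelines(fix_fasta(open("genemark.faa")))`, where both file names are placeholders.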
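For repeatable pipeline tests, the Clustal Omega run can be scripted as in the sketch below. It uses the standard `-i`, `-o`, `--outfmt` and `--force` options. It falls back to the `/usr/local/bin/clustalo` path from the tool list if `clustalo` is not on `PATH`. The file names `cry6.fasta` and `cry6.aln` are hypothetical.

```python
"""Run Clustal Omega on a small FASTA file of Cry proteins."""
import shutil
import subprocess


def clustalo_command(infile, outfile, exe="clustalo"):
    """Build the clustalo argument list: Clustal-format output, overwrite if present."""
    return [exe, "-i", infile, "-o", outfile, "--outfmt=clustal", "--force"]


def run_clustalo(infile, outfile):
    """Locate clustalo (PATH first, then the server path) and run the alignment."""
    exe = shutil.which("clustalo") or "/usr/local/bin/clustalo"
    subprocess.run(clustalo_command(infile, outfile, exe), check=True)
```

Usage: `run_clustalo("cry6.fasta", "cry6.aln")`. `check=True` makes a failed alignment raise an error instead of silently producing no output.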