<ul> | <ul> | ||
<li style="list-style-type:square;"><p style="font-size:16px; white-space:pre-wrap;">SVM | <li style="list-style-type:square;"><p style="font-size:16px; white-space:pre-wrap;">SVM |
Revision as of 11:20, 19 October 2016
We have hardware and software for Ios-i-GEM, yet …
Besides the fungal killing switch and the functional prototype that help reduce concerns over GMOs, we wondered what else we could do in iGEM as social practice to truly engage with growers’ lives and help them reduce the threats posed by these pests. So far in our project, the entomogenous fungus provides a biological, harmless way to eradicate the pests, which is one of the most important components of our idea. The prototype makes applying these genetically engineered fungi practical and perhaps more effective. Now that we have software and hardware, what more can we do for the growers?
The answer was an app: a well-designed, thoughtful and realistic one. Growers, as well as government officials, can simply check the “Taiwan Pest Prediction web”, which provides predictions for 4 common pests in Taiwan over time horizons of 1 to 3 days. We report the predicted pest population as one of 4 ranges, including 0~16, 16~64 and above 64. We convert the prediction problem into a classification problem: by building numerous classifiers and letting them vote, we obtain the prediction that most classifiers agree on. More importantly, we put all the code on GitHub as open source, so anyone, from any country, can build on our efforts to create a better and more powerful prediction program.
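Turning prediction into classification starts with binning raw pest counts into range labels. The following is a minimal Python sketch, not the team's code: it assumes the boundaries 16 and 64 named above and treats each range as upper-inclusive. The text mentions 4 ranges but names only three, so the edges are an assumption, and `count_to_range` is a hypothetical helper.

```python
from bisect import bisect_left

# Assumed range boundaries (from "0~16, 16~64, above 64"); the fourth range
# mentioned in the text is not specified, so it is not modeled here.
RANGE_EDGES = [16, 64]
RANGE_LABELS = ["0~16", "16~64", "above 64"]


def count_to_range(count: int) -> str:
    """Map a raw pest count to a range label (hypothetical helper)."""
    if count < 0:
        raise ValueError("pest count cannot be negative")
    # bisect_left counts the edges strictly below `count`, so 16 falls in
    # "0~16" and 64 in "16~64" (an assumed boundary convention).
    return RANGE_LABELS[bisect_left(RANGE_EDGES, count)]


daily_counts = [3, 16, 40, 64, 120]
print([count_to_range(c) for c in daily_counts])
# → ['0~16', '0~16', '16~64', '16~64', 'above 64']
```

Each label then serves as the target class for a classifier, so the models only have to pick a range instead of an exact count.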
Collaboration
This app is the outcome of our collaboration with NCTU. We first came up with the idea of building such a web app for social practice. In August, when we had finished the web crawling part, we took part in the Asia-Pacific iGEM Conference hosted by NCKU and met our friends from NCTU. It was incredible to meet friends with similar ideas and, most importantly, we decided to collaborate at that time. From then on, we talked face to face several times and kept in close contact through FB Messenger and frequent calls.
The contributions of each team are listed as follows:
NCTU
Provided expertise in FTP, web crawling, and the concept of a pipeline.
Streamlined the code, making it more readable and understandable.
Proposed making the project open source, so we put our code on GitHub.
NYMU
Came up with the original idea.
Wrote the programs for web crawling, data processing, machine learning and FTP uploading.
Set up the FTP host and wrote the UI website.
How does it work?
Step 1: Update climatological data
Connect to the Agriculture weather website, then crawl and store the newest climatological data.
Step 2: Process data and make predictions
Extract the necessary information from the HTML, build several SVM and RandomForest classifiers for prediction, and merge the outcome with the web frame.
Step 3: Upload to FTP
Upload the HTML files to the FTP server.
Icon credit: Freepik, Wissawa Khamsriwath, Madebyoliver, Gregor Cresnar
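Steps 2 and 3 can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not the team's implementation: it assumes the weather page stores its data in an HTML `<table>`, and `TableCellParser`, `parse_weather_table` and `upload_pages` are hypothetical names. `upload_pages` accepts any object with an `ftplib.FTP`-style `storbinary` method.

```python
import io
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []         # finished rows, each a list of cell strings
        self._row = None       # row currently being filled, or None
        self._in_cell = False  # True while inside a <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
            self._in_cell = False
        elif tag in ("td", "th") and self._row is not None:
            self._row.append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._in_cell = False

    def handle_data(self, data):
        # Append text only while inside a cell; whitespace is trimmed.
        if self._in_cell:
            self._row[-1] += data.strip()


def parse_weather_table(html: str) -> list:
    """Step 2 (sketch): turn an HTML table into a list of rows of strings."""
    parser = TableCellParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def upload_pages(ftp, pages: dict) -> None:
    """Step 3 (sketch): upload {filename: html_text} over a logged-in FTP connection."""
    for name, text in pages.items():
        ftp.storbinary(f"STOR {name}", io.BytesIO(text.encode("utf-8")))
```

With a real server, `ftp` would be an `ftplib.FTP` instance after calling `login()`; the host name and credentials are placeholders you would supply. Passing the connection in, rather than opening it inside `upload_pages`, keeps the upload step easy to test without a server.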
Data source:
Hypothesis:
According to reports from the Taiwan Agricultural Research Institute, weather conditions can directly or indirectly influence pest maturation. Take the oriental fruit fly as an example: consecutive rainy days can promote its maturation, whereas low temperatures result in a small population. In building the model, we assumed that there is a relationship between weather and pest population size, such that some transformation matrix approximately maps the weather information to pest population size.
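The hypothesis can be written compactly as follows. This formalization is our own illustration, not a formula from the source: $\mathbf{x}_t \in \mathbb{R}^{d}$ is the weather feature vector for day $t$, $\mathbf{y}_{t+h} \in \mathbb{R}^{4}$ holds the population sizes of the 4 pests $h$ days later, and $W$ is the assumed transformation matrix.

```latex
\mathbf{y}_{t+h} \approx W\,\mathbf{x}_t,
\qquad W \in \mathbb{R}^{4 \times d},
\qquad h \in \{1, 2, 3\}
```

Because the app predicts ranges rather than exact counts, the classifiers only need to recover which range each component of $\mathbf{y}_{t+h}$ falls into, not its exact value.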
Feature selection: We chose the daily average temperature, daily maximum temperature, daily minimum temperature and rainfall as features.
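The four features can be packed into fixed-order vectors and paired with later pest labels to form a training set. This is a minimal sketch under assumptions: the `DailyWeather` field names and units, and the idea of pairing day $t$'s weather with the label $h$ days later (matching the 1 to 3 day horizons), are illustrative choices, not taken from the team's code.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class DailyWeather:
    """One day of crawled weather data (hypothetical field names, assumed units)."""
    avg_temp: float  # daily average temperature, °C (assumed)
    max_temp: float  # daily maximum temperature, °C (assumed)
    min_temp: float  # daily minimum temperature, °C (assumed)
    rainfall: float  # daily rainfall, mm (assumed)


def to_features(day: DailyWeather) -> List[float]:
    """Return the four selected features in a fixed order."""
    return [day.avg_temp, day.max_temp, day.min_temp, day.rainfall]


def build_dataset(
    days: Sequence[DailyWeather], labels: Sequence[str], horizon: int = 1
) -> Tuple[List[List[float]], List[str]]:
    """Pair the features of day t with the pest-range label of day t + horizon."""
    if len(days) != len(labels):
        raise ValueError("days and labels must have the same length")
    if not 1 <= horizon < len(days):
        raise ValueError("horizon must be at least 1 and shorter than the series")
    X = [to_features(d) for d in days[:-horizon]]
    y = list(labels[horizon:])
    return X, y
```

Building one dataset per horizon (1, 2 and 3 days) gives each forecast length its own set of classifiers.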
Model selection:
We used two well-known and frequently used models to build the prediction model: SVM and RandomForest.
Support Vector Machine
A Gentle Introduction to Support Vector Machines in Biomedicine, Alexander Statnikov, Douglas Hardin, Isabelle Guyon, Constantin F. Aliferis [1]
Random Forest
ICCV Tutorial: Boosting and Random Forest for Visual Recognition, Tae-Kyun Kim, Jamie Shotton, Björn Stenger [2]
SVM is a machine learning method that maps data into a higher-dimensional feature space, where it looks for a hyperplane that separates data with different labels. Training maximizes the margin between the classes, which amounts to minimizing the norm of the hyperplane's weight vector. RandomForest is an ensemble of decision trees: it builds many decision trees, each on a random subset of the data and features, and outputs the prediction that most trees agree on.
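The SVM training problem described above can be written in its standard soft-margin textbook form (not taken from the team's code). Here $\phi$ is the feature map into the higher-dimensional space, $y_i \in \{-1, +1\}$ are the labels of the $n$ training points, $\xi_i$ are slack variables that tolerate misclassified points, and $C$ sets the penalty for them. Minimizing $\lVert \mathbf{w} \rVert$ maximizes the margin $2 / \lVert \mathbf{w} \rVert$.

```latex
\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \;
  \frac{1}{2} \lVert \mathbf{w} \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\quad \text{s.t.} \quad
  y_i \big( \mathbf{w}^{\top} \phi(\mathbf{x}_i) + b \big) \ge 1 - \xi_i,
\qquad \xi_i \ge 0, \quad i = 1, \dots, n
```

A larger $C$ fits the training data more tightly at the cost of a narrower margin. Multi-class problems, such as choosing among several pest ranges, are usually handled by combining several binary SVMs (for example one-vs-one).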
While processing and testing the data, members of NYMU found that SVM made very accurate predictions (~100% accuracy) when the datasets were large enough, while RandomForest achieved an average accuracy of 80%. However, in some cases where the RandomForest model reached 60~70% accuracy, SVM scored 50% or worse. We therefore decided to build a simple ensemble prediction model from classifiers trained with different parameters and training datasets, and to choose the pest population range that most classifiers agree on.
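The voting step can be sketched in a few lines of Python. This is a minimal illustration, assuming each trained classifier is wrapped as a function from a feature vector to a range label. `majority_vote` and the toy threshold classifiers below are hypothetical stand-ins, not the team's SVM or RandomForest models.

```python
from collections import Counter
from typing import Callable, Sequence

# A classifier here is any function mapping a feature vector to a range label.
Classifier = Callable[[Sequence[float]], str]


def majority_vote(classifiers: Sequence[Classifier], features: Sequence[float]) -> str:
    """Return the label predicted by the most classifiers."""
    if not classifiers:
        raise ValueError("need at least one classifier")
    votes = Counter(clf(features) for clf in classifiers)
    # most_common orders equal counts by first appearance, so ties go to
    # the label voted for earliest.
    label, _ = votes.most_common(1)[0]
    return label


# Toy stand-ins; features are [avg_temp, max_temp, min_temp, rainfall].
toy_models = [
    lambda x: "16~64" if x[3] > 10 else "0~16",    # rainfall threshold
    lambda x: "16~64" if x[0] > 25 else "0~16",    # temperature threshold
    lambda x: "above 64" if x[3] > 50 else "0~16",
]
print(majority_vote(toy_models, [27.0, 31.5, 23.0, 20.0]))  # → 16~64
```

If the models were trained with scikit-learn, each could be wrapped as `lambda x, m=model: m.predict([x])[0]`. Voting across models trained on different parameters and data subsets lets a strong SVM outvote a weak RandomForest on one day and the reverse on another.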
Additional reading about machine learning techniques
SVM: A clear introduction, SVM without tears; page 19 is especially clear.
RandomForest: To understand RandomForest, it helps to understand decision trees first: https://en.wikipedia.org/wiki/Decision_tree. Here is an introduction to RandomForest on Wikipedia: https://en.wikipedia.org/wiki/Random_forest, and a PDF that discusses RandomForest in detail: https://home.zhaw.ch/~dueo/bbs/files/random-forest-intro-presented.pdf