Team:NCTU Formosa/Model

Software—Toxin selection

I. Purpose

To prove the concept of Pantide, we wanted to select three distinct, existing spider toxin peptides with probable oral toxicity against our test insect, Spodoptera litura (tobacco cutworm). For the practical application of Pantide, we also needed a broader knowledge base of peptides with different molecular targets, so that Pantide can be applied to other insect orders, and with different toxic mechanisms, so that peptides can be alternated regularly to avoid resistance.

To date, about 1500 toxin peptides from 97 spider species have been studied, although the total number of spider toxin peptides is conservatively estimated at up to 10 million. [1] Our purpose was therefore to establish a database collecting information on these peptides, such as molecular target, taxon, toxicity, and sequence. With such a database, once a target insect is chosen, groups of peptides suitable for use as Pantide can be found easily. We therefore also needed a method for selecting peptides from the database.

II. Method

The method of toxin selection can be separated into three parts: collection (crawler), filtering, and selection.

  • Toxin Collection: we collected information on toxin peptides from protein databases, together with research results such as taxon and toxicity from published papers, to establish our own database for Pantide.
  • Toxin Filtering: based on background knowledge of toxin peptides, we set up conditions to filter out the peptides unsuitable for use as Pantide.
  • Toxin Processing: we used online protein analysis tools to classify the remaining peptides into groups by similarity. Finally, we selected three distinct peptides from different groups for the proof of concept of Pantide.

III. Step 1: Crawler

We began by searching UniProtKB/Swiss-Prot (http://www.uniprot.org/), the manually annotated and reviewed section of UniProtKB, a freely accessible database of protein sequence and functional information. By searching with the keyword “insecticidal NOT crystal”, we aimed to find all proteins with insecticidal activity while excluding the crystal proteins of Bacillus thuringiensis; the search returned 216 proteins.

Using these results, we established our Pantide database by crawling 11 entries of protein information from UniProt. The entries are as follows.

  • The name of the protein
  • The description of protein function
  • The organisms/source of the protein sequence
  • The length of amino acids
  • The number of disulfide bonds
  • Propeptide & signal peptide: if the protein has an N-terminal signal peptide or propeptide, that part of the protein is cleaved during maturation or activation.
  • UniProt entry & ArachnoServer ID: the accession numbers of the protein in UniProtKB and ArachnoServer*.

*ArachnoServer is a manually curated database of protein toxins derived from spider venom (http://www.arachnoserver.org/).

We also crawled seven further entries recorded by ArachnoServer: molecular target, taxon, ED50, LD50, PD50, qualitative information, and protein sequence. The molecular target is the site of action of a toxin peptide, such as voltage-gated ion channels or GABA receptors. Taxon, ED50, LD50, PD50, and the qualitative information describe the toxicity against taxa that has been tested experimentally. The protein sequences from the two databases are identical.

We utilized BeautifulSoup 4.4.0, sqlite3 and gevent modules in Python 3.5 to develop our crawler. Moreover, we have submitted the code to GitHub.
(Link:https://github.com/chengchingwen/iGEM/blob/master/crawler.py)

IV. Step 2: Filter

After crawling the data, we used DB Browser for SQLite to browse our Pantide database and SQL to process it. We then built a filter to find peptides suitable for use as Pantide.

According to previous articles, around 90% of spider venom toxin peptides contain an ICK (inhibitor cystine knot) structure, the most important domain, which interacts specifically with the voltage-gated ion channels of insects and with some other receptors. [2]

Therefore, to find these spider venom toxin peptides in the Pantide database, we could start by searching for ICK peptides, which have a mass of 1-10 kDa and contain at least three disulfide bonds. [2] We therefore set up a filter with three conditions.

  • The source organism must be a spider or tarantula.
  • The length of the coding sequence is between 27 and 271 base pairs (1 kDa of protein corresponds to about nine amino acids on average, which are encoded by 27 base pairs).
  • The number of disulfide bonds is greater than or equal to three.

After filtering with these three conditions, 113 peptides remained. Next, we set another filter to find insecticidal peptides.

  • Molecular target contains “invertebrate”; peptides without molecular target data are also kept.

We kept the peptides without data because they may still be effective. At this stage, we had 63 candidates.

For the efficacy experiment of Pantide, we chose Spodoptera litura as the target insect. Our database contains 14 distinct taxa, including four Lepidoptera species. We therefore set a further filter to find peptides active against Lepidoptera.

  • Taxon contains at least one of Spodoptera litura, Heliothis virescens, Manduca sexta, and Spodoptera exigua; peptides without taxon data are also kept.

In addition, we planned to produce Pantide in E. coli, in which proteins containing disulfide bonds are difficult to express. We chose the E. coli Rosetta-gami strain for enhanced disulfide bond formation, but expressing a protein with more than four disulfide bonds is still a heavy load. We therefore finally filtered out peptides containing too many disulfide bonds.

  • The number of disulfide bonds is less than or equal to four.

As a result, we obtained 46 peptides that could potentially be used as Pantide in the proof-of-concept experiment, and all of them target insect voltage-gated ion channels (excluding those with NULL entries).
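The filter conditions above can be combined into a single SQL query. The following is a minimal sketch using Python's sqlite3 module; the table name `toxin`, the column names, and the sample rows are hypothetical and are not taken from the team's actual database schema.

```python
import sqlite3

# Hypothetical schema and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE toxin (
        name TEXT,
        organism_group TEXT,
        length_bp INTEGER,
        disulfide_bonds INTEGER,
        molecular_target TEXT,
        taxon TEXT
    )
    """
)
sample_rows = [
    ("toxin_A", "spider", 111, 3, "invertebrate voltage-gated calcium channel", "Heliothis virescens"),
    ("toxin_B", "spider", 120, 4, None, None),  # no target/taxon data: kept
    ("toxin_C", "scorpion", 150, 4, "invertebrate voltage-gated sodium channel", "Manduca sexta"),
    ("toxin_D", "spider", 300, 3, "invertebrate voltage-gated sodium channel", "Spodoptera exigua"),
    ("toxin_E", "spider", 105, 5, "invertebrate voltage-gated calcium channel", "Spodoptera litura"),
    ("toxin_F", "tarantula", 99, 3, "mammalian voltage-gated sodium channel", "Mus musculus"),
]
conn.executemany("INSERT INTO toxin VALUES (?, ?, ?, ?, ?, ?)", sample_rows)

# One WHERE clause per filter condition; NULL means "no data" and is kept.
query = """
    SELECT name FROM toxin
    WHERE organism_group IN ('spider', 'tarantula')
      AND length_bp BETWEEN 27 AND 271
      AND disulfide_bonds BETWEEN 3 AND 4
      AND (molecular_target IS NULL OR molecular_target LIKE '%invertebrate%')
      AND (
            taxon IS NULL
         OR taxon LIKE '%Spodoptera litura%'
         OR taxon LIKE '%Heliothis virescens%'
         OR taxon LIKE '%Manduca sexta%'
         OR taxon LIKE '%Spodoptera exigua%'
      )
    ORDER BY name
"""
selected = [row[0] for row in conn.execute(query)]
print(selected)
```

Keeping the NULL checks explicit mirrors the decision to retain peptides without data, since a plain `LIKE` comparison would silently drop them.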

V. Step 3: Selection

In this step, we looked for three peptides with different molecular targets or mechanisms among the filtering results for use in the test experiment. Our method was to classify the remaining peptides into groups by structural similarity.

We used online analytic tools on NCBI to process those peptides.

We started by using Protein BLAST (Basic Local Alignment Search Tool) to search the whole protein database for sequences similar to each of the 46 peptides, and put related peptides into groups.

Next, we used COBALT (Constraint-based Multiple Alignment Tool) to align sequences between groups, to find out whether two groups had similar structures but had not been grouped together in the previous step because of side chains or other factors. In the end, we separated the 46 peptides into four groups containing 27, 12, 3, and 2 peptides, plus two ungrouped peptides.

We then ran Conserved Domains Search on the three largest groups and found that they belong to three conserved protein domain families: Omega-toxin Superfamily (cl05707), Toxin_28 Superfamily (cl06928), and Toxin_20 Superfamily (cl06915). The strings in brackets are the unique IDs of the superfamilies in the conserved protein domain family database. Finally, we selected a representative peptide from each superfamily, which gave us three peptides: ω-hexatoxin-Hv1a, μ-segestritoxin-Sf1a, and Orally active insecticidal peptide (OAIP).

VI. Future

To broaden the applicability of Pantide, we still need to extend our database. The next steps are to integrate other toxin peptide databases, such as those for scorpions or cone snails, to collect more peptide information from research results, and to use bioinformatics to build a new scoring system and search for new potential peptides.

Reference

[1] King, G.F.; Gentz, M.C.; Escoubas, P.; Nicholson, G.M. A rational nomenclature for naming peptide toxins from spiders and other venomous animals. Toxicon 2008, 52, 264–276.

[2] Monique J. Windley, Volker Herzig, Sławomir A. Dziemborowicz, Margaret C. Hardy, Glenn F. King and Graham M. Nicholson (2012). Spider-Venom Peptides as Bioinsecticides. Toxins, 4, 191-227.

The Pest Prediction System

I. The Factors of Prediction

We built a prediction model to support the spraying system of our device. The model predicts the number of pests, so that we know when the farm will be under pest threat and can switch on the spraying system to apply Pantide and protect the farmland. To predict the future number of pests, we used seven kinds of weather data from the past 20 days (air pressure, highest temperature, lowest temperature, average temperature, humidity, precipitation, and wind velocity), together with the accumulated number of moths caught in two periods: twenty to ten days ago and ten to one days ago. Because Pantide only affects the larvae, predicting the number of moths one, two, or three days ahead is not sufficient; we need to know the number of larvae in the future.

II. The Life History of Moth

After learning about the life history of the moths from Dr. Huang at the Taiwan Agricultural Research Institute, we know that the time for a pupa to become a moth ranges from 6 to 14 days. The moths caught over the next 20 days must therefore be the larvae on the farm right now. We thus used the accumulated number of moths over the next 20 days as the target of our prediction system, and its output tells us when Pantide needs to be sprayed for prevention.

III. The Software Design of Prediction Model
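As an illustration of how the two moth-count features and the 20-day prediction target could be derived from a series of daily trap counts, here is a minimal sketch. The data layout (one count per day in a list) and the choice of non-overlapping windows (days 20 to 11 ago and days 10 to 1 ago) are assumptions, not details taken from the team's code.

```python
def count_features_and_target(daily_counts, today):
    """Return (older, recent, target) moth counts around index `today`.

    daily_counts[i] is the number of moths trapped on day i (hypothetical layout).
    older  : accumulated count from 20 to 11 days ago
    recent : accumulated count from 10 to 1 days ago
    target : accumulated count over the next 20 days (the training label)
    """
    if today < 20 or today + 20 >= len(daily_counts):
        raise ValueError("need 20 days of history and 20 days of future counts")
    older = sum(daily_counts[today - 20:today - 10])
    recent = sum(daily_counts[today - 10:today])
    target = sum(daily_counts[today + 1:today + 21])
    return older, recent, target


# Synthetic counts, for illustration only.
daily_counts = list(range(60))
features_and_target = count_features_and_target(daily_counts, today=30)
print(features_and_target)
```

At prediction time only `older` and `recent` are available; `target` can be computed only for past days and serves as the label during training.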

The method we used is a neural network, a popular machine learning technique that is often discussed under the name deep learning. Our model combines two kinds of neural network: a Recurrent Neural Network (RNN) and an Artificial Neural Network (ANN). Because we have seven features for each of 20 days, 140 values in total, we cannot simply feed them all into the ANN to train the model. Instead, we used the RNN, which is well suited to this kind of time-series data, to compress the 140 values into seven compressed features, and combined these with the two moth-count features mentioned previously. This leaves only nine features in total, which are fed into the ANN to obtain the final prediction. During training, the error between the actual value and the output is computed, and gradient descent with backpropagation is used to adjust the weights of each neuron in both networks. After many training steps, our model reached an accuracy of about 80%.
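The data flow described above (20 days × 7 weather features compressed to 7 values, then joined with 2 moth-count features to give 9 inputs) can be sketched in plain Python. This is only a forward pass with random, untrained weights: the simple Elman-style recurrent cell, the hidden size of seven, the single linear output neuron standing in for the ANN, and all numeric values are assumptions for illustration. The team's actual model was built in TensorFlow and trained with backpropagation, which is omitted here.

```python
import math
import random


def rnn_compress(sequence, w_in, w_rec, bias):
    """Run a simple recurrent cell over the sequence; return the final hidden state."""
    hidden = [0.0] * len(bias)
    for x in sequence:
        hidden = [
            math.tanh(
                sum(w_in[j][i] * x[i] for i in range(len(x)))
                + sum(w_rec[j][k] * hidden[k] for k in range(len(hidden)))
                + bias[j]
            )
            for j in range(len(bias))
        ]
    return hidden


def linear_neuron(inputs, weights, bias):
    """Single linear output neuron (a stand-in for the full ANN)."""
    return sum(w * v for w, v in zip(weights, inputs)) + bias


def rand_matrix(rows, cols):
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
N_FEATURES, N_DAYS, N_HIDDEN = 7, 20, 7

# Random placeholders for weights and weather data (hypothetical values).
w_in = rand_matrix(N_HIDDEN, N_FEATURES)
w_rec = rand_matrix(N_HIDDEN, N_HIDDEN)
bias = [0.0] * N_HIDDEN
weather = [[random.random() for _ in range(N_FEATURES)] for _ in range(N_DAYS)]

compressed = rnn_compress(weather, w_in, w_rec, bias)  # 140 values -> 7
moth_counts = [0.3, 0.5]                               # two scaled count features
features = compressed + moth_counts                    # 9 inputs in total
out_weights = [random.uniform(-0.1, 0.1) for _ in range(len(features))]
prediction = linear_neuron(features, out_weights, 0.0)
print(len(features), prediction)
```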

IV. The Quality of the Model

The prediction model is also part of our device system. After using the device to collect data on a farm, we can not only predict the number of moths with the model we have already trained but also retrain the model with the data collected on that farm. In this way, the model can be adapted to each farm to obtain a farm-specific model.

The programming language we used is Python 3.5, and the deep learning framework is TensorFlow, developed by Google. (See on GitHub)

Degradation Rate

I. Summary

The aim of this model was to predict and simulate the degradation rate of Pantide. In practice, the results will be integrated into our device to support an automatic control system: once Pantide has degraded below the effective level, the device sprays the solution to replenish it.

After discussion and experimental verification, we showed that UV radiolytic oxidation contributes to the degradation of Pantide. We built a model simulating the relationship between applied UV light and degradation rate, and combined it with a UV intensity sensor to calculate the interval at which Pantide should be sprayed.

II. Pantide degradation process

For the proteins used as Pantide, the inhibitor cystine knot (ICK) is essential to their function. Pantide with several disulfide bonds is often more stable in solution. If the “native protein” is denatured, it loses its disulfide bonds and becomes a less stable form, usually called the “linear protein,” which is easily degraded.

Recent research shows that the spider toxin proteins containing ICK structure, for example, ω-hexatoxin-Hv1a (Hv1a), have high stability against temperature, pH, solvents and protease. In contrast, when Hv1a is denatured to linear form, it loses its stability and then degrades rapidly. [1]

Several possible processes of Pantide degradation are discussed below. (Figure 1) Pantide may be reduced to a linear form by reductants or reductases. Both native and linear form proteins may undergo hydrolysis and proteolysis, resulting in denaturation or cleavage of amino acids. In addition, UV light from the sun leads to Pantide degradation: although the energy of UV light may not be enough to break covalent bonds efficiently, proteins can still undergo radiolytic oxidation.

Figure 1. Pantide degradation process

Based on this degradation process, we built our model. The degradation rate of Pantide has contributions from hydrolysis, proteolysis, UV radiolytic oxidation, and reduction to the linear form. We counted the rate at which Pantide is converted to the linear form as part of the degradation rate, because the linear form protein loses its function and is also likely to be degraded.

However, since the whole process is too complex to verify at once, we divided the experiment into three parts (a hydrolysis test, a proteolysis test, and a UV radiolytic oxidation test) and then summarized the results. The purpose was to find the degradation rate of Pantide and to verify that the linear form protein is less stable.

In addition, because the ICK structural domain is the main contributor to the stability of these proteins, we assumed that the degradation processes of the three target proteins are roughly the same. For this reason, we chose Hv1a and Hv1a fused with lectin (Hv1a-lectin) for the demonstration.

III. Hydrolysis test

i. Theory

Like polypeptide formation run in reverse, proteins can be hydrolyzed into their constituent amino acids. The mechanism is a nucleophilic acyl substitution (addition-elimination), in which a nucleophile attacks the sp2-hybridized acyl carbon of the amide. [2] Therefore, the reaction rate depends on the concentrations of both the nucleophile and the protein:

Where [P] is the concentration of Pantide, Kh is the rate constant of hydrolysis, and kA, kN, kB are the three contributing components of overall hydrolysis, which depend on the conditions.
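A form of this rate equation consistent with the symbols defined in the next paragraph can be sketched as follows. It assumes that kA, kN, and kB denote the acid-catalyzed, neutral, and base-catalyzed contributions, which is the usual decomposition for hydrolysis but is not stated explicitly in the text.

```latex
\frac{d[P]}{dt} = -K_h\,[P], \qquad
K_h = k_A[\mathrm{H^+}] + k_N + k_B[\mathrm{OH^-}]
```

At constant pH, K_h is constant and the solution is the exponential decay [P](t) = [P]_0 e^{-K_h t}.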

For Pantide, we would apply the protein solution in the farm at a constant pH. Thus, the rate of hydrolysis is proportional to the concentration of Pantide by a constant Kh , and the concentration of Pantide undergoes an exponential decay as time goes on.

ii. Experimental proof

We tested the chemical stability of both the native and linear forms of Hv1a and Hv1a-lectin in neutral PBS (phosphate buffered saline, pH=7.4) at 4 ℃ for one day and seven days. The remaining protein was visualized by SDS-PAGE and its concentration was calculated with the software ImageJ. (Figure 2, Figure 3)

Figure 2. SDS-PAGE gel and the concentrations of hydrolysis test to Hv1a (5.3 kDa). The samples were marked on the top of the gel.

Figure 3. SDS-PAGE gel and the concentrations of hydrolysis test to Hv1a-lectin (HL, 17.1 kDa). The samples were marked on the top of the gel.

The concentration of native Hv1a did not change significantly over seven days (1 day: 98%, 7 days: 87%), while only 9% of linear Hv1a remained after one day, and almost all of it had degraded after seven days.

The test on Hv1a-lectin showed a similar result. Native Hv1a-lectin showed almost no degradation in 7 days (1 day: 110%, 7 days: 105%), but only 16% of linear Hv1a-lectin remained after 1 day and 3% after 7 days.

We therefore concluded that native proteins dissolved in neutral PBS did not undergo hydrolysis (or did so only at a very slow rate) at 4 ℃ over seven days, whereas linear form proteins degraded as time went on, with very little remaining after seven days.
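If the exponential-decay model from the theory section is applied to the remaining fractions reported for Hv1a, apparent first-order rate constants can be estimated. The sketch below is a rough illustration only: it uses a single time point per estimate, the function names are hypothetical, and the values apply to PBS at 4 ℃ rather than field conditions. Fractions above 100%, as measured for native Hv1a-lectin, cannot be used this way.

```python
import math


def first_order_rate_constant(fraction_remaining, elapsed_days):
    """Apparent rate constant (per day) from [P]/[P]0 = exp(-k * t)."""
    if not 0.0 < fraction_remaining <= 1.0:
        raise ValueError("fraction_remaining must be in (0, 1]")
    return -math.log(fraction_remaining) / elapsed_days


def remaining_fraction(rate_constant, elapsed_days):
    """Fraction of protein left after elapsed_days under first-order decay."""
    return math.exp(-rate_constant * elapsed_days)


# Fractions taken from the hydrolysis test: native Hv1a 87% after 7 days,
# linear Hv1a 9% after 1 day.
k_native = first_order_rate_constant(0.87, 7.0)
k_linear = first_order_rate_constant(0.09, 1.0)
print(k_native, k_linear, k_linear / k_native)
```

Under these assumptions the linear form decays roughly two orders of magnitude faster than the native form, in line with the qualitative conclusion above.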

IV. Proteolysis test

i. Theory

To simulate degradation by proteases, we used the Michaelis-Menten kinetics model to express the rate equation. It is useful for describing enzymatic reactions, especially those without ligand participation, such as proteolysis by proteases, whose mechanism is to first find the specific cleavage site and then react to break it down. [3]

The proteolysis process is irreversible in vivo because the product, the linear form protein, is degraded quickly. We used a differential equation to describe the major proteolysis process:

Where [P] is the concentration of Pantide; Vm,p is the maximum reaction rate of proteolysis, equal to the product of the total enzyme concentration and the turnover number kcat for the specific protease and substrate; and KM,p is the Michaelis constant, the substrate concentration at which the reaction rate is half of Vm,p.

We assumed that Vm,p is a constant because the proteases in vivo are active and in equilibrium, and Vm,p represents the equivalent value for all kinds of proteases in the environment.
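With the symbols defined in the next paragraph, the standard Michaelis-Menten rate law for the consumption of Pantide takes the following form; it is given here as a sketch of the equation this section refers to.

```latex
\frac{d[P]}{dt} = -\frac{V_{m,p}\,[P]}{K_{M,p} + [P]}
```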

ii. Experimental proof

We designed two experiments to test the enzymatic stability of Hv1a and Hv1a-lectin towards protease. The first observed the level of degradation of both native and linear forms of the proteins treated with protease for one day. The second measured the degradation rate curve of the linear form proteins only, over a period of four hours, because the native proteins resist the protease.

For the first experiment, we dissolved the proteins in neutral PBS (pH=7.5), added 0.25% trypsin-EDTA (1:250), a serine protease, and then incubated the samples at a working temperature of 37 ℃ for one day. (Figure 4, Figure 5)

Figure 4. SDS-PAGE gel and the concentrations of trypsin resistance test to Hv1a (5.3 kDa). The samples were marked on the top of the gel.

Figure 5. SDS-PAGE gel and the concentrations of trypsin resistance test to Hv1a-lectin (HL, 17.1 kDa). The samples were marked on the top of the gel.

Compared with samples incubated for one day without trypsin treatment, native Hv1a and Hv1a-lectin were resistant to trypsin (111% and 100% remaining), whereas the linear proteins were degraded by proteolysis (67% and 18% remaining).

We next tested the proteolysis rate of only linear proteins in the period of four hours, and we drew the Lineweaver–Burk plot (also called “double reciprocal plot”) (Figure 6) to obtain Vm,p and KM,p for linear form Hv1a and Hv1a-lectin. (Table 1)

Figure 6. Lineweaver–Burk plot of the proteolysis test of linear Hv1a (blue line) and Hv1a-lectin (orange line). The horizontal axis represents the reciprocal of the substrate concentration, and the vertical axis represents the reciprocal of the rate. The x-intercept of the straight line equals -1/KM,p, and the y-intercept equals 1/Vm,p.

Table 1. The Vm,p and KM,p of linear form Hv1a and Hv1a-lectin.
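For reference, the way Vm,p and KM,p are read off a Lineweaver–Burk plot can be sketched as a least-squares line through the points (1/[S], 1/v). The function name is hypothetical and the data below are synthetic, generated from known constants; they are not the team's measurements.

```python
def lineweaver_burk_fit(substrate, rate):
    """Fit 1/v = (Km/Vmax) * (1/[S]) + 1/Vmax by ordinary least squares.

    Returns (vmax, km).
    """
    xs = [1.0 / s for s in substrate]
    ys = [1.0 / v for v in rate]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    vmax = 1.0 / intercept  # y-intercept is 1/Vmax
    km = slope * vmax       # slope is Km/Vmax
    return vmax, km


# Synthetic, noise-free data from known constants (illustration only).
true_vmax, true_km = 2.0, 0.5
substrate = [0.1, 0.2, 0.5, 1.0, 2.0]
rate = [true_vmax * s / (true_km + s) for s in substrate]

vmax, km = lineweaver_burk_fit(substrate, rate)
print(vmax, km)
```

With real, noisy data the double-reciprocal transform gives the most weight to measurements at low substrate concentration, so a direct nonlinear fit of the Michaelis-Menten equation is often preferred as a cross-check.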

V. UV radiolytic oxidation test

i. Theory

In nature, proteins are also likely to degrade under sunlight. The reason is that the energy of the light excites the solvent (mostly water) and initially generates radicals, which then propagate and attack the protein, breaking it down, until termination.

Ionizing radiation causes the radiolysis of water; the major process is shown below. (Figure 7)

Figure 7. The radiolysis process of water.

Although the real mechanism by which radicals attack proteins is quite complex, we can simply express the rate at which protein is attacked in terms of an unknown power, n, of the total radical concentration. We then derived the rate formulas as differential equations.

Where:

  • [radical]: the total concentration of radicals
  • [P]: the concentration of Pantide
  • Gγ: the dose rate of absorbing γ-ray (radiation energy absorption rate per mass; for water, 1.42 Gy/s) [5]
  • Aγ: the number of radicals created per unit energy (for water, 0.045 μmol/J) [5]
  • I: the intensity of UVB from sunlight measured by the UV sensor (UVM30A)
  • RT: the rate constant of radical termination, equal to 2.365×10^-7 mol^-1 s^-1 [6]
  • KUV: the rate constant of UV radiolytic oxidation of proteins, initially set to 44 mol^-1 s^-1 [6]

We then used MATLAB to simulate how the degradation rate by UV radiolytic oxidation depends on the intensity of UVB from sunlight. The results showed that the degradation rate increases with rising intensity whatever the value of n, and that the protein eventually tends to be fully degraded. (Figure 8)

Figure 8. The simulation of degradation rate by UV radiolytic oxidation on the intensity of UVB from sunlight.
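The qualitative behaviour of such a model can be illustrated with a simple forward-Euler integration. The exact differential equations are not reproduced in this text, so the sketch below assumes a plausible form: radicals are generated in proportion to the UVB intensity and removed by second-order termination, and the protein is attacked at a rate proportional to [radical]^n [P]. The function name and all parameter values are hypothetical, dimensionless placeholders chosen only to keep the integration stable; they are not the constants listed above.

```python
def simulate_uv_degradation(intensity, n=0.78, k_uv=0.01, gen_rate=0.001,
                            r_term=1.0, p0=1.0, dt=1.0, steps=600):
    """Return the fraction of protein remaining after `steps` Euler steps.

    Assumed model (not taken from the source):
        d[radical]/dt = gen_rate * intensity - r_term * [radical]**2
        d[P]/dt       = -k_uv * [radical]**n * [P]
    """
    radical = 0.0
    protein = p0
    for _ in range(steps):
        d_radical = gen_rate * intensity - r_term * radical ** 2
        d_protein = -k_uv * (radical ** n) * protein
        radical = max(radical + d_radical * dt, 0.0)
        protein = max(protein + d_protein * dt, 0.0)
    return protein / p0


# Remaining fraction at three illustrative intensities (arbitrary units).
intensities = [10.0, 50.0, 100.0]
remaining = [simulate_uv_degradation(i) for i in intensities]
print(remaining)
```

Under these assumptions a higher intensity always leaves less protein, which matches the trend described for Figure 8; reproducing the actual curves would require the original equations and units.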

ii. Experimental proof

We exposed the four kinds of native protein solutions to UVB light from a UV transilluminator (302 nm, 50 mW/m2) for a period of 2 hours, at an average ambient temperature of 36.8 ℃. The results showed that the four proteins degraded under UVB light treatment and that the trend of protein reduction corresponded to our model. (Figure 9)

Figure 9. The degradation rate of four proteins by UV radiolytic oxidation as time goes on.

The degradation rates clearly differ between the proteins with and without the lectin fusion. A possible reason is the difference in amino acid sequence length: the longer protein has a relatively higher probability of being attacked.

To determine the relation between the UVB light intensity and the degradation, we fitted the experimental results to our model. The fit gave a value of 0.78 for the parameter n in formula (4). The rate constants KUV for Hv1a, Sf1a, OAIP, and Hv1a-lectin are 2.4, 7.8, 15.7, and 90.3, respectively.

We also exposed native Hv1a-lectin to another UV transilluminator (286 nm, 36.4 mW/m2) and compared the result with the prediction from our model. (Figure 10)

Figure 10. The model prediction compared with the experiment data.

The figure shows that our model is on the right track, but to accomplish our purpose we still need more experimental data to correct the parameters.

VI. Summarization

The complete equation for the degradation rate can be expressed as the sum of the rates of the three possible degradation processes, that is, hydrolysis, proteolysis, and UV radiolytic oxidation, plus the rate at which proteins are converted from the native form to the linear form, denoted RSS.

However, according to the previous experiments, if we consider only the protein in its native form, its high chemical stability and protease resistance mean that Kh and Vm,p are much smaller than KUV, so the first two terms on the right-hand side of the equation are relatively insignificant. As for the reduction of disulfide bonds, since proteins are most stable in their favorable three-dimensional structure, these strong bonds do not tend to break, so we assumed that RSS does not contribute much to degradation in nature. The equation was then simplified to only one term.
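Written out with the symbols used in the previous sections, this sum can be sketched as below. The hydrolysis and proteolysis terms follow the first-order and Michaelis-Menten forms described earlier; the form of the UV term, K_UV [radical]^n [P], is an assumption based on the description of the rate as depending on an unknown power n of the radical concentration.

```latex
\frac{d[P]}{dt} =
  - K_h\,[P]
  - \frac{V_{m,p}\,[P]}{K_{M,p} + [P]}
  - K_{UV}\,[\mathrm{radical}]^{n}\,[P]
  - R_{SS}
```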

and

As a result, the degradation rate of Pantide is mainly related to UV intensity, as expressed by (6) and (7). To verify our degradation rate model and the practical use of our device, we carried out the UV radiolytic oxidation test outdoors.

We placed the samples in a transparent, closed acrylic box on a wide open square outdoors for 4 hours at different times of day. Using the UV intensity sensor, we obtained the remaining protein concentration together with the average UVB light intensity in each period. (Figure 11)

Figure 11. The remaining protein concentration and the average UVB light intensity in each 4-hour period at different times of day.

Reference

[1] Volker Herzig and Glenn F. King (2015). The Cystine Knot Is Responsible for the Exceptional Stability of the Insecticidal Spider Toxin ω-Hexatoxin-Hv1a. Toxins, 7, 4366-4380.

[2] Anonymous. (n.d.). HYDROLYSIS. Retrieved October 16, 2016 , from https://zh.scribd.com/document/79207692/Hydrolysis-2006

[3] Hedstrom, L. (2002, May 14). Serine Protease Mechanism and Specificity. Chem. Rev. 2002, 102, 4501-4523.

[4] Guozhong Xu & Mark R. Chance (2005). Radiolytic Modification of Sulfur-Containing Amino Acid Residues in Model Peptides: Fundamental Studies for Protein Footprinting. Anal. Chem, 77, 2437-2449.

[5] Mary E. Dzaugis, Arthur J. Spivack, Steven D'Hondt (2015, April 10). A quantitative model of water radiolysis and chemical production rates near radionuclide-containing solids. Radiation Physics and Chemistry, 115, 127-134.

[6] Bachari, T. S. (n.d.). Theoretical Investigation on The Kinetics of Free Radical Reactions of Styrene Emulsion Polymerization . Retrieved from http://www.iasj.net/iasj?func=fulltext&aId=13996