Difference between revisions of "Team:Manchester/Model/ParameterSelection"

 
(35 intermediate revisions by 3 users not shown)
Line 4: Line 4:
  
 
<style>
 
<style>
 +
.title11{
 +
    color:black;
 +
}
 +
 
.width100{
 
.width100{
 
     width: 80%;
 
     width: 80%;
Line 17: Line 21:
 
     margin:auto;
 
     margin:auto;
 
}
 
}
 +
 +
.width80{
 +
    width:80%;
 +
    margin:auto;
 +
}
 +
 +
.width70{
 +
    width:70%;
 +
    margin:auto;
 +
}
 +
  
 
.width50{
 
.width50{
 
     width:50%;
 
     width:50%;
 
     margin:auto;
 
     margin:auto;
 +
}
 +
 +
.list1 li{
 +
    font-size: 17px;
 +
    line-height: 150%;
 
}
 
}
  
Line 26: Line 46:
 
     width: 80%;
 
     width: 80%;
 
}
 
}
 +
 +
p.title2{
 +
    display:inline-block;
 +
    border-bottom: 1px black solid ;
 +
}
 +
  
 
.blue_title{
 
.blue_title{
Line 34: Line 60:
 
.smalltitle{
 
.smalltitle{
 
     font-size: 20px;
 
     font-size: 20px;
     color: blue;
+
     color: #4c4cff ;
 
     text-decoration: underline;
 
     text-decoration: underline;
 
     margin-top: 50px;
 
     margin-top: 50px;
     margin-bottom: 30px;
+
     padding-bottom: 20px;
 
}
 
}
  
 
.smalltitle1{
 
.smalltitle1{
 
     font-size: 20px;
 
     font-size: 20px;
     color: pink;
+
     color: #ff1493;
 
     text-decoration: underline;
 
     text-decoration: underline;
 
     margin-top: 50px;
 
     margin-top: 50px;
     margin-bottom: 30px;
+
     padding-bottom: 20px;
 
}
 
}
  
Line 55: Line 81:
 
</style>
 
</style>
  
<h1 class="title11">Parameter selection</h1>
+
<h1 id="TopTitle" class="title11">Parameter selection</h1>
  
 
<div class="team">
 
<div class="team">
<h1 class="blue_title">THEORY </h1>
 
  
<h2 class="smalltitle">ensemble benefits</h2>
+
<p id="heading1" class="title2" style="font-size:25px;"></p>
  
<p>
+
<p style="font-size: 17px;">
Modelling is only as good as the data provided to it, often called GIGO (Garbage In, Garbage Out). This needs to be taken into account when selecting parameters to use as your starting point for ensemble modelling. Ensemble modelling is better able to handle this than most as it can account for the uncertainty in these parameters - still care has to be taken to improve the quality of the initial dataset.
+
Ensemble modelling relies on priors (in the Bayesian sense) that express our belief about the plausible parameter values. Therefore, care has to be taken to find all the relevant information that could influence our beliefs.  
 
  <br /><br />
 
  <br /><br />
There are 3 main steps to bear in mind when selecting parameters each with its on key consideration. These steps help deal with the garbage which could ruin the model.
+
  
<h2 class="smalltitle">Collecting all relevant parameters</h2>
+
<p class="title2" style="font-size:25px;">Collecting all relevant information on parameter values</p>
  
The first is actually collecting the parameters - the more the better. 100 experimentally sourced parameters from 100 different papers will be much more representative than just a handful. In an ideal world these would all be in the exact same conditions that your system will be however this is highly unlikely. This is accounted for with the second step.
+
<p style="font-size: 17px;">The first step in ensemble modelling is collecting all the experimental information available about the parameter values. Ideally, these values would all have been measured under exactly the same conditions as our system, but this is highly unlikely. The second step, weighting, accounts for this.</p>
  
<h2 class="smalltitle">weighting</h2>
+
</br>
  
Weighting allows you to give more significance to parameters which more closely fit your system. This allows the inclusion parameters with different conditions into the initial dataset as they will simply be given a lower weighting score. The lower the weighting score the less confident in that value you are.
+
</br>
  
<h2 class="smalltitle">truncating Tails</h2>
+
<p id="heading2" class="title2" style="font-size:25px;">Weighting</p>
  
Finally the dataset should be filtered, this is done by removing points from the top and bottom of the dataset. The amount removed depends on the confidence interval set. Typically removing the top and bottom 2.5% of points is sufficient as this provides a 95% confidence interval.
+
<p style="font-size: 17px;">Weighting allows us to rely more heavily on the parameter values of the highest quality and relevance. Values observed in a system close to ours were given a higher weighting score, while data from very different systems (different species, different experimental conditions, different enzymes) received lower weights. This provides a systematic way of including parameter values with very different levels of reliability in the data set: each is given an appropriate weighting score rather than being arbitrarily discarded (which, when applied strictly, would often leave no information available at all). Our final weighting scheme is illustrated in this diagram: </br> </p>
  <br /><br />
+
After following these steps the probability density (pdf’s) can be calculated. See the pdf section.
+
  
FIGURE GOES HERE
 
  
<h2 class="smalltitle1">code from flow diagram </h2>
+
<center>
 +
<img class="width70" src="https://static.igem.org/mediawiki/2016/9/9e/T--Manchester--model_flow_charts.png" alt="flow charts" />
 +
</center>
  
Relevant github link. All files discussed here are available for reference
+
</br>
 +
</br>
 +
<h1 id="TopTitle" class="title11">Practical considerations</h1>
 +
</br>
 +
<p style="font-size: 17px;">The MATLAB script for calculating each parameter’s probability density function is available from <a href="https://github.com/Manchester-iGem-2016/UoMiGem2016">our GitHub page</a>. The above flowchart shows the questions we asked ourselves about each parameter source.
 +
This was implemented in practice by having a spreadsheet to fill out the responses (the user simply has to select the options from a drop-down menu). The weights are calculated in the spreadsheet. MATLAB then reads in the weights and associated parameter values. A dataset is then made in which there are n copies of each data point (where n = weight). From this a probability density function can be created for later sampling.
 +
<br /><br />
  
This flowchart shows the questions we asked ourselves about each parameter source.
+
 
This was implemented in practice by having a spread sheet to fill out the responses (the user simply has to select the options from a drop down.)  The weights are calculated in the spreadsheet. Matlab the reads in the weights and associated values. A data set is then made in which there are n copies of each data point (where n = weighting). From this a pdf can be created for later sampling.  
+
<ul class="list1">
<br /><br />
+
<li>Dealing with parameter values with quoted errors
The extra complications are done as simply as possible. FOLLOWING are BULLET POINTS IF POSSIBLE
+
<br /><br />If we know the experimental uncertainty (standard deviation) associated with a parameter value reported in the literature, this data point is split into 5 data points, spanning -2 to +2 standard deviations around the reported value. The weighting of the original data point is then redistributed among the new data points in proportion to the probability density of a normal distribution at each point. </li>
 
  <br /><br />
 
  <br /><br />
Data point error
 
Any documented error associated with a point is taken as 5 data points with one at -2 standard deviations through to one at 2 standard deviation, the weighting of the original data point is then redistributed to each new data point relative to their probability from the normal distribution they are described by.
 
<br /><br />
 
Any unphysical zeros resulting from this are removed from  the data point. To extend this work, any 5 data point set which gets a zero should have its weighting renormalized, also extending to 2n + 1 data points where n is large would be more representative. The first was not done because if you are getting zeros the quoted errors clearly don’t follow a normal distribution and shouldn’t be quoted as such. The second wasn’t done because you run into rounding issues which can only be removed by increasing the sample size for pdf creation. It was decided that this wasn’t worth computer time for such a small increase in accuracy.
 
<br /><br />
 
Removing tails.
 
The array of parameter values for making pdf is ordered and then the top and bottom 2.5% of data points floored is removed. This is done to remove extreme outliers and as such help the pdf creators.
 
  
.
+
</ul>
 +
 
  
  
  
 
</p>  
 
</p>  
 +
<a href="https://2016.igem.org/Team:Manchester/Model/PDF">Continue to Generating Probability Density Functions</a>
 +
</br>
 +
<a href="https://2016.igem.org/Team:Manchester/Model">Return to overview</a>
 +
<span class="box"></span>
 +
 
</div>
 
</div>
 
</html>
 
</html>
 
{{Manchester/CSS/footer}}
 
{{Manchester/CSS/footer}}

Latest revision as of 02:24, 20 October 2016

Manchester iGEM 2016

Parameter selection

Ensemble modelling relies on priors (in the Bayesian sense) that express our belief about the plausible parameter values. Therefore, care has to be taken to find all the relevant information that could influence our beliefs.

Collecting all relevant information on parameter values

The first step in ensemble modelling is collecting all the experimental information available about the parameter values. Ideally, these values would all have been measured under exactly the same conditions as our system, but this is highly unlikely. The second step, weighting, accounts for this.



Weighting

Weighting allows us to rely more heavily on the parameter values of the highest quality and relevance. Values observed in a system close to ours were given a higher weighting score, while data from very different systems (different species, different experimental conditions, different enzymes) received lower weights. This provides a systematic way of including parameter values with very different levels of reliability in the data set: each is given an appropriate weighting score rather than being arbitrarily discarded (which, when applied strictly, would often leave no information available at all). Our final weighting scheme is illustrated in this diagram:

[Figure: flow charts of the weighting scheme]


Practical considerations


The MATLAB script for calculating each parameter’s probability density function is available from our GitHub page. The flowchart above shows the questions we asked ourselves about each parameter source. In practice, the responses were entered in a spreadsheet (the user simply selects options from a drop-down menu), and the weights are calculated in the spreadsheet. MATLAB then reads in the weights and the associated parameter values. A dataset is then made containing n copies of each data point (where n = weight), from which a probability density function can be created for later sampling.
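
The replication step described above can be sketched as follows. This is a minimal Python illustration, not the MATLAB script from the repository: the function names, the example values and weights, and the choice of a Gaussian kernel density estimate with a fixed bandwidth are all assumptions made for this sketch, since the page only states that a probability density function is created from the replicated dataset.

```python
import math
import random


def replicate_by_weight(values, weights):
    """Return a pooled dataset containing n copies of each value, where n = weight."""
    pooled = []
    for value, weight in zip(values, weights):
        pooled.extend([value] * int(weight))
    return pooled


def gaussian_kde(pooled, bandwidth):
    """Return a function estimating the probability density of the pooled data.

    Assumption: a Gaussian kernel density estimate stands in for whatever
    method is actually used to build the probability density function.
    """
    n = len(pooled)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in pooled
        )

    return density


# Hypothetical parameter values and integer weights, for illustration only.
values = [1.2, 3.4, 2.0]
weights = [4, 1, 2]

pooled = replicate_by_weight(values, weights)
pdf = gaussian_kde(pooled, bandwidth=0.5)

# Drawing from the pooled dataset samples each value in proportion to its weight.
sample = random.choice(pooled)
```

Because a value with weight 4 appears four times in the pooled dataset, it contributes four times as much to the estimated density as a value with weight 1, which is the intended effect of the weighting score.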

  • Dealing with parameter values with quoted errors

    If we know the experimental uncertainty (standard deviation) associated with a parameter value reported in the literature, this data point is split into 5 data points, spanning -2 to +2 standard deviations around the reported value. The weighting of the original data point is then redistributed among the new data points in proportion to the probability density of a normal distribution at each point.
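
The splitting of a data point with a quoted error can be sketched as below. This is a hedged Python illustration rather than the MATLAB implementation: the function name `split_with_error` is hypothetical, and the sketch assumes the 5 points sit at evenly spaced offsets of -2, -1, 0, +1 and +2 standard deviations and that the redistributed weights sum to the original weight.

```python
import math


def split_with_error(value, sd, weight, offsets=(-2, -1, 0, 1, 2)):
    """Split one data point into several points spread across its quoted error.

    Each new point lies at value + k * sd. Its share of the original weight is
    proportional to the normal probability density at k standard deviations,
    i.e. to exp(-k**2 / 2); the normalising constant cancels out.
    Assumption: evenly spaced integer offsets from -2 to +2.
    """
    densities = [math.exp(-0.5 * k * k) for k in offsets]
    total = sum(densities)
    return [
        (value + k * sd, weight * d / total)
        for k, d in zip(offsets, densities)
    ]


# Hypothetical data point: value 10.0, standard deviation 1.5, weight 8.
points = split_with_error(10.0, 1.5, 8.0)
for new_value, new_weight in points:
    print(f"{new_value:6.2f}  weight {new_weight:.3f}")
```

The redistributed weights are generally not whole numbers, so they would need to be rounded or rescaled before being used as copy counts in the replicated dataset. How the original script handles this is not stated here.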


Continue to Generating Probability Density Functions
Return to overview