Difference between revisions of "Team:Manchester/Model/ParameterSelection"

 
 
<div class="team">
 
<div class="team">
  
<p class="title2" style="font-size:25px;">Ensemble benefits</p>
<p id="heading1" class="title2" style="font-size:25px;"></p>
  
 
<p style="font-size: 17px;">
 
<p style="font-size: 17px;">
Ensemble modelling relies on priors that express our belief about the plausible parameter values. Therefore, care has to be taken to improve the quality of the initial dataset.
Ensemble modelling relies on priors (in the Bayesian sense) that express our belief about the plausible parameter values. Therefore, care has to be taken to find all the relevant information that could influence our beliefs.  
 
  <br /><br />
 
There are three main steps to bear in mind when selecting parameters, each with its own key consideration. These steps help to reduce the uncertainties and improve the accuracy of the model's predictions.</p>
   
  </br>
<a href="#TopTitle">Return to top of page</a>
</br>
  
<p class="title2" style="font-size:25px;">Collecting all relevant parameters</p>
<p class="title2" style="font-size:25px;">Collecting all relevant information on parameter values</p>
  
<p style="font-size: 17px;">The first step is collecting the parameter values: the more, the better. One hundred experimentally sourced values from 100 different papers will be much more representative than just a handful. Ideally, these would all have been measured under exactly the same conditions as our system; however, this is highly unlikely. This is accounted for in the second step.</p>
<p style="font-size: 17px;">The first step in ensemble modelling is collecting all the available experimental information about the parameter values. Ideally, these values would all have been measured under exactly the same conditions as our system; however, this is highly unlikely. This is accounted for in the second step.</p>
  
 
</br>
 
<a href="#TopTitle">Return to top of page</a>
 
 
</br>
 
  
<p class="title2" style="font-size:25px;">Weighting</p>
<p id="heading2" class="title2" style="font-size:25px;">Weighting</p>
  
<p style="font-size: 17px;">Weighting allows you to place more confidence in some of the values found in the previous step than in others. Values observed in a system close to ours were given a higher weighting score. This provides a systematic way of including parameter values measured under different conditions in the dataset, since each is given an appropriate weighting score. A lower weighting score reflects that we are less confident in that value.  </br> </p>
<p style="font-size: 17px;">Weighting allows us to rely more heavily on the parameter values of the highest quality and relevance. Values observed in a system close to ours were given a higher weighting score, while data from very different systems (different species, different experimental conditions, different enzymes) received lower weights. This provides a systematic approach to including parameter values with very different levels of reliability in the dataset: each is given an appropriate weighting score rather than being arbitrarily discarded (which, if applied strictly, would often leave no information at all). Our final weighting scheme is illustrated in this diagram: </br> </p>
  
  
  
 
</br>
 
<a href="#TopTitle">Return to top of page</a>
 
 
</br>
 
<p class="title2" style="font-size:25px;">Truncating Tails</p>
<h1 id="TopTitle" class="title11">Practical considerations</h1>
 
</br>
<p style="font-size: 17px;">Finally, the dataset should be filtered by removing points from the top and bottom of the ordered data. The amount removed depends on the chosen interval: removing the top and bottom 2.5% of points is typically sufficient, as this keeps the central 95% of the data. This greatly helps in obtaining an accurate PDF from parametric methods (see the PDF section).
<p style="font-size: 17px;">The MATLAB script for calculating each parameter’s probability density function is available from <a href="https://github.com/Manchester-iGem-2016/UoMiGem2016">our GitHub page</a>. The above flowchart shows the questions we asked ourselves about each parameter source.
This was implemented in practice using a spreadsheet in which the responses are recorded (the user simply selects the options from a drop-down menu). The weights are calculated in the spreadsheet. MATLAB then reads in the weights and the associated parameter values, and a dataset is made containing n copies of each data point (where n = weight). From this dataset, a probability density function can be created for later sampling.  
 
  <br /><br />
 
After following these steps, the probability density function (PDF) can be calculated.</p>
 
  
 
</br>
 
<a href="#TopTitle">Return to top of page</a>
 
</br>
 
 
<p class="title2" style="font-size:25px;">Code from flow diagram </p>
 
 
<p style="font-size: 17px;">All files discussed here are available for reference on <a href="https://github.com/Manchester-iGem-2016/UoMiGem2016">our GitHub page</a>.</p>
 
<br /><br />
 
<p style="font-size: 17px;">The above flowchart shows the questions we asked ourselves about each parameter source.
 
This was implemented in practice using a spreadsheet in which the responses are recorded (the user simply selects the options from a drop-down menu). The weights are calculated in the spreadsheet. MATLAB then reads in the weights and the associated values. A dataset is then made in which there are n copies of each data point (where n = weighting). From this, a PDF can be created for later sampling.
 
<br /><br />
 
The extra complications below are handled as simply as possible.</p>
 
<br /><br />
 
  
 
<ul class="list1">
 
<ul class="list1">
<li>Data point error
<li>Dealing with parameter values with quoted errors
  <br /><br />Any documented error associated with a data point is handled by replacing that point with 5 data points, from -2 standard deviations through to +2 standard deviations; the weighting of the original data point is then redistributed to the new data points in proportion to their probability density under the normal distribution they describe. </li>
  <br /><br />If we know the experimental uncertainty (standard deviation) associated with a parameter value reported in the literature, this data point is split into 5 data points, from -2 standard deviations through to +2 standard deviations; the weighting of the original data point is then redistributed to each new data point relative to the probability density of a normal distribution. </li>
 
  <br /><br />
 
<li>Any unphysical zeros resulting from this are removed from the dataset. This work could be extended in two ways: any 5-point set that produces a zero could have its weighting renormalized, and the split could be extended to 2n + 1 data points, which would be more representative for large n. The first was not done because, if zeros appear, the quoted errors clearly do not follow a normal distribution and should not be quoted as such. The second was not done because it introduces rounding issues that can only be removed by increasing the sample size used for PDF creation; we decided this was not worth the computing time for such a small gain in accuracy.</li>
 
<br /><br />
 
<li>Removing tails.
 
<br /><br />The array of parameter values used to build the PDF is sorted, and then the top and bottom 2.5% of data points (rounded down to a whole number of points) are removed. This removes extreme outliers and so helps the PDF-fitting methods. </li>
 
  
 
</ul>
 
</ul>
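The "Removing tails" step above can be sketched in Python as follows. This is a minimal illustration, not the team's MATLAB code; it assumes the weighted dataset is held as a plain list of numbers, and the function name `trim_tails` is hypothetical. Following the description, the list is sorted and floor(0.025 * N) points are dropped from each end, which keeps roughly the central 95% of the data.

```python
import math


def trim_tails(values, fraction=0.025):
    """Sort the values and drop floor(fraction * N) points from each end."""
    if not 0 <= fraction < 0.5:
        raise ValueError("fraction must be in [0, 0.5)")
    ordered = sorted(values)
    k = math.floor(fraction * len(ordered))  # points removed from each tail
    # Slice with len - k rather than -k so that k == 0 keeps every point.
    return ordered[k:len(ordered) - k]


# Example: 100 evenly spaced values lose 2 points from each end.
trimmed = trim_tails(list(range(1, 101)))
print(len(trimmed), trimmed[0], trimmed[-1])  # 96 3 98
```

Because the count is rounded down, datasets with fewer than 40 points lose nothing at the 2.5% level.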
  
 
</p>  
 
</p>  
</br>
<a href="https://2016.igem.org/Team:Manchester/Model/PDF">Continue to Generating Probability Density Functions</a>
<a href="#TopTitle">Return to top of page</a>
 
</br>
 
 
<a href="https://2016.igem.org/Team:Manchester/Model">Return to overview</a>
 
<span class="box1"></span>
<span class="box"></span>
  
 
</div>
 
</div>
 
</html>
 
</html>
 
{{Manchester/CSS/footer}}
 

Latest revision as of 02:24, 20 October 2016

Manchester iGEM 2016

Parameter selection

Ensemble modelling relies on priors (in the Bayesian sense) that express our belief about the plausible parameter values. Therefore, care has to be taken to find all the relevant information that could influence our beliefs.

Collecting all relevant information on parameter values

The first step in ensemble modelling is collecting all the available experimental information about the parameter values. Ideally, these values would all have been measured under exactly the same conditions as our system; however, this is highly unlikely. This is accounted for in the second step.



Weighting

Weighting allows us to rely more heavily on the parameter values of the highest quality and relevance. Values observed in a system close to ours were given a higher weighting score, while data from very different systems (different species, different experimental conditions, different enzymes) received lower weights. This provides a systematic approach to including parameter values with very different levels of reliability in the dataset: each is given an appropriate weighting score rather than being arbitrarily discarded (which, if applied strictly, would often leave no information at all). Our final weighting scheme is illustrated in this diagram:

[Figure: flowcharts of the parameter weighting scheme]


Practical considerations


The MATLAB script for calculating each parameter’s probability density function is available from our GitHub page. The above flowchart shows the questions we asked ourselves about each parameter source. This was implemented in practice using a spreadsheet in which the responses are recorded (the user simply selects the options from a drop-down menu). The weights are calculated in the spreadsheet. MATLAB then reads in the weights and the associated parameter values, and a dataset is made containing n copies of each data point (where n = weight). From this dataset, a probability density function can be created for later sampling.
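The weight-to-dataset step described above can be sketched in Python. This is a minimal illustration, not the team's MATLAB script: it assumes the spreadsheet has been exported as CSV with one row per parameter source and two hypothetical columns, `value` and `weight`, and that the weights are non-negative integers.

```python
import csv
import io


def read_sources(csv_text):
    """Parse (value, weight) pairs from CSV text.

    The column names 'value' and 'weight' are hypothetical; the real
    spreadsheet layout may differ.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(float(row["value"]), int(row["weight"])) for row in reader]


def expand_by_weight(pairs):
    """Build a dataset containing n copies of each value, where n is its weight."""
    dataset = []
    for value, weight in pairs:
        if weight < 0:
            raise ValueError("weights must be non-negative integers")
        dataset.extend([value] * weight)
    return dataset


# Example with made-up numbers: three literature values for one parameter.
sources = read_sources("value,weight\n0.5,3\n1.2,1\n3.0,2\n")
print(expand_by_weight(sources))  # [0.5, 0.5, 0.5, 1.2, 3.0, 3.0]
```

Since every copy counts equally, any estimator applied to the expanded list (a histogram, a kernel density estimate or a parametric fit) treats a source with weight 3 as three observations.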

  • Dealing with parameter values with quoted errors

    If we know the experimental uncertainty (standard deviation) associated with a parameter value reported in the literature, this data point is split into 5 data points, from -2 standard deviations through to +2 standard deviations; the weighting of the original data point is then redistributed to each new data point relative to the probability density of a normal distribution.
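The error-splitting rule above can be sketched as follows. This is a minimal Python illustration, not the team's MATLAB code: the five points sit at the mean plus -2, -1, 0, +1 and +2 standard deviations, and the original weight is shared among them in proportion to the standard normal density at each offset (the 1/sqrt(2π) constant cancels when normalising). The weights are left as real numbers; turning them back into whole numbers of copies would need an extra scaling and rounding step. The optional removal of non-positive values mirrors the earlier treatment of unphysical zeros (removed without renormalising); reading "unphysical" as "less than or equal to zero" is an assumption, and `split_with_error` is a hypothetical name.

```python
import math

# Offsets, in units of the quoted standard deviation, for the five new points.
OFFSETS = (-2, -1, 0, 1, 2)


def split_with_error(mean, sd, weight, drop_nonpositive=True):
    """Replace one reported value (mean +/- sd) by five weighted points.

    Returns a list of (value, weight) pairs whose weights sum to `weight`
    (before any non-positive points are dropped).
    """
    # Unnormalised standard normal density at each offset.
    densities = [math.exp(-0.5 * k * k) for k in OFFSETS]
    total = sum(densities)
    points = [(mean + k * sd, weight * d / total) for k, d in zip(OFFSETS, densities)]
    if drop_nonpositive:
        # Assumed reading of "unphysical": values <= 0 are removed and the
        # remaining weights are not renormalised.
        points = [(v, w) for v, w in points if v > 0]
    return points


for value, w in split_with_error(10.0, 1.0, 10):
    print(f"{value:5.1f}  {w:.3f}")
```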


Continue to Generating Probability Density Functions
Return to overview