Difference between revisions of "Team:Manchester/Model/ParameterSelection"

Revision as of 23:10, 16 October 2016

Manchester iGEM 2016

Parameter selection

THEORY

Ensemble benefits

A model is only as good as the data provided to it, a principle often summarised as GIGO (garbage in, garbage out). This needs to be taken into account when selecting the parameters used as the starting point for ensemble modelling. Ensemble modelling handles this better than most approaches because it can account for the uncertainty in these parameters; even so, care must be taken to improve the quality of the initial dataset.

There are 3 main steps to bear in mind when selecting parameters, each with its own key consideration. Together, these steps help keep out the "garbage" that could ruin the model.

Collecting all relevant parameters

The first step is collecting the parameters: the more, the better. One hundred experimentally sourced values from 100 different papers will be far more representative than just a handful. Ideally, all of these would have been measured under exactly the same conditions as your system, but this is highly unlikely. The second step accounts for this.

Weighting

Weighting allows you to give more significance to parameters that more closely fit your system. This means parameters measured under different conditions can still be included in the initial dataset, as they are simply given a lower weighting score. The lower the weighting score, the less confident you are in that value.

Truncating tails

Finally, the dataset should be filtered by removing points from its top and bottom. The amount removed depends on the chosen confidence interval. Typically, removing the top and bottom 2.5% of points is sufficient, as this keeps the central 95% of the data, giving a 95% interval.
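The tail-truncation rule can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the team's MATLAB code: the parameter values are assumed to be held in a plain list, the number of points removed from each end is rounded down to a whole number, and the function name `truncate_tails` and its `tail_fraction` argument are hypothetical.

```python
import math


def truncate_tails(values, tail_fraction=0.025):
    """Sort the values and drop floor(tail_fraction * N) points from each end.

    Hypothetical helper: the 2.5% default mirrors the 95% interval described above.
    """
    ordered = sorted(values)
    k = math.floor(tail_fraction * len(ordered))
    # Slicing up to len - k (rather than -k) also behaves correctly when k == 0
    return ordered[k:len(ordered) - k]


values = list(range(1, 201))         # 200 hypothetical parameter values
kept = truncate_tails(values)        # drops 5 points from each end
print(len(kept), kept[0], kept[-1])  # 190 6 195
```

For small datasets (fewer than 40 points at 2.5%), rounding down means nothing is removed, which avoids discarding a large share of a sparse dataset.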

After following these steps, the probability density functions (PDFs) can be calculated; see the PDF section.


Code from flow diagram

Relevant GitHub link: all files discussed here are available for reference.



This flowchart shows the questions we asked ourselves about each parameter source. In practice, this was implemented as a spreadsheet in which the responses are filled in (the user simply selects the options from drop-down menus), and the weights are calculated within the spreadsheet. MATLAB then reads in the weights and their associated values, and a dataset is built containing n copies of each data point, where n is its weighting. From this dataset, a PDF can be created for later sampling.
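A minimal Python sketch of the replication step described above, assuming each spreadsheet row has already been read in as a `(value, weight)` pair with a non-negative integer weight. The function names are hypothetical, and rather than building a PDF (covered in the PDF section), `sample_parameter` simply draws uniformly from the replicated values, which gives each original value a probability proportional to its weighting.

```python
import random


def build_weighted_dataset(entries):
    """Return a list containing n copies of each value, where n is its weighting."""
    dataset = []
    for value, weight in entries:
        # Assumption: weights are non-negative integers, as calculated in the spreadsheet
        if weight < 0 or int(weight) != weight:
            raise ValueError(f"weight must be a non-negative integer, got {weight!r}")
        dataset.extend([value] * int(weight))
    return dataset


def sample_parameter(dataset, k, seed=None):
    """Draw k values uniformly from the replicated dataset (a stand-in for PDF sampling)."""
    rng = random.Random(seed)
    return rng.choices(dataset, k=k)


entries = [(1.2, 3), (1.5, 1), (0.9, 2)]   # hypothetical (value, weight) pairs
data = build_weighted_dataset(entries)      # [1.2, 1.2, 1.2, 1.5, 0.9, 0.9]
draws = sample_parameter(data, k=5, seed=0)
```

Replicating points keeps the later steps (sorting, truncating tails, building a PDF) unaware of weights, at the cost of a larger array when weightings are high.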

The extra complications are handled as simply as possible:



  • Data point error

    Any documented error associated with a data point is represented by 5 data points, at -2, -1, 0, +1 and +2 standard deviations from the reported value. The weighting of the original data point is then redistributed among the new points in proportion to their probability under the normal distribution described by the reported value and its error.


  • Any unphysical zeros resulting from this are removed. This work could be extended in two ways: any 5-point set that produces a zero could have its weighting renormalised, and using 2n + 1 data points, with n large, would be more representative. The first was not done because, if zeros appear, the quoted errors clearly do not follow a normal distribution and should not be quoted as such. The second was not done because it runs into rounding issues that can only be removed by increasing the sample size used for PDF creation; this was judged not worth the computing time for such a small increase in accuracy.


  • Removing tails

    The array of parameter values used to make the PDF is sorted, and the top and bottom 2.5% of data points (rounded down to a whole number of points) are removed. This removes extreme outliers and so helps the PDF-creation step.
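The data-point-error expansion can be sketched in Python as below. Several details are assumptions rather than taken from the source: the five points sit at -2, -1, 0, +1 and +2 standard deviations; each point's share of the original weight is its normal probability density, normalised over the five points; and "unphysical" is taken to mean a value at or below zero. As in the approach described above, dropped points are not renormalised, so the surviving weights can sum to less than the original. The names `normal_density` and `expand_with_error` are hypothetical.

```python
import math


def normal_density(z):
    """Standard normal probability density at z."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def expand_with_error(value, sd, weight, offsets=(-2, -1, 0, 1, 2)):
    """Split one (value, weight) point into points at value + k*sd.

    The original weight is shared in proportion to the normal density at
    each offset k (assumption). Non-positive values are treated as
    unphysical and dropped without renormalising (assumption).
    """
    densities = [normal_density(k) for k in offsets]
    total = sum(densities)
    points = []
    for k, d in zip(offsets, densities):
        x = value + k * sd
        if x <= 0:
            continue  # unphysical: drop it, leaving the other weights unchanged
        points.append((x, weight * d / total))
    return points


# Hypothetical example: a value of 10 with a quoted error of 1 and weighting 6
for x, w in expand_with_error(10.0, 1.0, 6):
    print(f"{x:5.1f}  weight {w:.3f}")
```

The resulting weights are fractional, so they would need rounding to whole copies before the replication step; this is presumably the source of the rounding issues mentioned above.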