Team:Manchester/Model/ParameterSelection

Manchester iGEM 2016

Parameter selection

Contents

Ensemble Benefits
Collecting All Relevant Parameters
Weighting
Truncating Tails
Practical Considerations
Code from Flow Diagram

Ensemble benefits

Ensemble modelling relies on priors that express our belief about plausible parameter values. Care therefore has to be taken to improve the quality of the initial data set.

There are three main steps to bear in mind when selecting parameters, each with its own key consideration. These steps help to reduce uncertainty and improve the accuracy of model predictions.


Return to top of page

Collecting all relevant parameters

The first step is actually collecting the parameter values - the more the better. 100 experimentally sourced parameters from 100 different papers will be much more representative than just a handful. In an ideal world these would all have been measured under exactly the same conditions as our system, but this is highly unlikely; the second step accounts for this.


Return to top of page

Weighting

Weighting allows you to express how much confidence you have in each value found in the previous step. Values observed in a system close to ours were given a higher weighting score, while a lower score reflects lower confidence in a value. This gives a systematic way to include parameter values measured under different conditions in the data set, since each is assigned an appropriate weighting score.
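The MATLAB sketch below illustrates how such a score might be built up from yes/no questions about a parameter source. The questions and score increments are hypothetical stand-ins; the actual rules were encoded in our spreadsheet (see Code from flow diagram below).

    % Hypothetical weighting sketch: each question about a parameter source
    % adjusts an integer score; sources closer to our system score higher.
    sameOrganism = true;   % measured in the same organism as our system?
    similarTemp  = false;  % measured at a comparable temperature?
    directAssay  = true;   % measured directly rather than inferred?

    weight = 1;                                % baseline confidence
    if sameOrganism, weight = weight + 2; end
    if similarTemp,  weight = weight + 1; end
    if directAssay,  weight = weight + 1; end
    % weight is now 4; a lower score means less confidence in the value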

[Flow chart: questions used to assign weighting scores]

Return to top of page

Truncating Tails

Finally, the data set should be filtered by removing points from the top and bottom. The amount removed depends on the confidence interval chosen; removing the top and bottom 2.5% of points is typically sufficient, as this gives a 95% confidence interval. This greatly helps parametric methods produce an accurate PDF, as in the sketch below. See the PDF section.
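A minimal MATLAB sketch of the truncation step (the variable names are illustrative, and the random data stands in for the weighted parameter set):

    % Sketch: drop the top and bottom 2.5% of points for a 95% interval.
    values = exp(randn(1, 1000));          % stand-in for the weighted data set
    values = sort(values);                 % order the data points
    nTrim  = floor(0.025 * numel(values)); % number to drop from each tail
    values = values(nTrim + 1 : end - nTrim);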

After following these steps the probability density function (PDF) can be calculated.
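For illustration, one way of fitting a parametric PDF to the truncated values in MATLAB is sketched below, continuing from the sketch above (fitdist and pdf are in the Statistics and Machine Learning Toolbox; the lognormal choice is an assumption made for this example, not necessarily the distribution we used):

    % Sketch: fit a parametric distribution to the truncated values and
    % evaluate its PDF over the data range.
    pd    = fitdist(values(:), 'Lognormal');
    xGrid = linspace(min(values), max(values), 200);
    plot(xGrid, pdf(pd, xGrid));
    xlabel('parameter value'); ylabel('probability density');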


Return to top of page

Practical considerations

Code from flow diagram

Relevant GitHub link. All files discussed here are available for reference.



The above flowchart shows the questions we asked ourselves about each parameter source. In practice this was implemented with a spreadsheet for recording the responses (the user simply selects options from drop-downs), and the weights are calculated in the spreadsheet. MATLAB then reads in the weights and their associated values, and a data set is made containing n copies of each data point (where n = weighting), as sketched below. From this a PDF can be created for later sampling.
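A minimal sketch of the replication step (vals and weights stand in for the columns read from the spreadsheet, e.g. with xlsread; the numbers are made up):

    % Sketch: build a data set with n copies of each point (n = weighting).
    vals    = [0.8, 1.2, 3.5];   % parameter values from the spreadsheet
    weights = [3, 1, 2];         % integer weighting scores

    dataSet = repelem(vals, weights);   % -> [0.8 0.8 0.8 1.2 3.5 3.5]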

The extra complications below are handled as simply as possible.



  • Data point error

    Any documented error associated with a point is expanded into five data points, one at -2 standard deviations through to one at +2 standard deviations. The weighting of the original data point is then redistributed across the new points in proportion to their probability under the normal distribution that describes them (see the sketch after this list).


  • Any unphysical zeros resulting from this are removed from the data set. To extend this work, any five-point set that produces a zero should have its weighting renormalised; extending to 2n + 1 data points, where n is large, would also be more representative. The first was not done because, if you are getting zeros, the quoted errors clearly do not follow a normal distribution and should not be quoted as such. The second was not done because you run into rounding issues that can only be removed by increasing the sample size for PDF creation, and it was decided that this was not worth the computer time for such a small increase in accuracy.


  • Removing tails.

    The array of parameter values used for making the PDF is sorted, and then the top and bottom 2.5% of data points (with the count rounded down) are removed. This removes extreme outliers and so helps the PDF creators.
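A minimal MATLAB sketch of the error-expansion step from the first bullet, showing both the rounding issue and the removal of unphysical values (we assume "unphysical zeros" means non-positive values produced at -2 standard deviations; the numbers are made up, and normpdf is in the Statistics and Machine Learning Toolbox):

    % Sketch: expand one point with a quoted standard deviation into five
    % points at -2, -1, 0, +1 and +2 standard deviations, splitting the
    % original weight in proportion to the normal density at each offset.
    mu     = 1.5;   % reported parameter value
    sigma  = 0.9;   % reported standard deviation
    weight = 4;     % original integer weighting of this point

    offsets = -2:2;
    newVals = mu + offsets * sigma;   % [-0.3 0.6 1.5 2.4 3.3]
    relProb = normpdf(offsets);       % standard normal densities
    newWts  = round(weight * relProb / sum(relProb));   % rounding can zero the tails

    keep    = newVals > 0;            % drop unphysical non-positive values
    newVals = newVals(keep);
    newWts  = newWts(keep);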


Return to top of page
Return to overview