Parameter selection
Ensemble modelling relies on priors (in the Bayesian sense) that express our beliefs about plausible parameter values. Care must therefore be taken to find all the relevant information that could influence those beliefs.
Collecting all relevant information on parameter values
The first step in ensemble modelling is collecting all experimental information about the parameter values. Ideally these measurements would all have been made under exactly the same conditions as our system, but this is highly unlikely; the second step accounts for this.
Weighting
Weighting allows us to place more reliance on the parameter values of the highest quality and relevance. Values observed in a system close to ours are given a higher weighting score, while data from very different systems (a different species, different experimental conditions, a different enzyme) receive lower weights. This gives a systematic way to include parameter values of very different reliability in the data set: each is given an appropriate weighting score rather than being arbitrarily discarded (a rule which, applied strictly, would often leave no information available at all). Our final weighting scheme is illustrated in the diagram below:
Practical considerations
The flowchart above shows the questions we asked ourselves about each parameter source. The MATLAB script for calculating each parameter's probability density function is available from our GitHub page.
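As a rough illustration of what such a scheme computes, the answers to the flowchart questions can be combined into a single numeric weight. The questions and scores below are hypothetical examples, not the exact values from our spreadsheet:

```matlab
% Illustrative sketch of a multiplicative weighting scheme.
% The criteria and scores here are hypothetical, chosen only to show
% the idea: a source matching our system on every criterion gets the
% maximum weight, and each mismatch reduces it.
function w = parameterWeight(sameSpecies, sameConditions, sameEnzyme)
    speciesScore    = ternary(sameSpecies,    3, 1);
    conditionsScore = ternary(sameConditions, 2, 1);
    enzymeScore     = ternary(sameEnzyme,     2, 1);
    w = speciesScore * conditionsScore * enzymeScore;
end

function out = ternary(cond, a, b)
    % Small helper: MATLAB has no built-in ternary operator.
    if cond
        out = a;
    else
        out = b;
    end
end
```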
In practice this was implemented with a spreadsheet in which the responses are filled out (the user simply selects the options from drop-down menus), and the weights are calculated within the spreadsheet. MATLAB then reads in the weights and the associated values and builds a data set containing n copies of each data point, where n is its weight. From this data set a probability density function can be created for later sampling.
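A minimal sketch of this pipeline, assuming a two-column export of values and weights (the file name and layout here are assumptions; the full script is on our GitHub page):

```matlab
% Sketch: build a probability density function from weighted values.
% Assumes a two-column CSV of [value, weight] rows exported from the
% spreadsheet; requires the Statistics and Machine Learning Toolbox.
data    = readmatrix('parameter_weights.csv');   % columns: value, weight
values  = data(:, 1);
weights = round(data(:, 2));

% Replicate each value n times, where n is its (integer) weight.
expanded = repelem(values, weights);

% Kernel density estimate over the expanded data set, ready for
% later sampling.
[pdfEstimate, grid] = ksdensity(expanded);
plot(grid, pdfEstimate);
```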
- Dealing with parameter values that have quoted errors
If we know the experimental uncertainty (standard deviation) associated with a parameter value reported in the literature, that data point is split into five data points, from -2 standard deviations through to +2 standard deviations; the weighting of the original data point is then redistributed among the new points in proportion to the probability density of a normal distribution. Any unphysical (zero or negative) values resulting from this are removed from the data set. Two extensions would improve this: any five-point set that loses a point in this way could have its weights renormalised, and extending the split to 2n + 1 data points for large n would be more representative. We did not do the first because, if the split produces unphysical values, the quoted errors clearly do not follow a normal distribution and should not be quoted as such. We did not do the second because it introduces rounding issues that can only be removed by increasing the sample size used for PDF creation, and we decided this was not worth the computer time for such a small increase in accuracy.
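A sketch of the splitting step as described above (the function name is ours, and treating "unphysical" as non-positive is our illustrative interpretation):

```matlab
% Sketch: split a reported value +/- standard deviation into five
% data points at -2, -1, 0, +1 and +2 standard deviations, sharing
% the original weight in proportion to the normal probability density.
function [points, weights] = splitQuotedError(mu, sigma, originalWeight)
    offsets = -2:2;
    points  = mu + offsets * sigma;

    % Redistribute the weight in proportion to the normal pdf at each
    % offset, rounding to integers for later replication.
    density = normpdf(points, mu, sigma);
    weights = round(originalWeight * density / sum(density));

    % Drop unphysical (non-positive) points; as noted above, the
    % remaining weights are deliberately not renormalised.
    keep    = points > 0;
    points  = points(keep);
    weights = weights(keep);
end
```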