Ensemble modelling relies on priors (in the Bayesian sense) that express our belief about the plausible parameter values. Therefore, care has to be taken to find all the relevant information that could influence our beliefs.
Collecting all relevant information on parameter values
The first step in ensemble modelling is to collect all available experimental information about the parameter values. Ideally, every value would have been measured under exactly the same conditions as our system, but this is highly unlikely in practice. The second step, weighting, accounts for this.
Weighting allows us to rely more heavily on the parameter values of the highest quality and relevance. Values observed in a system close to ours were given a higher weighting score, while data from very different systems (different species, different experimental conditions, different enzymes) received lower weights. This gives a systematic way to include parameter values of very different reliability in the data set: each value is given an appropriate weighting score rather than being arbitrarily discarded (strict discarding would often leave no information at all). Our final weighting scheme is illustrated in this diagram:
The above flowchart shows the questions we asked ourselves about each parameter source. The MATLAB script for calculating each parameter's probability density function is available from our GitHub page.
This was implemented in practice with a spreadsheet in which the user records the answers by selecting options from drop-down menus. The spreadsheet calculates the weights. MATLAB then reads in the weights and their associated parameter values and builds a data set containing n copies of each data point, where n is that point's weight. A probability density function is then created from this data set for later sampling.
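The replication and density steps described above can be sketched as follows. This is a minimal Python illustration, not our MATLAB script: the function names are hypothetical, the weights are assumed to be non-negative integers, and the density is built with a Gaussian kernel density estimate using a normal-reference rule-of-thumb bandwidth, which is an assumption made here for the example.

```python
import math
import random
import statistics


def replicate_by_weight(values, weights):
    """Return a data set holding `weight` copies of each value.

    Weights are assumed to be non-negative integers (n copies, n = weight).
    """
    dataset = []
    for value, weight in zip(values, weights):
        dataset.extend([value] * weight)
    return dataset


def kde_bandwidth(dataset):
    """Rule-of-thumb bandwidth (assumption): 1.06 * sd * n^(-1/5).

    Requires at least two distinct values so that the sd is non-zero.
    """
    n = len(dataset)
    return 1.06 * statistics.stdev(dataset) * n ** (-1 / 5)


def make_kde_pdf(dataset):
    """Return pdf(x), a Gaussian kernel density estimate of the data set."""
    n = len(dataset)
    h = kde_bandwidth(dataset)
    norm = 1.0 / (n * h * math.sqrt(2 * math.pi))

    def pdf(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in dataset)

    return pdf


def sample_from_kde(dataset, size, rng):
    """Draw samples: pick a random data point, then add Gaussian kernel noise."""
    h = kde_bandwidth(dataset)
    return [rng.choice(dataset) + rng.gauss(0.0, h) for _ in range(size)]


# Hypothetical parameter values and weights, for illustration only.
values = [0.5, 1.2, 3.0]
weights = [4, 2, 1]
dataset = replicate_by_weight(values, weights)
pdf = make_kde_pdf(dataset)
draws = sample_from_kde(dataset, 5, random.Random(0))
```

Because each value appears as many times as its weight, highly weighted sources dominate the estimated density. Note that a Gaussian kernel can return negative draws; for strictly positive parameters, one option is to apply the same steps to log-transformed values.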
- Dealing with parameter values with quoted errors
If we know the experimental uncertainty (standard deviation) associated with a parameter value reported in the literature, the data point is split into five data points spanning -2 to +2 standard deviations around the reported value. The weight of the original data point is then redistributed among the new data points in proportion to the probability density of a normal distribution at each point.
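The splitting rule can be sketched in Python as follows. This is an illustration under stated assumptions rather than our MATLAB implementation: the function name is hypothetical, and the five points are assumed to sit at whole-number offsets of -2, -1, 0, +1 and +2 standard deviations.

```python
import math


def split_by_uncertainty(value, sd, weight, offsets=(-2, -1, 0, 1, 2)):
    """Split one (value, weight) pair into several weighted points.

    Each new point sits at value + k * sd, and receives a share of the
    original weight proportional to the standard normal density at k.
    The even spacing of the offsets is an assumption of this sketch.
    """
    # The normalising constant of the normal density cancels in the ratio,
    # so exp(-k^2 / 2) is sufficient.
    densities = [math.exp(-0.5 * k * k) for k in offsets]
    total = sum(densities)
    return [(value + k * sd, weight * d / total) for k, d in zip(offsets, densities)]


# Hypothetical data point: value 10.0, standard deviation 1.0, weight 5.
points = split_by_uncertainty(10.0, 1.0, 5)
```

With these offsets, the central point keeps roughly 40.3% of the original weight, each point at one standard deviation gets about 24.4%, and each point at two standard deviations gets about 5.4%. The resulting weights are fractional, so they would need rounding or rescaling before being used as integer copy counts; how that is done is not specified here.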