This reading list provides an initial knowledge base for anyone wishing to work in the MUCM area. It is divided into six areas:

Fully Bayesian Methods
Bayes Linear Methods
Techniques from Other Fields
Experimental Design


The following references present some of the general background to the development of Bayesian methods that underlie the MUCM project.

Sacks, J., Welch, W.J., Mitchell, T.J. and Wynn, H.P. (1989) Design and analysis of computer experiments. Statistical Science, 4, 409–435. This is useful for historical understanding.  Research in DACE (design and analysis of computer experiments) was the forerunner of BACCO (Bayesian analysis of computer code outputs), and hence of MUCM.  The emphasis was on emulating the model (although the term ‘emulator’ was not then used) in order to predict and optimise the output, and there was much research into constructing suitable designs.  Although the ideas of uncertainty analysis and sensitivity analysis did attract some attention, they (together with calibration) are more associated with the later systematic Bayesian developments of BACCO.

Santner, T., Williams, B. and Notz, W. (2003). The Design and Analysis of Computer Experiments. Springer Verlag: New York. This book provides up-to-date coverage of DACE and related work on Gaussian process modelling of computer codes, primarily from a frequentist viewpoint and with an emphasis on designing the set of training runs in order to predict and optimise code output.

Saltelli, A., Chan, K. and Scott, E.M. (eds.) (2000). Sensitivity Analysis. Wiley: New York. The traditional methods for uncertainty and sensitivity analysis of model outputs are based on Monte Carlo sampling, and this book gives a thorough coverage of the various approaches. Techniques based on Gaussian process modelling are more efficient, allowing such analyses to be done with far fewer model runs and hence on much more complex models, but it is useful to know about the methods that MUCM seeks to improve upon.
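As a rough illustration of the Monte Carlo approach to uncertainty analysis, the sketch below propagates an assumed input distribution through a model by sampling and summarises the resulting output distribution. The simulator `toy_model`, the input distribution in `sample_inputs` and the helper `mc_uncertainty_analysis` are all hypothetical. Each draw costs one model run, which is why 100,000 draws are affordable here but not for a slow simulator.

```python
import numpy as np

def toy_model(x):
    # Hypothetical stand-in for an expensive simulator with two inputs.
    return np.sin(x[:, 0]) + 0.5 * x[:, 1] ** 2

def sample_inputs(rng, n):
    # Assumed input uncertainty: two independent standard normal inputs.
    return rng.standard_normal((n, 2))

def mc_uncertainty_analysis(model, sampler, n_runs, seed=0):
    """Plain Monte Carlo uncertainty analysis: one model run per input draw."""
    rng = np.random.default_rng(seed)
    x = sampler(rng, n_runs)
    y = model(x)
    return {
        "mean": y.mean(),
        "variance": y.var(ddof=1),
        "q05": np.quantile(y, 0.05),
        "q95": np.quantile(y, 0.95),
    }

summary = mc_uncertainty_analysis(toy_model, sample_inputs, n_runs=100_000)
print(summary)
```

For this toy model the exact output mean is 0.5 and the variance is about 0.93, so the sampled summaries can be checked directly.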

Koehler, J.R. and Owen, A.B. (1996). Computer experiments. In Handbook of Statistics, 13: Design and Analysis of Experiments, 261-308, eds Ghosh, S. and Rao, C.R. North-Holland: Amsterdam. A thoughtful review of design and analysis for computer experiments, including some technical material and some of Art Owen's design ideas.


The following references are basic reading for the fully Bayesian methodology for model uncertainty, sensitivity analysis and calibration.

O'Hagan, A. (2006). Bayesian analysis of computer code outputs: a tutorial. Reliability Engineering and System Safety 91, 1290–1300. This is a very simplified presentation of the basic ideas of MUCM from the full Bayesian perspective.  It is written for modellers and model users rather than statisticians, so aims for intuitive understanding of how Gaussian process emulators work, rather than technical depth.  It does give references to the more technical literature.  It assumes some understanding of uncertainty analysis and sensitivity analysis.
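To make the idea of an emulator concrete, here is a minimal sketch of Gaussian process prediction from a handful of training runs, assuming a zero prior mean and a squared-exponential covariance with fixed, hand-picked hyperparameters. The function names and the toy "code output" are hypothetical; a practical emulator would normally include a regression mean function and would estimate or integrate over the hyperparameters.

```python
import numpy as np

def sq_exp_cov(a, b, length_scale=0.2, variance=1.0):
    # Squared-exponential covariance between the rows of a (n x d) and b (m x d).
    d2 = ((a[:, None, :] - b[None, :, :]) / length_scale) ** 2
    return variance * np.exp(-0.5 * d2.sum(axis=-1))

def gp_emulator(x_train, y_train, x_new, length_scale=0.2, variance=1.0, nugget=1e-8):
    """Posterior mean and variance of a zero-mean GP emulator at x_new.

    Hyperparameters are fixed by assumption; the small nugget only
    stabilises the Cholesky factorisation.
    """
    k = sq_exp_cov(x_train, x_train, length_scale, variance)
    k += nugget * np.eye(len(x_train))
    k_star = sq_exp_cov(x_new, x_train, length_scale, variance)
    chol = np.linalg.cholesky(k)  # k = chol @ chol.T
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_star @ alpha
    v = np.linalg.solve(chol, k_star.T)
    var = variance - np.sum(v ** 2, axis=0)
    return mean, np.maximum(var, 0.0)

# Hypothetical "code": sin(2*pi*x), run at an 8-point design on [0, 1].
x_train = np.linspace(0.0, 1.0, 8)[:, None]
y_train = np.sin(2.0 * np.pi * x_train[:, 0])
mean, var = gp_emulator(x_train, y_train, np.array([[0.5], [3.0]]))
```

The key property is that the emulator reproduces the training runs almost exactly, with near-zero variance there, and reverts to the prior variance far from the design (as at x = 3.0).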

Kennedy, M.C. and O'Hagan, A. (2001). Bayesian calibration of computer models (with discussion). Journal of the Royal Statistical Society, B63, 425–464. This is the classic paper which explicitly introduced the idea of a model inadequacy term to represent the difference between a computer code’s output (using the ‘best’ input settings) and reality.  Although the technical details of Bayesian calibration are quite dense, it begins with a good presentation of the background to MUCM – what the problems are that concern modellers and model users, the sources of uncertainty that arise when tackling such problems, and many aspects of modelling with Gaussian processes.
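For orientation, the calibration framework can be summarised roughly as follows, where $\eta(x,\theta)$ is the simulator output at control inputs $x$ and the best calibration input values $\theta$, $\rho$ is a scaling parameter, $\delta(x)$ is the model inadequacy function (given a Gaussian process prior) and $e_i$ is the observation error on the $i$th field observation $z_i$. This is only a sketch of the structure; the paper gives the full specification of priors and the treatment of hyperparameters.

```latex
\begin{aligned}
z_i &= \zeta(x_i) + e_i, \qquad i = 1, \dots, n,\\
\zeta(x) &= \rho\,\eta(x, \theta) + \delta(x).
\end{aligned}
```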

Oakley, J.E. and O'Hagan, A. (2004). Probabilistic sensitivity analysis of complex models: a Bayesian approach. Journal of the Royal Statistical Society B 66, 751-769. This paper deals with variance-based sensitivity analysis, which is an important tool in understanding the sensitivity of model outputs to individual uncertain inputs.
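The main-effect (first-order) index for input $X_i$ is $S_i = \mathrm{Var}(E[Y \mid X_i]) / \mathrm{Var}(Y)$. The sketch below estimates it by brute-force Monte Carlo with a "pick-freeze" estimator, assuming independent Uniform(0, 1) inputs. This is the sampling-based approach that the Bayesian method is designed to make cheaper, shown here only to make the quantity concrete. The linear test function is hypothetical and chosen because its indices are known exactly: for $Y = X_1 + 2X_2$ they are 0.2 and 0.8.

```python
import numpy as np

def linear_model(x):
    # Hypothetical test function Y = X1 + 2*X2 with known indices 0.2 and 0.8.
    return x[:, 0] + 2.0 * x[:, 1]

def first_order_indices(model, n_inputs, n_samples=200_000, seed=1):
    """Monte Carlo (pick-freeze) estimates of the main-effect indices S_i.

    Assumes independent Uniform(0, 1) inputs.
    """
    rng = np.random.default_rng(seed)
    a = rng.uniform(size=(n_samples, n_inputs))
    b = rng.uniform(size=(n_samples, n_inputs))
    f_a, f_b = model(a), model(b)
    total_var = np.var(np.concatenate([f_a, f_b]))
    indices = np.empty(n_inputs)
    for i in range(n_inputs):
        a_b = a.copy()
        a_b[:, i] = b[:, i]  # matrix A with column i taken from B
        # f(B) and f(A_B^i) share only X_i, so this estimates Var(E[Y | X_i]).
        indices[i] = np.mean(f_b * (model(a_b) - f_a)) / total_var
    return indices

s = first_order_indices(linear_model, n_inputs=2)
print(s)  # should be close to [0.2, 0.8]
```

Note the cost: n_samples * (n_inputs + 2) model runs, which is what makes emulator-based estimates attractive for slow codes.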


Kennedy, M.C., Anderson, C.W., Conti, S. and O'Hagan, A. (2006). Case studies in Gaussian process modelling of computer codes. Reliability Engineering and System Safety 91, 1301–1309. Another paper aimed at modellers and model users.  It is useful for people interested in MUCM because it shows how the Gaussian process emulators are used in practice and how they can actually give insight into the way the model works.

Challenor, P.G., Hankin, R.K.S. and Marsh, R. (2006) Towards the probability of rapid climate change. In Avoiding Dangerous Climate Change 55-63. Eds Schellnhuber, H.J., Cramer, W., Nakicenovic, N., Wigley, T. and Yohe, G. Cambridge University Press. This paper shows how these methods can be applied in a real problem with substantial uncertainty and important policy implications.

Rougier, J.C. Probabilistic inference for future climate using an ensemble of climate model evaluations. Climatic Change, forthcoming. A fully Bayesian treatment of model-based inference for climate, using the 'best input' approach and exploring simple ways of structuring judgements about model imperfection. It uses the model evaluations in a simple Monte Carlo (MC) integration (not recommended for large problems in practice!).
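One simple way to carry out this kind of Monte Carlo integration is self-normalised importance weighting of an ensemble drawn from the prior on the inputs. The sketch assumes the 'best input' structure z = f(x*) + ε + e, with Gaussian, mutually independent discrepancy ε and measurement error e of known diagonal variances. The function names and toy ensemble are hypothetical, and the predictive variance shown ignores discrepancy on the predicted quantity. One reason the approach scales poorly is that, with many observations or inputs, almost all of the weight tends to fall on a few ensemble members.

```python
import numpy as np

def best_input_weights(sim_obs, z, discrepancy_var, obs_error_var):
    """Self-normalised importance weights for an ensemble drawn from the prior on x*.

    sim_obs: (n_members, n_obs) simulator outputs corresponding to observations z.
    Assumes z = f(x*) + eps + e with independent Gaussian eps (discrepancy)
    and e (measurement error), each with the given (diagonal) variances.
    """
    total_var = discrepancy_var + obs_error_var
    resid = sim_obs - z
    log_lik = -0.5 * np.sum(resid ** 2 / total_var, axis=1)
    w = np.exp(log_lik - log_lik.max())  # subtract max for numerical stability
    return w / w.sum()

def weighted_prediction(weights, sim_future):
    """Weighted mean and variance of a simulator prediction across the ensemble.

    Ignores the discrepancy on the predicted quantity itself.
    """
    mean = weights @ sim_future
    var = weights @ (sim_future - mean) ** 2
    return mean, var

# Hypothetical toy ensemble: five input values from the prior, one observation.
x = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
sim_obs = x[:, None]      # simulator output matched to the observation
sim_future = 3.0 * x      # simulator prediction of a future quantity
w = best_input_weights(sim_obs, np.array([1.0]), discrepancy_var=0.1, obs_error_var=0.05)
mean, var = weighted_prediction(w, sim_future)
```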


Bayes linear methods are an alternative to the fully Bayesian approach: rather than fully specifying probability distributions, they work with first- and second-order moments (expectations, variances and covariances).
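The core Bayes linear operation is the adjustment of beliefs about a vector X by data D, using only expectations, variances and covariances: E_D(X) = E(X) + Cov(X,D) Var(D)^{-1} (D - E(D)) and Var_D(X) = Var(X) - Cov(X,D) Var(D)^{-1} Cov(D,X). A minimal numpy sketch, assuming Var(D) is invertible (the function name is hypothetical):

```python
import numpy as np

def bayes_linear_adjust(e_x, e_d, var_x, var_d, cov_xd, d_obs):
    """Bayes linear adjusted expectation and variance of X given observed D.

    e_x: (p,), e_d: (q,), var_x: (p, p), var_d: (q, q), cov_xd: (p, q).
    """
    # Cov(X,D) Var(D)^{-1}, computed with a solve rather than an explicit inverse
    # (valid because Var(D) is symmetric).
    gain = np.linalg.solve(var_d, cov_xd.T).T
    adj_mean = e_x + gain @ (d_obs - e_d)
    adj_var = var_x - gain @ cov_xd.T
    return adj_mean, adj_var

# Scalar example: X has E = 0, Var = 1; D = X + noise with noise variance 1.
adj_mean, adj_var = bayes_linear_adjust(
    e_x=np.array([0.0]), e_d=np.array([0.0]),
    var_x=np.array([[1.0]]), var_d=np.array([[2.0]]),
    cov_xd=np.array([[1.0]]), d_obs=np.array([2.0]),
)
```

Here the adjusted expectation is 1.0 and the adjusted variance 0.5, the same numbers a fully Bayesian normal analysis would give, although no distributional assumptions were made.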

Craig, P.S., Goldstein, M., Rougier, J.C. and Seheult, A.H. (2001) Bayesian forecasting for complex systems using computer simulators.  Journal of the American Statistical Association, 96, 717-729.  Bayes linear prediction methodology, suitable for large problems, including diagnostics and sequential design, with a substantial example and a full statement of computations in the Appendix.

Goldstein, M. and Rougier, J.C. (2004) Probabilistic formulations for transferring inferences from mathematical models to physical systems. SIAM Journal on Scientific Computing, 26(2), 467-487. A first attempt at tackling the coherency issues in linking imperfect simulators to their underlying systems, taking account of the possibility of multiple simulators. Distinguishes between system values and simulator inputs, and clarifies the 'best input' approach and the notions of tuning inputs and direct simulators.

Goldstein, M. and Rougier, J.C. Bayes linear calibrated prediction for complex systems. Journal of the American Statistical Association, forthcoming. Restates and clarifies the difference between the fully Bayesian and the Bayes linear approaches to prediction. Introduces the 'hat run', a way to introduce calibration into the Bayes linear approach that should remain tractable in large problems. Considers diagnostics and sequential design.

Goldstein, M. and Rougier, J.C. Reified Bayesian modelling and inference for physical systems. Journal of Statistical Planning and Inference, forthcoming as a discussion paper. A very detailed statement about linking simulators and systems, accounting for model imperfection and the possibility of multiple simulators. Reified modelling provides a framework within which the system expert can structure his or her judgements about the simulator and the system in terms of natural generalisations.


There has been substantial relevant work done in other fields that it is important to know about.  The first four references below concern work on Gaussian processes from the machine learning literature.  The last reference in this section is on dimension reduction methods in environmental science.

Cornford, D., Csato, L., Evans, D.J. and Opper, M. (2004). Bayesian analysis of the scatterometer wind retrieval inverse problem: some new approaches. Journal of the Royal Statistical Society, B, 66, 609-626. This paper focusses on the application of the sparse sequential Gaussian process methods developed in the NCRG at Aston University to an inverse problem (or static time data assimilation problem). The sparse sequential method applies a variational Bayesian treatment to inference within Gaussian processes, allowing fast approximate inference of the posterior process for arbitrary likelihoods without requiring high-dimensional integrals to be evaluated. This framework seems very natural to apply to emulator settings.

Rasmussen, C.E. and Williams, C.K.I. (2006) Gaussian Processes for Machine Learning. The MIT Press, ISBN 0-262-18253-X. This book covers many of the recent developments in machine learning treatments of Gaussian processes. Of particular relevance is the chapter dealing with approximation methods for large data sets, which is conveniently available online.

Quiñonero-Candela, J. and Rasmussen, C.E. (2005) A unifying view of sparse approximate Gaussian process regression. Journal of Machine Learning Research, 6, 1935-1959. This gives an overview of the relationships between a variety of sparse Gaussian process methods, but is written for a machine learning audience.
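As one concrete instance of the methods this paper unifies, the sketch below implements the subset-of-regressors (SoR) predictive mean, which summarises n training points through m inducing inputs so that the main linear solve is m x m instead of n x n. Fixed kernel hyperparameters are assumed and all names are hypothetical. A useful sanity check is that when the inducing inputs coincide with the training inputs, SoR reproduces the exact GP mean.

```python
import numpy as np

def sq_exp(a, b, length_scale=0.5):
    # Squared-exponential covariance (unit variance) between rows of a and b.
    d2 = ((a[:, None, :] - b[None, :, :]) / length_scale) ** 2
    return np.exp(-0.5 * d2.sum(axis=-1))

def sor_mean(x_train, y_train, x_inducing, x_new, noise_var=0.01, length_scale=0.5):
    """Subset-of-regressors predictive mean using m inducing inputs.

    mean = K_*u (K_uf K_fu / s2 + K_uu)^{-1} K_uf y / s2, costing O(n m^2)
    rather than the O(n^3) of the full GP.
    """
    k_uf = sq_exp(x_inducing, x_train, length_scale)
    k_uu = sq_exp(x_inducing, x_inducing, length_scale)
    k_su = sq_exp(x_new, x_inducing, length_scale)
    a = k_uf @ k_uf.T / noise_var + k_uu
    return k_su @ np.linalg.solve(a, k_uf @ y_train) / noise_var

def full_gp_mean(x_train, y_train, x_new, noise_var=0.01, length_scale=0.5):
    # Exact GP predictive mean, for comparison.
    k = sq_exp(x_train, x_train, length_scale) + noise_var * np.eye(len(x_train))
    return sq_exp(x_new, x_train, length_scale) @ np.linalg.solve(k, y_train)

x_train = np.linspace(0.0, 3.0, 6)[:, None]
y_train = np.sin(x_train[:, 0])
x_new = np.array([[0.7], [2.2]])
sparse = sor_mean(x_train, y_train, x_train[::2], x_new)  # 3 inducing inputs
check = sor_mean(x_train, y_train, x_train, x_new)        # inducing = training
exact = full_gp_mean(x_train, y_train, x_new)
```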

Seeger, M. (2004) Gaussian processes for machine learning. International Journal of Neural Systems 14(2), 1-38. This paper is probably the most friendly introduction to machine learning approaches to Gaussian processes. Note that in almost all machine learning treatments of Gaussian processes the 'hyper-parameters' of the covariance function, that is, the length scales and the process and noise variances, are regarded as fixed (typically at point estimates) rather than being integrated over. Another rather useful review of the state of the art (as of 2004) in machine learning approaches to Gaussian processes can be found on Matthias Seeger's web page.

Von Storch, H. and Zwiers, F.W. (1999) Statistical Analysis in Climate Research. Cambridge University Press. This book is expensive, but gives a quite complete overview of dimension reduction methods in atmospheric science. Most of these methods are based on eigen-decomposition; that is, most are PCA based. In atmospheric science they are often called Empirical Orthogonal Function (EOF) analyses when the domain has a spatial extent. These methods are of limited value in general; however, where the system (or model) manifold (which will be a volume for stochastic systems or models) is well approximated by a hyperplane, they can be useful. From the dynamical systems perspective, a number of extensions that include the temporal dimension have been developed, under names such as embedding approaches, Principal Oscillation Patterns (POP), Principal Interaction Patterns (PIP) and delay vector methods.
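EOF analysis amounts to a principal component analysis of the space-time anomaly field, which can be computed with a singular value decomposition. A minimal sketch, assuming the data are arranged as a (time x location) array and ignoring the area weighting often applied to gridded fields; the function name and synthetic field are hypothetical.

```python
import numpy as np

def eof_analysis(field, n_modes):
    """EOF (principal component) analysis of a space-time field via the SVD.

    field: (n_times, n_locations) array. Returns the leading spatial patterns
    (EOFs, one per row), their time coefficients (principal components) and
    the fraction of total variance each mode explains.
    """
    anomalies = field - field.mean(axis=0)  # remove the time mean at each location
    u, s, vt = np.linalg.svd(anomalies, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    eofs = vt[:n_modes]                     # spatial patterns
    pcs = u[:, :n_modes] * s[:n_modes]      # time series of mode amplitudes
    return eofs, pcs, explained[:n_modes]

# Hypothetical field: one oscillating spatial pattern plus a constant offset.
t = np.linspace(0.0, 10.0, 200)
pattern = np.cos(np.linspace(0.0, np.pi, 30))
field = np.outer(np.sin(t), pattern) + 5.0
eofs, pcs, explained = eof_analysis(field, n_modes=2)
```

Because this synthetic field contains a single pattern, the first EOF recovers it (up to sign and scale) and explains essentially all of the variance; in the hyperplane language above, the data lie exactly on a one-dimensional subspace.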


Some of the earliest work using Gaussian processes to represent complex models was concerned with designing computer experiments.  Good designs for sets of training runs are very important for the efficient implementation of these methods.

Welch, W.J., Buck, R.J., Sacks, J., Wynn, H.P. and Mitchell, T.J. (1992) Screening, Predicting, and Computer Experiments. Technometrics, 34, 15-25. This is the follow-on paper to the original DACE paper (Sacks, Welch, Mitchell and Wynn, 1989, Statistical Science). It is interesting in that it concentrates on screening, that is, extracting the really "strong" factors. In applications it is very common to remodel using just those factors.

Sebastiani, P. and Wynn, H.P. (2000) Maximum entropy sampling and optimal Bayesian experimental design. Journal of the Royal Statistical Society, B, 62, 145-157. One of a number of papers on maximum entropy sampling (MES). MES is mentioned in the DACE paper and there is a small but growing literature in the area; see, for example, the work of Jon Lee (IBM Watson Research Centre). The method says that to minimize the expected posterior entropy (that is, to maximize information) you should choose a design that maximizes the entropy of the sample.
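The reasoning behind MES for a finite candidate set is that the entropy of all candidate outputs splits into the entropy of the sampled outputs plus the expected entropy of the unsampled outputs given the sample; since the total is fixed, maximising the first term minimises the second. For a Gaussian process the sample entropy is, up to a constant, $\tfrac12 \log\det K_{SS}$. Exact maximisation is a hard combinatorial problem, so the sketch below uses a simple greedy heuristic (not necessarily the method of the paper); the function name and candidate grid are hypothetical.

```python
import numpy as np

def greedy_mes(cov, n_points):
    """Greedy maximum entropy sampling from a finite candidate set.

    cov: prior covariance matrix of the output over the candidates.
    Each step adds the candidate with the largest conditional variance given
    those already chosen, which gives the largest increase in log det(cov[S, S]).
    """
    chosen = []
    for _ in range(n_points):
        best, best_var = None, -np.inf
        for j in range(cov.shape[0]):
            if j in chosen:
                continue
            if chosen:
                k_sj = cov[np.ix_(chosen, [j])]
                k_ss = cov[np.ix_(chosen, chosen)]
                cond_var = cov[j, j] - (k_sj.T @ np.linalg.solve(k_ss, k_sj)).item()
            else:
                cond_var = cov[j, j]
            if cond_var > best_var:
                best, best_var = j, cond_var
        chosen.append(best)
    return chosen

# 21 candidate inputs on [0, 1] with an assumed squared-exponential covariance.
x = np.linspace(0.0, 1.0, 21)
cov = np.exp(-0.5 * ((x[:, None] - x[None, :]) / 0.2) ** 2)
design = greedy_mes(cov, n_points=3)
```

With a stationary covariance the greedy rule spreads the points out: after the two end points it picks the centre of the interval.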

Bates, R.A., Buck, R.J., Riccomagno, E. and Wynn, H.P. (1996) Experimental Design and Observation for Large Systems. Journal of the Royal Statistical Society, 58, 77-98. Again, more research on DACE, with a nice application to circuit design: an attempt to match the covariance function to the topology of the circuit. Related to partitioning problems. Is there an algebra for building covariance functions using cross-products, sums, etc.?
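As a partial illustration of the question raised here, it is a standard result that sums and products of valid covariance functions are again valid covariance functions, including products of covariances defined on different inputs. The short numerical check below builds a sum and a product kernel on random two-dimensional points and confirms that the resulting matrices are positive semi-definite. The particular kernels and length scales are illustrative choices, not taken from the paper.

```python
import numpy as np

def sq_exp_1d(a, b, length_scale):
    # Squared-exponential covariance on a single input.
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / length_scale) ** 2)

def linear_1d(a, b):
    # Linear (dot-product) covariance on a single input.
    return np.outer(a, b)

rng = np.random.default_rng(2)
x = rng.uniform(size=(40, 2))  # 40 random points in two input dimensions
x1, x2 = x[:, 0], x[:, 1]

# Sum of covariances on different inputs (additive structure).
k_sum = sq_exp_1d(x1, x1, 0.3) + linear_1d(x2, x2)
# Product of covariances on different inputs (tensor-product structure).
k_prod = sq_exp_1d(x1, x1, 0.3) * sq_exp_1d(x2, x2, 0.5)

# A valid covariance matrix is positive semi-definite: no eigenvalue below
# zero beyond rounding error.
min_eigs = [np.linalg.eigvalsh(k).min() for k in (k_sum, k_prod)]
print(min_eigs)
```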

Bates, R.A., Kennett, R.S., Steinberg, D.M. and Wynn, H.P. (2006) Achieving robust design from computer simulations. Quality Technology & Quantitative Management, 3, 2, 161-177. A typical paper on the wholesale application of these methods to industrial problems. Of interest is the "double response surface" method, which uses one response surface for the "mean" and another to capture the sensitivity. It works well.

Chaloner, K. and Verdinelli, I. (1995) Bayesian Experimental Design: A Review. Statistical Science, 10, 3, 273-304. An exemplary Statistical Science article. One should be aware of the subculture of Bayesian experimental design: choose the design to minimize the (prior, preposterior) expectation of the posterior risk. This foundational theory is very important, with a literature going back to Savage, Lindley, Rényi and others. Note that the theory is especially straightforward in the Gaussian case, but hard in non-Gaussian cases.
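To illustrate why the Gaussian case is straightforward, consider the normal linear model with known error variance $\sigma^2$ and a conjugate normal prior with prior precision matrix $R$ (the notation here is an assumption of this sketch). The posterior covariance does not depend on the data $y$, so the expected gain in Shannon information $U(X)$ can be computed before any observations are made, and the Bayesian design problem reduces to choosing the design matrix $X$ to maximise $\det(X^\top X + R)$, a Bayesian analogue of D-optimality.

```latex
\begin{aligned}
y &= X\beta + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2 I), \qquad \beta \sim N(\beta_0, \sigma^2 R^{-1}),\\
\beta \mid y &\sim N\!\left((X^\top X + R)^{-1}(X^\top y + R\beta_0),\; \sigma^2 (X^\top X + R)^{-1}\right),\\
U(X) &= \tfrac{1}{2}\log\det(X^\top X + R) - \tfrac{1}{2}\log\det R .
\end{aligned}
```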