This thesis develops models and associated Bayesian inference methods for flexible univariate and multivariate conditional density estimation. The models are flexible in the sense that they can capture widely differing shapes of the data. The estimation methods are specifically designed to achieve flexibility while still avoiding overfitting. The models are flexible both for a given covariate value and across covariate space. A key contribution of this thesis is that it provides general approaches to density estimation with highly efficient Markov chain Monte Carlo methods. The methods are illustrated on several challenging non-linear and non-normal datasets.
In the first paper, a general model is proposed for flexibly estimating the density of a continuous response variable conditional on a possibly high-dimensional set of covariates. The model is a finite mixture of asymmetric Student-t densities with covariate-dependent mixture weights. The four parameters of each component (mean, degrees of freedom, scale and skewness) are all modeled as functions of the covariates. The second paper explores how well a smooth mixture of symmetric components can capture skewed data. Simulations and applications on real data show that including covariate-dependent skewness in the components can lead to substantially improved performance on skewed data, often using a much smaller number of components. We also introduce smooth mixtures of gamma and log-normal components to model positive-valued response variables. In the third paper, we propose a multivariate Gaussian surface regression model that combines additive splines and interactive splines, together with a highly efficient MCMC algorithm that updates all the multi-dimensional knot locations jointly. We use shrinkage priors to avoid overfitting, with different estimated shrinkage factors for the additive and surface parts of the model, and different shrinkage parameters for the different response variables. In the last paper, we present a general Bayesian approach for directly modeling dependencies between variables as a function of explanatory variables in a flexible copula context. In particular, the Joe-Clayton copula is extended to have covariate-dependent tail dependence and correlations. The posterior inference is carried out using a novel and efficient simulation method. The appendix of the thesis documents the computational implementation details.
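As a rough illustration of what "covariate-dependent mixture weights" and covariate-dependent component parameters mean, here is a minimal sketch of a smooth mixture density. It is not the model from the papers. It assumes Gaussian components in place of asymmetric Student-t components, a single scalar covariate, softmax (multinomial logit) weights, and linear link functions. All function names, dictionary keys and parameter values are hypothetical.

```python
import math


def softmax(scores):
    """Turn real-valued scores into weights that are positive and sum to one."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def normal_pdf(y, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_weights(x, components):
    """Mixture weights as a function of the covariate x (assumed softmax gate)."""
    scores = [c["gate"][0] + c["gate"][1] * x for c in components]
    return softmax(scores)


def smooth_mixture_density(y, x, components):
    """p(y | x) = sum_k w_k(x) * N(y | mean_k(x), sd_k(x)^2)."""
    weights = mixture_weights(x, components)
    density = 0.0
    for w, c in zip(weights, components):
        mean = c["mean"][0] + c["mean"][1] * x               # linear in x
        sd = math.exp(c["log_sd"][0] + c["log_sd"][1] * x)   # log link keeps sd > 0
        density += w * normal_pdf(y, mean, sd)
    return density


# Hypothetical (intercept, slope) pairs for a two-component example.
EXAMPLE_COMPONENTS = [
    {"gate": (0.0, 1.0), "mean": (-1.0, 0.5), "log_sd": (0.0, 0.1)},
    {"gate": (0.0, -1.0), "mean": (2.0, -0.3), "log_sd": (-0.5, 0.0)},
]

if __name__ == "__main__":
    for x in (-2.0, 0.0, 2.0):
        w = mixture_weights(x, EXAMPLE_COMPONENTS)
        p = smooth_mixture_density(0.0, x, EXAMPLE_COMPONENTS)
        print(f"x={x:+.1f}  weights={[round(v, 3) for v in w]}  p(y=0|x)={p:.4f}")
```

Because both the weights and the component parameters move with x, the shape of p(y | x) can change across covariate space, which is the kind of flexibility described above. In the thesis the parameters are estimated by MCMC with priors that guard against overfitting; here they are simply fixed by hand.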
List of papers
- Flexible Modeling of Conditional Distributions using Smooth Mixtures of Asymmetric Student T Densities. In: Journal of Statistical Planning and Inference, Vol. 140, no. 12, pp. 3638-3654.
- Modeling Conditional Densities using Finite Smooth Mixtures. In: Mixtures: Estimation and Applications / [ed] Kerrie L. Mengersen, Christian P. Robert, D. Michael Titterington, Chichester: John Wiley & Sons, 2011, pp. 123-144.
- Efficient Bayesian Multivariate Surface Regression. In: Scandinavian Journal of Statistics, Vol. 40, no. 4, pp. 706-723.
- Modeling Covariate-Contingent Correlation and Tail-Dependence with Copulas. arXiv:1401.0100v1
About a month ago, Professor Martin Sköld wrote to tell me that the Cramér Society had decided to award my thesis “Bayesian Modeling of Conditional Densities” the 2014 Cramér prize. The prize is awarded each year to an outstanding PhD thesis in Statistics/Mathematical Statistics. As the winner, I will receive a cash prize and am expected to present the work at the Cramér Society annual meeting on March 20 in Stockholm.
I am humbled to have been selected for this award. My thanks go to everyone at the Cramér Society for reviewing my thesis, and to all the people, including my former supervisor Professor Mattias Villani, who helped me so much during my PhD studies. Unfortunately, I was too busy to fly to Stockholm that week, so I recorded a video presentation for the Cramér Society annual meeting. The video is now available on YouTube, or you can download it from this link.
Read more from my home university.