Variational learning and bits-back coding: an information-theoretic view to Bayesian learning

Publisher: IEEE

Abstract:
Bits-back coding, first introduced by Wallace in 1990 and later by Hinton and van Camp in 1993, provides an interesting link between Bayesian learning and information-theoretic minimum-description-length (MDL) learning approaches. Bits-back coding allows the cost function used in the variational Bayesian method called ensemble learning to be interpreted as a code length, in addition to the Bayesian view of it as the misfit of the posterior approximation and a lower bound on the model evidence. Combining these two viewpoints provides interesting insights into the learning process and the functions of different parts of the model. In this paper, the problem of variational Bayesian learning of hierarchical latent variable models is used to demonstrate the benefits of the two views. The code-length interpretation provides new perspectives on many parts of the problem, such as model comparison and pruning, and helps explain many phenomena occurring during learning.
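As a point of reference, a minimal sketch of this dual interpretation in generic notation (data X, parameters \theta, posterior approximation q; the symbols are illustrative and not necessarily those used in the paper):

    C(q) = \int q(\theta)\,\ln\frac{q(\theta)}{p(X\mid\theta)\,p(\theta)}\,d\theta
         = D_{\mathrm{KL}}\bigl(q(\theta)\,\|\,p(\theta\mid X)\bigr) - \ln p(X)
         \;\ge\; -\ln p(X)

    L_{\text{bits-back}} = \mathrm{E}_{q}\bigl[-\ln p(\theta) - \ln p(X\mid\theta)\bigr] - H(q) = C(q)

The same quantity C(q) thus measures both the misfit of q with respect to the true posterior (up to the constant -\ln p(X)) and the expected length of a bits-back code that transmits the parameters and the data, with the entropy H(q) refunded.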
Published in: IEEE Transactions on Neural Networks ( Volume: 15, Issue: 4, July 2004)
Page(s): 800 - 810
Date of Publication: 12 July 2004
PubMed ID: 15461074

I. Introduction

The problem of learning an optimal model for a given data set is usually divided into two subtasks: finding optimal values for the parameters of a single model, and finding the best model among a collection of different models. There are many methods for solving the former problem, ranging from ad hoc algorithms designed to mimic the assumed behavior of the human brain to methods that minimize various cost functions. The second problem, model selection, is in general more difficult, and it is consequently also harder to design good heuristics for it.
