The Annals of Statistics
Ann. Statist., Volume 38, Number 3 (2010), 1287–1319.
High-dimensional Ising model selection using ℓ1-regularized logistic regression
Pradeep Ravikumar, Martin J. Wainwright, and John D. Lafferty
Full-text: Open access
Abstract
We consider the problem of estimating the graph associated with a binary Ising Markov random field. We describe a method based on ℓ1-regularized logistic regression, in which the neighborhood of any given node is estimated by performing logistic regression subject to an ℓ1-constraint. The method is analyzed under high-dimensional scaling in which both the number of nodes p and the maximum neighborhood size d are allowed to grow as a function of the number of observations n. Our main results provide sufficient conditions on the triple (n, p, d) and the model parameters for the method to succeed in consistently estimating the neighborhood of every node in the graph simultaneously. With coherence conditions imposed on the population Fisher information matrix, we prove that consistent neighborhood selection can be obtained for sample sizes n = Ω(d³ log p), with exponentially decaying error probability. When these same conditions are imposed directly on the sample matrices, we show that a reduced sample size of n = Ω(d² log p) suffices for the method to estimate neighborhoods consistently. Although this paper focuses on binary graphical models, we indicate how a generalization of the method would apply to general discrete Markov random fields.
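The per-node estimator summarized in the abstract can be sketched in code: regress each node on all the others with ℓ1-penalized logistic regression, and read off the neighborhood from the nonzero coefficients. The sketch below is an illustrative reimplementation, not the authors' code; the proximal-gradient (ISTA) solver, the regularization level lam=0.05, the support threshold 0.1, and the 4-node toy chain are all assumptions for the demo (the paper's theory would instead choose the regularization as a function of n and p).

```python
import numpy as np

def l1_logreg(X, y, lam, lr=0.1, iters=500):
    """l1-regularized logistic regression fit by proximal gradient (ISTA).

    X: (n, p) design with +/-1 entries; y: (n,) labels in {-1, +1}.
    Minimizes (1/n) * sum_i log(1 + exp(-y_i <x_i, w>)) + lam * ||w||_1.
    """
    n, p = X.shape
    w = np.zeros(p)
    for _ in range(iters):
        margins = y * (X @ w)
        # Gradient of the average logistic loss.
        grad = -(X.T @ (y / (1.0 + np.exp(margins)))) / n
        w = w - lr * grad
        # Soft-thresholding: the proximal step for the l1 penalty.
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)
    return w

def neighborhood_select(samples, lam, tol=0.1):
    """Estimate each node's neighborhood by l1-logistic regression on the rest."""
    n, p = samples.shape
    neighbors = {}
    for s in range(p):
        y = samples[:, s].astype(float)
        X = np.delete(samples, s, axis=1).astype(float)
        w = l1_logreg(X, y, lam)
        others = np.delete(np.arange(p), s)
        neighbors[s] = set(others[np.abs(w) > tol])
    return neighbors

# Toy example: a 4-node Ising chain 0 - 1 - 2 - 3 with coupling 0.8,
# sampled with a Gibbs sampler (burn-in discarded).
rng = np.random.default_rng(0)
p, coupling = 4, 0.8
theta = np.zeros((p, p))
for i in range(p - 1):
    theta[i, i + 1] = theta[i + 1, i] = coupling
x = rng.choice([-1, 1], size=p)
draws = []
for t in range(6000):
    for i in range(p):
        # Ising conditional: P(x_i = +1 | rest) = sigmoid(2 * theta_i . x).
        prob = 1.0 / (1.0 + np.exp(-2.0 * theta[i] @ x))
        x[i] = 1 if rng.random() < prob else -1
    if t >= 1000:
        draws.append(x.copy())
samples = np.asarray(draws)

nbrs = neighborhood_select(samples, lam=0.05)
print(nbrs)  # expect the chain edges 0-1, 1-2, 2-3 to be recovered
```

Note that for an Ising chain, node s is conditionally independent of non-neighbors given its neighbors, so the population logistic-regression coefficients of non-neighbors are exactly zero; with enough samples the ℓ1 penalty drives the corresponding estimates to zero as well.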
Article information
Source
Ann. Statist. Volume 38, Number 3 (2010), 1287-1319.
Dates
First available in Project Euclid: 8 March 2010
Permanent link to this document
http://projecteuclid.org/euclid.aos/1268056617
Digital Object Identifier
doi:10.1214/09-AOS691
Zentralblatt MATH identifier
05712422
Mathematical Reviews number (MathSciNet)
MR2662343
Subjects
Primary: 62F12: Asymptotic properties of estimators
Secondary: 68T99: None of the above, but in this section
Keywords
Graphical models; Markov random fields; structure learning; ℓ1-regularization; model selection; convex risk minimization; high-dimensional asymptotics
Citation
Ravikumar, Pradeep; Wainwright, Martin J.; Lafferty, John D. High-dimensional Ising model selection using ℓ1-regularized logistic regression. Ann. Statist. 38 (2010), no. 3, 1287–1319. doi:10.1214/09-AOS691. http://projecteuclid.org/euclid.aos/1268056617.
References
- [1] Abbeel, P., Koller, D. and Ng, A. Y. (2006). Learning factor graphs in polynomial time and sample complexity. J. Mach. Learn. Res. 7 1743–1788.
- [2] Banerjee, O., Ghaoui, L. E. and d'Aspremont, A. (2008). Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. J. Mach. Learn. Res. 9 485–516.
- [3] Bertsekas, D. (1995). Nonlinear Programming. Athena Scientific, Belmont, MA.
- [4] Bresler, G., Mossel, E. and Sly, A. (2009). Reconstruction of Markov random fields from samples: Some easy observations and algorithms. Available at http://front.math.ucdavis.edu/0712.1402.
- [5] Candes, E. and Tao, T. (2007). The Dantzig selector: Statistical estimation when p is much larger than n (with discussion). Ann. Statist. 35 2313–2351.
- [6] Chickering, D. (1995). Learning Bayesian networks is NP-complete. In Learning from Data: Artificial Intelligence and Statistics V (D. Fisher and H. Lenz, eds.). Lecture Notes in Statistics 112 121–130. Springer, New York.
- [7] Chow, C. and Liu, C. (1968). Approximating discrete probability distributions with dependence trees. IEEE Trans. Inform. Theory 14 462–467.
- [8] Cross, G. and Jain, A. (1983). Markov random field texture models. IEEE Trans. PAMI 5 25–39.
- [9] Csiszár, I. and Talata, Z. (2006). Consistent estimation of the basic neighborhood structure of Markov random fields. Ann. Statist. 34 123–145.
- [10] Dasgupta, S. (1999). Learning polytrees. In Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence (UAI-99). Morgan Kaufmann, San Francisco, CA.
- [11] Davidson, K. R. and Szarek, S. J. (2001). Local operator theory, random matrices, and Banach spaces. In Handbook of the Geometry of Banach Spaces 1 317–336. Elsevier, Amsterdam.
- [12] Donoho, D. and Elad, M. (2003). Maximal sparsity representation via ℓ1 minimization. Proc. Natl. Acad. Sci. USA 100 2197–2202.
- [13] Geman, S. and Geman, D. (1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. PAMI 6 721–741.
- [14] Hassner, M. and Sklansky, J. (1980). The use of Markov random fields as models of texture. Comp. Graphics Image Proc. 12 357–370.
- [15] Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc. 58 13–30.
- [16] Horn, R. A. and Johnson, C. R. (1985). Matrix Analysis. Cambridge Univ. Press, Cambridge.
- [17] Ising, E. (1925). Beitrag zur Theorie des Ferromagnetismus. Zeitschrift für Physik 31 253–258.
- [18] Kalisch, M. and Bühlmann, P. (2007). Estimating high-dimensional directed acyclic graphs with the PC-algorithm. J. Mach. Learn. Res. 8 613–636.
- [19] Kim, Y., Kim, J. and Kim, Y. (2005). Blockwise sparse regression. Statist. Sinica 16 375–390.
- [20] Koh, K., Kim, S. J. and Boyd, S. (2007). An interior-point method for large-scale ℓ1-regularized logistic regression. J. Mach. Learn. Res. 8 1519–1555.
- [21] Manning, C. D. and Schütze, H. (1999). Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, MA.
- [22] Meier, L., van de Geer, S. and Bühlmann, P. (2007). The group lasso for logistic regression. Technical report, Mathematics Dept., Swiss Federal Institute of Technology Zürich.
- [23] Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. Ann. Statist. 34 1436–1462.
- [24] Ng, A. Y. (2004). Feature selection, ℓ1 vs. ℓ2 regularization, and rotational invariance. In Proceedings of the Twenty-First International Conference on Machine Learning (ICML-04). Morgan Kaufmann, San Francisco, CA.
- [25] Obozinski, G., Wainwright, M. J. and Jordan, M. I. (2008). Union support recovery in high-dimensional multivariate regression. Technical report, Dept. Statistics, Univ. California, Berkeley.
- [26] Ripley, B. D. (1981). Spatial Statistics. Wiley, New York.
- [27] Rockafellar, R. T. (1970). Convex Analysis. Princeton Univ. Press, Princeton.
- [28] Rothman, A., Bickel, P., Levina, E. and Zhu, J. (2008). Sparse permutation invariant covariance estimation. Electron. J. Stat. 2 494–515.
- [29] Santhanam, N. P. and Wainwright, M. J. (2008). Information-theoretic limits of high-dimensional graphical model selection. In International Symposium on Information Theory, Toronto, Canada.
- [30] Spirtes, P., Glymour, C. and Scheines, R. (2000). Causation, Prediction and Search. MIT Press, Cambridge, MA.
- [31] Srebro, N. (2003). Maximum likelihood bounded tree-width Markov networks. Artificial Intelligence 143 123–138.
- [32] Tropp, J. A. (2006). Just relax: Convex programming methods for identifying sparse signals. IEEE Trans. Inform. Theory 52 1030–1051.
- [33] Wainwright, M. J. (2009). Sharp thresholds for high-dimensional and noisy sparsity recovery using ℓ1-constrained quadratic programming (Lasso). IEEE Trans. Inform. Theory 55 2183–2202.
- [34] Wainwright, M. J. and Jordan, M. I. (2003). Graphical models, exponential families, and variational inference. Technical Report 649, Dept. Statistics, Univ. California, Berkeley.
- [35] Wainwright, M. J., Ravikumar, P. and Lafferty, J. D. (2007). High-dimensional graphical model selection using ℓ1-regularized logistic regression. In Advances in Neural Information Processing Systems (B. Schölkopf, J. Platt and T. Hoffman, eds.) 19 1465–1472. MIT Press, Cambridge, MA.
- [36] Welsh, D. J. A. (1993). Complexity: Knots, Colourings, and Counting. Cambridge Univ. Press, Cambridge.
- [37] Woods, J. (1978). Markov image modeling. IEEE Trans. Automat. Control 23 846–850.
- [38] Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B Stat. Methodol. 68 49–67.
- [39] Zhao, P. and Yu, B. (2006). On model selection consistency of lasso. J. Mach. Learn. Res. 7 2541–2563.

More like this
- High-dimensional structure estimation in Ising models: Local separation criterion. Anandkumar, Animashree, Tan, Vincent Y. F., Huang, Furong, and Willsky, Alan S., The Annals of Statistics, 2012
- Honest variable selection in linear and logistic regression models via ℓ1 and ℓ1+ℓ2 penalization. Bunea, Florentina, Electronic Journal of Statistics, 2008
- ℓ1-penalized quantile regression in high-dimensional sparse models. Belloni, Alexandre and Chernozhukov, Victor, The Annals of Statistics, 2011
- High-dimensional graphs and variable selection with the Lasso. Meinshausen, Nicolai and Bühlmann, Peter, The Annals of Statistics, 2006
- Sign-constrained least squares estimation for high-dimensional regression. Meinshausen, Nicolai, Electronic Journal of Statistics, 2013
- Maximum likelihood estimation in logistic regression models with a diverging number of covariates. Liang, Hua and Du, Pang, Electronic Journal of Statistics, 2012
- High-dimensional covariance estimation by minimizing ℓ1-penalized log-determinant divergence. Ravikumar, Pradeep, Wainwright, Martin J., Raskutti, Garvesh, and Yu, Bin, Electronic Journal of Statistics, 2011
- Consistent group selection in high-dimensional linear regression. Wei, Fengrong and Huang, Jian, Bernoulli, 2010
- Learning loopy graphical models with latent variables: Efficient methods and guarantees. Anandkumar, Animashree and Valluvan, Ragupathyraj, The Annals of Statistics, 2013
- Multivariate Bernoulli distribution. Dai, Bin, Ding, Shilin, and Wahba, Grace, Bernoulli, 2013