The Annals of Applied Statistics
- Ann. Appl. Stat.
- Volume 4, Number 2 (2010), 715-742.
Uncovering latent structure in valued graphs: A variational approach
Mahendra Mariadassou, Stéphane Robin, and Corinne Vacher
Full-text: Open access
Abstract
As more and more network-structured data sets are available, the statistical analysis of valued graphs has become common place. Looking for a latent structure is one of the many strategies used to better understand the behavior of a network. Several methods already exist for the binary case.
We present a model-based strategy to uncover groups of nodes in valued graphs. This framework can be used for a wide span of parametric random graphs models and allows to include covariates. Variational tools allow us to achieve approximate maximum likelihood estimation of the parameters of these models. We provide a simulation study showing that our estimation method performs well over a broad range of situations. We apply this method to analyze host–parasite interaction networks in forest ecosystems.
Article information
Source
Ann. Appl. Stat. Volume 4, Number 2 (2010), 715-742.
Dates
First available in Project Euclid: 3 August 2010
Permanent link to this document
http://projecteuclid.org/euclid.aoas/1280842137
Digital Object Identifier
doi:10.1214/10-AOAS361
Mathematical Reviews number (MathSciNet)
MR2758646
Citation
Mariadassou, Mahendra; Robin, Stéphane; Vacher, Corinne. Uncovering latent structure in valued graphs: A variational approach. Ann. Appl. Stat. 4 (2010), no. 2, 715--742. doi:10.1214/10-AOAS361. http://projecteuclid.org/euclid.aoas/1280842137.
References
- Airoldi, E. M. and Carley, K. M. (2005). Sampling algorithms for pure network topologies. ACM KDD Explorations 7 13–22.
- Airoldi, E. M., Blei, D. M., Fienberg, S. E. and Xing, E. P. (2008). Mixed membership stochastic blockmodels. J. Mach. Learn. Res. 9 1981–2014.
- Albert, R. and Barabási, A. L. (2002). Statistical mechanics of complex networks. Rev. Modern Phys. 74 47–97.Mathematical Reviews (MathSciNet): MR1895096
Zentralblatt MATH: 1205.82086
Digital Object Identifier: doi: 10.1103/RevModPhys.74.47 - Attias, H. (2000). A variational Bayesian framework for graphical models. In Advances in Neural Information Processing Systems 12 209–215. MIT Press, Cambridge.
- Barabási, A. L. and Albert, R. (1999). Emergence of scaling in random networks. Science 286 509–512.Mathematical Reviews (MathSciNet): MR2091634
Digital Object Identifier: doi: 10.1126/science.286.5439.509 - Beal, M. J. and Ghahramani, Z. (2003). The variational Bayesian EM algorithm for incomplete data: With application to scoring graphical model structures. In Bayesian Statistics 7 (J. M. Bernardo et al., eds.) 543–552. Oxford Univ. Press, Oxford.Mathematical Reviews (MathSciNet): MR2003189
- Bersier, L. F. and Kehrli, P. (2008). The signature of phylogenetic constraints on food-web structure. Ecol. Complex. 5 132–139.
- Biernacki, C., Celeux, G. and Govaert, G. (2000). Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Trans. Pattern Anal. Machine Intel. 22 719–725.
- Blomberg, S. P. and Garland, T. J. (2002). Tempo and mode in evolution: Phylogenetic inertia, adaptation and comparative methods. J. Evol. Biol. 15 899–910.
- Brandle, M. and Brandl, R. (2006). Is the composition of phytophagous insects and parasitic fungi among trees predictable? Oikos 113 296–304.
- Burnham, K. P. and Anderson, R. A. (1998). Model Selection and Inference: A Practical Information-Theoretic Approach. Wiley, New York.
- Cattin, M. F., Bersier, L. F., Banasek-Richter, C. C., Baltensperger, R. and Gabriel, J. P. (2004). Phylogenetic constraints and adaptation explain food-web structure. Nature 427 835–839.
- Daudin, J.-J., Picard, F. and Robin, S. (2008). A mixture model for random graphs. Statist. Comput. 18 173–183.Mathematical Reviews (MathSciNet): MR2390817
Digital Object Identifier: doi: 10.1007/s11222-007-9046-7 - Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Statist. Soc. B 39 1–38.Mathematical Reviews (MathSciNet): MR501537
- Erdös, P. and Rényi, A. (1959). On random graphs, i. Publ. Math. 6 290–297.
- Fienberg, S. E. and Wasserman, S. (1981). Categorical data analysis of single sociometric relations. In Sociological Methodology 1981 156–192. Jossey-Bass, San Francisco.
- Fienberg, S. E., Meyer, M. M. and Wasserman, S. S. (1985). Statistical analysis of multiple sociometric relations. J. Amer. Statist. Assoc. 80 51–67.
- Getoor, L. and Diehl, C. P. (2004). Link mining: A survey. SIGKDD Explor. 7 3–12.
- Gilbert, G. S. and Webb, C. O. (2007). Phylogenetic signal in plant pathogen-host range. Proc. Natl. Acad. Sci. USA 104 4979–4983.
- Girvan, M. and Newman, M. E. J. (2002). Community structure in social and biological networks. Proc. Natl. Acad. Sci. USA 99 7821–7826.Mathematical Reviews (MathSciNet): MR1908073
Zentralblatt MATH: 1032.91716
Digital Object Identifier: doi: 10.1073/pnas.122653799 - Govaert, G. and Nadif, M. (2005). An EM algorithm for the block mixture model. IEEE Trans. Pattern Anal. Machine Intel. 27 643–647.
- Hofman, J. M. and Wiggins, C. H. (2008). A Bayesian approach to network modularity. Phys. Rev. Lett. 100 258701.
- Holland, P. and Leinhardt, S. (1981). An exponential family of probability distributions for directed graphs. J. Amer. Statist. Assoc. 76 33–50.Mathematical Reviews (MathSciNet): MR608176
Zentralblatt MATH: 0457.62090
Digital Object Identifier: doi: 10.1080/01621459.1981.10477598 - Ives, A. R. and Godfray, H. C. J. (2006). Phylogenetic analysis of trophic associations. Am. Nat. 16 E1–E14.
- Jaakkola, T. (2000). Tutorial on variational approximation methods. In Advanced Mean Field Methods: Theory and Practice. MIT Press, Cambridge.
- Jaccard, P. (1901). Tude comparative de la distribution florale dans une portion des alpes et des jura. Bullet. Soc. Vaud. Sci. Natur. 37 547–579.
- Jordan, M. I., Ghahramani, Z., Jaakkola, T. and Saul, L. K. (1999). An introduction to variational methods for graphical models. Mach. Learn. 37 183–233.
- Kemp, C., Griffiths, T. H. and Tenenbaum, J. B. (2004). Discovering latent classes in relational data. Technical report, MIT Computer Science and Artificial Intelligence Laboratory.
- Leisink, M. A. R. and Kappen, H. J. (2001). A tighter bound for graphical models. Neural Comput. 13 2149–2171.
- Lorrain, F. and White, H. C. (1971). Structural equivalence of individuals in social networks. J. Math. Soc. 1 49–80.
- Mariadassou, M. (2006). Estimation paramétrique dans le modèle ERMG. Master’s thesis, Univ. Paris XI/Ecole Nationale Supèrieure.
- Mariadassou, M., Robin, S. and Vacher, C. (2010). Supplement to “Uncovering latent structure in valued graphs: A variational approach.” DOI: 10.1214/07-AOAS361SUPP.Mathematical Reviews (MathSciNet): MR2758646
Zentralblatt MATH: 1194.62125
Digital Object Identifier: doi: 10.1214/10-AOAS361
Project Euclid: euclid.aoas/1280842137 - McGrory, C. A. and Titterington, D. M. (2007). Variational approximations in Bayesian model selection for finite mixture distributions. Comput. Statist. Data Anal. 51 5352–5367.Mathematical Reviews (MathSciNet): MR2370876
- McLahan, G. and Peel, D. (2000). Finite Mixture Models. Wiley, New York.Mathematical Reviews (MathSciNet): MR1789474
- Newman, M. E. J. (2004). Fast algorithm for detecting community structure in networks. Phys. Rev. E 69 066133.
- Newman, M. E. J., Watts, D. J. and Strogatz, S. H. (2002). Random graph models of social networks. Proc. Natl. Acad. Sci. USA 99 2566–2572.
- Nowicki, K. and Snijders, T. A. B. (2001). Estimation and prediction for stochastic block-structures. J. Amer. Statist. Assoc. 96 1077–1087.Mathematical Reviews (MathSciNet): MR1947255
Zentralblatt MATH: 1072.62542
Digital Object Identifier: doi: 10.1198/016214501753208735 - Paradis, E., Claude, J. and Strimmer, K. (2004). Ape: Analyses of phylogenetics and evolution in R language. Bioinformatics 20 289–290.
- Pattison, P. E. and Robins, G. L. (2007). Probabilistic network theory. In Handbook of Probability Theory with Applications. Sage, Thousand Oaks, CA.
- Picard, F., Daudin, J.-J., Miele, V., Mariadassou, M. and Robin, S. (2007). A novel framework for random graph models with heterogeneous connectivity structure. Submitted.
- Poulin, R. (2005). Relative infection levels and taxonomic distances among the host species used by a parasite: Insights into parasite specialization. Parasitology 130 109–115.
- Rezende, E. L., Lavabre, J. E., Guimaraes, P. R., Jr., Jordano, P. and Bascompte, J. (2007). Non-random coextinctions in phylogenetically structured mutualistic networks. Nature 448 925–928.
- Ricklefs, R. E. and Miller, G. L. (2000). Community ecology. In Ecology, 4th ed. Freeman, San Francisco, CA.
- Rossberg, A. G., Ishii, R., Amemiya, T. and Itoh, K. (2006). Food webs: Experts consuming families of experts. J. Theoret. Biol. 241 552–563.Mathematical Reviews (MathSciNet): MR2254907
Digital Object Identifier: doi: 10.1016/j.jtbi.2005.12.021 - Tykiakanis, J. M., Tscharntke, T. and Lewis, O. T. (2007). Habitat modification alters the structure of tropical host-parasitoid food webs. Nature 51 202–205.
- Vacher, C., Piou, D. and Desprez-Loustau, M.-L. (2008). Architecture of an antagonistic tree/fungus network: The asymmetric influence of past evolutionary history. PLoS ONE 3 1740.
- von Luxburg, U., Belkin, M. and Bousquet, O. (2008). Consistency of spectral clustering. Ann. Statist. 36 555–586.Mathematical Reviews (MathSciNet): MR2396807
Zentralblatt MATH: 1133.62045
Digital Object Identifier: doi: 10.1214/009053607000000640
Project Euclid: euclid.aos/1205420511 - Webb, C. O. and Donoghue, M. J. (2005). Phylomatic: Tree assembly for applied phylogenetics. Mol. Ecol. Notes 5 181–183.
- Winn, J., Bishop, C. M. and Jaakkola, T. (2005). Variational message passing. J. Mach. Learn. Res. 6 661–694.
- Xing, E., Jordan, M. and Russell, S. (2003). A generalized mean field algorithm for variational inference in exponential families. In Proceedings of the 19th Annual Conference on Uncertainty in Artificial Intelligence (UAI-03) 583–591. Morgan Kaufmann, San Francisco, CA.
- Yedidia, J. S., Freeman, W. T. and Weiss, Y. (2005). Constructing free-energy approximations and generalized belief propagation algorithms. IEEE Inform. Theory 15 2282–2312.Mathematical Reviews (MathSciNet): MR2246363
Digital Object Identifier: doi: 10.1109/TIT.2005.850085
Supplemental materials
- Supplementary material: Interaction network between tree and fungal species. This file contains: •The adjacency matrix of interactions between tree and fungal species. •The list of the tree species. •The list of the fungal species. •The matrix of genetic distances between tree species. •The matrix of geographical distances between tree species. •The matrix of taxonomic distances between fungal species. •The matrix of nutritional type of the fungal species.

- You have access to this content.
- You have partial access to this content.
- You do not have access to this content.
More like this
- The random subgraph model for the analysis of an ecclesiastical network in Merovingian Gaul
Jernite, Yacine, Latouche, Pierre, Bouveyron, Charles, Rivera, Patrick, Jegou, Laurent, and Lamassé, Stéphane, The Annals of Applied Statistics, 2014 - Sparse Nonparametric Graphical Models
Lafferty, John, Liu, Han, and Wasserman, Larry, Statistical Science, 2012 - Spectral clustering and the high-dimensional stochastic blockmodel
Rohe, Karl, Chatterjee, Sourav, and Yu, Bin, The Annals of Statistics, 2011
- The random subgraph model for the analysis of an ecclesiastical network in Merovingian Gaul
Jernite, Yacine, Latouche, Pierre, Bouveyron, Charles, Rivera, Patrick, Jegou, Laurent, and Lamassé, Stéphane, The Annals of Applied Statistics, 2014 - Sparse Nonparametric Graphical Models
Lafferty, John, Liu, Han, and Wasserman, Larry, Statistical Science, 2012 - Spectral clustering and the high-dimensional stochastic blockmodel
Rohe, Karl, Chatterjee, Sourav, and Yu, Bin, The Annals of Statistics, 2011 - Model-based clustering of large networks
Vu, Duy Q., Hunter, David R., and Schweinberger, Michael, The Annals of Applied Statistics, 2013 - Inferring sparse Gaussian graphical models with latent structure
Ambroise, Christophe, Chiquet, Julien, and Matias, Catherine, Electronic Journal of Statistics, 2009 - Inference in incidence, infection, and impact: co-infection of multiple hosts by
multiple pathogens
Clark, James S. and Hersh, Michelle H., Bayesian Analysis, 2009 - Strategies for online inference of model-based clustering in large and growing networks
Zanghi, Hugo, Picard, Franck, Miele, Vincent, and Ambroise, Christophe, The Annals of Applied Statistics, 2010 - In-season prediction of batting averages: A
field test of empirical Bayes and Bayes methodologies
Brown, Lawrence D., The Annals of Applied Statistics, 2008 - Dynamical System and Nonlinear Regression for Estimate Host-Parasitoid Relationship
Miranda Cabrera, Ileana, Baños Díaz, Heyker Lellanis, and Martínez, María de los Ángeles, Journal of Applied Mathematics, 2010 - Stochastic Models of Host-Macroparasite Interaction
Isham, Valerie, The Annals of Applied Probability, 1995