- Open Access
Network Reconstruction via the Minimum Description Length Principle
Phys. Rev. X 15, 011065 – Published 20 March, 2025
Abstract
A fundamental problem associated with the task of network reconstruction from dynamical or behavioral data consists in determining the most appropriate model complexity in a manner that prevents overfitting and produces an inferred network with a statistically justifiable number of edges and their weight distribution. The status quo in this context is based on
Physics Subject Headings (PhySH)
Popular Summary
Understanding complex systems—ranging from gene regulation to financial markets—requires uncovering the hidden networks of interactions between their individual components. However, these interactions are often difficult or impossible to measure directly, meaning they must be inferred from observable behavior. A key challenge in this process is determining the number and strength of these connections without introducing artificial links that distort the true structure of the network. This study presents a new approach to solving this problem by applying principles from information theory and Bayesian statistics.
To achieve this, I rely on Occam’s razor, a principle favoring the simplest explanation that sufficiently describes the data. Specifically, a method is developed that constructs the most efficient network representation by minimizing the amount of information needed to describe both the network structure and the observed data. This approach ensures that only statistically significant connections are included, avoiding overfitting and producing a more accurate model of the system’s underlying interactions. This method is not only more precise but also computationally efficient compared to existing techniques.
By enabling the reliable reconstruction of large networks from limited or indirect data, my approach has broad applications in various scientific fields. Future work could explore refining the technique to handle even more complex datasets and applying it to real-world problems in biology, neuroscience, and economics, where understanding hidden network structures is crucial for advancing research and technology.
Article Text
Network reconstruction is the task of determining the unseen interactions between elements of a complex system when only data on their behavior are available—usually consisting of time-series or independent samples of their states. This task is required when it is either too difficult, costly, or simply impossible to determine the individual pairwise interactions via direct measurements. This is a common situation when we want to determine the associations between species only from their co-occurrence in population samples , the financial market dependencies from the fluctuation of stock prices , protein structures from observed amino-acid contact maps , gene regulatory networks from expression data , neural connectivity from functional magnetic resonance imaging (fMRI), electroencephalography (EEG), micro electrode array (MEA), or calcium fluorescence imaging data , and epidemic contact tracing from infections , among many other scenarios.
In this work we frame network reconstruction as a statistical inference problem , where we assume that the observed data have been sampled from a generative statistical model that has a weighted network as part of its set of parameters. In this case, the reconstruction task consists of estimating these parameters from data. This generative framing contrasts with other inferential approaches based on, e.g., Granger causality and transfer entropy in important ways: (1) It allows for a principled framework that does not rely on ad hoc and often arbitrary confidence thresholds, (2) can reliably distinguish between direct and indirect interactions, to the extent that this information is obtainable from the data, (3) enables uncertainty quantification and comparison between alternative generative hypotheses, and (4) produces a generative model as a final result, which can be used not only for interpretation, but also to predict the outcomes of interventions and alternative conditions previously unseen in the data. These same features make this type of approach preferable also to noninferential heuristics such as correlation thresholding , known to suffer from substantial biases .
A central obstacle to the effective implementation of inferential network reconstruction is the need for statistical regularization: A network with
Perhaps the most common strategy to avoid overfitting is
The
Other regularization approaches also have been proposed for network reconstruction, such as
To the best of our knowledge, there is currently no regularization scheme that is simultaneously (1) principled, (2) effective at preventing overfitting, (3) algorithmically efficient, and (4) nonparametric, in particular without requiring the network sparsity to be known in advance.
In this work we fill this gap by developing a Bayesian regularization scheme that follows the minimum description length (MDL) principle and seeks to find the optimal sparsity and weight distribution of the edges in a manner that most compresses the observed data. Instead of weight shrinkage, our approach relies on weight quantization as a means of quantifying the information necessary to encode the inferred weights to a sufficient numerical precision, and also makes no strong assumption on the shape of the weight distribution, unlike
This paper is divided as follows. In Sec. we describe our inferential framework, and in Sec. we discuss the problems with
The general scenario for inferential network reconstruction consists of some data
where
with
given some initial state
that we need to be able to compute up to a normalization constant. Our overall approach is designed to work independently of what kind of generative model is used, static or dynamic, with any type of likelihood, as long as it is conditioned on edge weights according to the above equation. In Appendix we list some generative models that we will use as examples in this work, but for now it is simpler if we consider the generative model more abstractly.
Given Eq. we can proceed in a variety of ways, for example, by sampling from it or computing expectations. In this work, we are interested in the maximum a posteriori (MAP) point estimate,
corresponding to the most likely weighted network that generated the data. In this setting, regularization is implemented by choosing a suitable prior
In the following we will discuss
The most common choice to regularize Eq. is independent Laplace distributions,
which result in a
This choice offers some attractive properties. First, the function
However, this choice also suffers from important shortcomings. From a modeling perspective, the prior of Eq. corresponds to a maximum entropy distribution conditioned on the expected mean weight magnitude, including the zeros. This amounts to a prior that is conditioned on previous knowledge about the density of the network, encoded via the parameter
The most common solution employed in practice is to select
and using the held-out data to compute the log-likelihood
This standard technique is designed to prevent overfitting, since it relies on the predictive performance of the inferred model
For our analysis it will be useful to evaluate the accuracy of a reconstruction
and likewise for the binarized adjacency matrix,
In Fig. we consider the reconstruction of Zachary’s karate club network , with
It is clear from this example that the
Ideally, we should be able to determine which edges are better modeled by setting them to zero, without inherently reducing the magnitude of the nonzero ones. This is an inference problem on its own, and instead of committing beforehand to which kind of weight distribution is more appropriate, a more robust approach consists in formulating an open-ended class of distributions, and learning its precise shape a posteriori from the data, following the principle of maximum parsimony. This is precisely what we achieve in the next section.
Our purpose is to implement a principled regularization scheme that does not rely on weight shrinkage, promotes sparsity, and obviates the need for cross-validation. Ideally, we would like a choice of prior
More precisely, we want to evoke the inherent connection between Bayesian inference and data compression by interpreting the joint likelihood
where the first term in the right-hand side of the last equation corresponds to the amount of information required (e.g., in bits if base 2 is used for the logarithm) to encode the data
However, the above equivalence with compression is not valid when the weights
With the objective of precisely quantifying the model complexity, we proceed in several steps. First, we account for the sparsity of the model by introducing an auxiliary variable
and the value of
where
where
We can then proceed by choosing a continuous distribution for the auxiliary values
in which case we obtain a probability mass,
where we have excluded the value
where
with
assuming a uniform prior density
Finally, we need a probability mass for the weight categories
with
Lastly, we need a prior over the number of weight categories
where the remaining quantities
The model above is conditioned on two hyperparameters,
We observe that we have
Differently from the Laplace prior of Eq. , Eq. implements a nonparametric regularization of the weights—not because it lacks parameters, but because the shape of the distribution and number of parameters adapt to what can be statistically justified from the data. In particular, the number and position of the weight categories, as well as the number of edges, will grow or shrink depending on whether they can be used to compress the data, and provide a sufficiently parsimonious network reconstruction, without overfitting.
The prior of Eq. incorporates the properties we initially desired; i.e., (1) it penalizes model complexity according to the description length, (2) it exploits and promotes sparsity, and (3) it does not rely on weight shrinkage. However, these improvements come at a cost: The prior is no longer a convex function on
Whenever Eq. corresponds to a convex optimization objective, it can be performed with relatively straighforward algorithms. The simplest one is coordinate descent , in which every entry in the
Although these algorithms cannot be used unmodified to our problem at hand, due to its nonconvexity and the presence of additional latent variables, we can use them as building blocks. We will consider different types of updates, according to the different types of latent variable we are considering, given our initial
Edge updates. We use the algorithm of Ref. to find the
where
The algorithm of Ref. can obtain this set in typical time
-
(1)
chosen according to a random bisection search maximizing the objective function, with each midpoint sampled uniformly at a random.𝑊 ′ 𝑖 𝑗 ∈ 𝒛 -
(2)
and𝑊 ′ 𝑖 𝑗 ∈ ℝ chosen according to a random bisection search in the interval of weights allowed for the problem. This will introduce a new weight category with𝑊 ′ 𝑖 𝑗 ∉ 𝒛 .𝒛 ′ = 𝒛 ∪ { 𝑊 ′ 𝑖 𝑗 } -
(3)
, if𝑊 ′ 𝑖 𝑗 ← 0 .𝑊 𝑖 𝑗 > 0
Edge moves. Given the same set
Edge swaps. We chose uniformly at random two nonzero entries in
Weight category values. We iterate over each weight category
Weight category distribution. We consider the distribution of weight categories across the nonzero entries of
-
(1) Merge. Two categories
and𝑧 𝑘 are removed and merged as a single new category𝑧 𝑙 , with a value chosen with a bisection search optimizing the objective function. The number of categories reduces by one.𝑧 𝑚 -
(2) Split. A single category
is split in two new categories𝑧 𝑚 and𝑧 𝑘 , chosen initially uniformly random in the range𝑧 𝑙 . The entries are repeatedly moved between these categories if it improves the quality function, and the category values[ 𝑧 𝑚 − 1 , 𝑧 𝑚 + 1 ] and𝑧 𝑘 are updated according to bisection search, until convergence. The number of categories increases by one.𝑧 𝑙 -
(3) Merge-split. The two steps above are combined in succession, so that the entries are redistributed in two potentially new categories, and while their total number remains the same.
The workhorse of this overall procedure is the algorithm of Ref. which reduces the search for candidate updates to a subquadratic time. The remaining steps of the algorithm operate on this set with linear complexity, and therefore a single “sweep” of all latent variables can be accomplished under the same algorithmic complexity as Ref. , since it dominates. In practice, due to the extra latent variables and optimization steps, this algorithm finishes in time around 5–10 times longer than when using In Fig. we demonstrate the use of our algorithm on synthetic data generated from empirical networks. We simulate the kinetic and equilibrium Ising models (see Appendix ) with true weights sampled i.i.d. from a normal distribution with mean We observe that our MDL regularization performs very closely to the true prior, except for very few data, in which case the MDL tends to produce an empty network, whereas the true prior in fact overfits and yields a network with a number of edges larger than the true value. As we already explained, MAP estimation does not guarantee proper regularization when the priors correspond to probability densities—even if it happens to be the correct one. The overall tendency of the MDL approach is, by design, to err toward simpler models, instead of more complex ones, when the data are insufficient to provide a good accuracy. Inference results for the kinetic and equilibrium Ising model for two empirical networks, as indicated in the legend, with true weights sampled as described in the text, using both the true prior, our MDL regularization, as well as As already discussed previously in Sec. , The use of the auxiliary sparse network where where the marginal distribution As discussed in Ref. , the use of this kind of structured prior can improve regularization, since it provides more opportunities to reduce the description length, whenever the underlying network is not completely random. The obtained community structure is often also useful for downstream analysis and interpretation. Algorithmically, this extension of our method is straightforward, as we need only to include merge-split partition moves , as done to infer the SBM. We refer the reader to Ref. for more details about the SBM formulation, and to Ref. for its effect in network reconstruction. Besides the weighted adjacency matrix In principle, since the size of Proceeding as before gives us where Now we perform a comparative evaluation of our MDL regularization for empirical data where true network is not available. We will consider the votes of all In Fig. we show the result of the reconstruction using both our MDL scheme and Reconstructed networks of interactions between The interpretation of the two inferred models is fairly different. Although they share similar large-scale structures (as shown by the node colors indicating the inferred communities), the one inferred by We take the opportunity to compare with another regularization scheme called “decimation,” proposed in Ref. . That approach is not based on shrinkage, and instead proceeds by first performing an unregularized maximum likelihood, resulting in a full network with no zero weights, then forcing a small number of the weights with smallest magnitude to zero, and proceeding recursively on the remaining entries in the same manner, until sufficiently many entries have been set to zero. Besides the increased computational cost of inferring the weights multiple times, and starting with a full network—thus imposing an overall complexity of at least Decimation procedure employed on the same data as Fig. . The bottom panel shows the maximum likelihood growing monotonically as a function of the number of nonzero edges considered during decimation, and the top panel shows the similarity (weighted and binarized) with the network inferred via MDL. The vertical dashed line corresponds to the value
In this section we employ our regularization method for the analysis of microbial interactions . We consider data from the human microbiome project (HMP) and the earth microbiome project (EMP) , which assemble thousands of samples of microbial species abundances, collected, respectively, in various sites in the human body and in diverse habitats across the globe. More precisely, these datasets contain abundances derived from DNA sequencing, aggregated as operational taxonomic units (OTUs)—groups of individuals closely related according to their gene sequence, serving as a pragmatic proxy for individual species. The HMP dataset contains
The objective of our analysis is to obtain the underlying network of interactions between species from their co-occurrence in samples. To account for issues related with compositionality, i.e., the fact that the gene sequencing techniques used in these studies can provide only the relative abundance between species, instead of their absolute number, we will discard the magnitude of the read counts, and consider only the presence or absence of species, i.e., if more than zero counts have been observed for a given OTU, resulting in binary observations. We will also not attempt to model the dependence with environmental conditions and focus instead only on the mutual iterations between species. These are both substantial omissions, but our purpose is not to fully saturate the modeling requirements of this problem, but instead to demonstrate how our regularization approach can be coupled with a particular generative model to tackle a network reconstruction problem of this nature and magnitude (to the best of our knowledge, with the exception of Ref. , the reconstruction based on the full HMP and EMP datasets has not been performed so far).
We will model a given sample
where
We are interested in the MAP estimation,
using the MDL regularization for both
In Fig. we show the results of the reconstruction for both datasets. The inferred networks are strongly modular, composed of large node groups, connected between themselves largely by negative interactions, and a hierarchical subdivision into smaller groups (see also Fig. ), connected between themselves by predominately positive weights. As the middle panel of Fig. shows, the large-scale modular structure is strongly associated with the different body sites for the HMP network and the different habitats for EMP. This association makes it easier to interpret the negative edges between the larger groups: The different regions represent distinct microbiota, composed of specialized organisms that occupy specific ecological niches. Since we have not included dependence on environmental conditions into our model, this mutual exclusion gets encoded by negative couplings. We also observe that nodes with low degree tend to occur more seldomly (negative
Reconstructed networks for the (a) human microbiome project (HMP) with
Hierarchical community structure obtained with the nested SBM integrated into the MDL regularization, for (a) the human microbiome project and (b) the earth microbiome project, corresponding to the same networks shown in Fig. . Each filled polygon represents the convex hull of the nodes that belong to the same group. Nested polygons represent nested hierarchical levels.
Strikingly, as we show in Appendix , the groups found with the SBM that is part of our MDL regularization correlate significantly with the known taxonomic divisions of the microbial species considered, where most of the groups found contain predominately a single genus, although the same genus can be split into many inferred groups. This indicates different scales of similarity and dissimilarity between taxonomically proximal species when it comes to their interaction patterns with other species. In this section we explore the fact that our reconstructed networks equip us with more than just a structural representation of the interactions between species, but it also provides a generative model that can be used to answer questions that cannot be obtained directly from the data. One particular question that is of biological significance is the extent to which an ecological system is stable to perturbations, and what are the outcomes of particular interventions. One important concept for ecology is that of “keystone” species, defined as those that have a disproportionally strong impact in its environment relative to its abundance . In the context of microbial systems, keystone species are often associated with the impact they have once they are forcibly removed from the system, measured by the number of species that disappear as a consequence . The removal of individual species can be investigated with the Ising model in the following way. First we observe that since the network structure is modular, and the weights with The individual macrostates We can then characterize the stability of the system precisely as its tendency to escape from one such macrostate after a small portion of the species is perturbed [see Fig. ]. In the case of a single species, we can quantify this by comparing the marginal expectation of the presence of a species with the expectations obtained when another species where in the last equation the distribution should be interpreted as the outcome of the dynamics if we initialize it with a configuration from macrostate We can then quantify the magnitude of the perturbation of node If the perturbation is insufficient to dislodge the system to another macrostate, we will typically have In Fig. we show the distribution of In Fig. we highlight two perturbations with large Examples of single-species perturbations that cause the system to move from a macrostate These predicted outcomes for the HMP data have only a tentative character, since, as we mentioned before, we are omitting important aspects from the model—such as the abundance magnitudes and environmental factors —and making a strong modeling assumption with the Ising model. Nevertheless, this example already illustrates how even an elementary modeling assumption is sufficient to give us access to functional properties of a latent system, inferred from empirical data, which comes as a result of nonlinear emergent behavior mediated by the interactions between the species. When complemented with more realistic generative models, this kind of approach could be used to predict empirical outcomes, and be validated with experiments or additional observational data. Toward this goal, appropriate regularization is crucial, since a model that overfits will deliver neither meaningful interpretations nor accurate predictions.
We have presented a principled nonparametric regularization scheme based on weight quantization and the MDL principle that can be used to perform network reconstruction via statistical inference in a manner that adapts to the statistical evidence available in the data. Our approach inherently produces the simplest (i.e., sparsest) network whenever the data cannot justify a denser one, and hence does not require the most appropriate number of edges to be known beforehand, and instead produces this central information as an output, as is often desired. Our approach does not rely on any specific property of the generative model being used, other than it being conditioned on a weighted adjacency matrix, and therefore it is applicable for a broad range of reconstruction problems.
Unlike
When combined with the subquadratic algorithm of Ref. , based on a stochastic second-neighbor search, our regularization scheme can be used to reconstruct networks with a large number of nodes, unlike most methods in current use, which have an algorithmic complexity that is at best quadratic on the number of nodes, typically even higher.
Our structured priors are extensible, and when coupled with inferential community detection can also use latent modular network structure to improve the regularization, and as a consequence also the final reconstruction accuracy .
As we have demonstrated with an empirical case study on microbial interactions, our method is effective at uncovering functional latent large-scale complex systems from empirical data, which can then be used to make predictions about future behavior, the effect of interventions, and the presence of tipping points.
This work has been funded by the Vienna Science and Technology Fund (WWTF) and by the State of Lower Austria [Grant ID: 10.47379/ESS22032]. It is also in part the result of research conducted for Central European University, Private University. It was made possible by the CEU Open Access Fund.
In our examples we use two generative models: the equilibrium Ising model and the kinetic Ising model. The equilibrium Ising model is a distribution on
with
as it gives asymptotically correct results and has excellent performance in practice .
The kinetic Ising model is a Markov chain with transition probabilities conditioned on the same parameters as before, given by
In the case of the zero-valued Ising model with
In Fig. we show the taxonomic classification of the species for the reconstructed networks of Fig. . The taxonomy consists of the phylum, class, order, family, and genus of each OTU. Each taxonomic category is shown as node colors, as indicated in the legend. In both cases we observe that most groups inferred with the SBM incorporated in our regularization scheme are compatible with the taxonomic division, since they tend to contain one dominating taxonomic category, at all levels—although a single category is often split into many groups, indicating structures that are not captured solely by taxonomy.
References (88)
- K. Guseva, S. Darcy, E. Simon, L. V. Alteio, A. Montesinos-Navarro, and C. Kaiser, From diversity to complexity: Microbial networks in soils, Soil biology and biochemistry 169, 108604 (2022).
- T. Bury, A statistical physics perspective on criticality in financial markets, J. Stat. Mech. (2013) P11004.
- M. Weigt, R. A. White, H. Szurmant, J. A. Hoch, and T. Hwa, Identification of direct residue contacts in protein–protein interaction by message passing, Proc. Natl. Acad. Sci. U.S.A. 106, 67 (2009).
- S. Cocco, C. Feinauer, M. Figliuzzi, R. Monasson, and M. Weigt, Inverse statistical physics of protein sequences: A key issues review, Rep. Prog. Phys. 81, 032601 (2018).
- P. D’haeseleer, S. Liang, and R. Somogyi, Genetic network inference: From co-expression clustering to reverse engineering, Bioinformatics 16, 707 (2000).
- G. Stolovitzky, D. Monroe, and A. Califano, Dialogue on reverse-engineering assessment and methods, Ann. N.Y. Acad. Sci. 1115, 1 (2007).
- V. Vinciotti, E. C. Wit, R. Jansen, E. J. C. N. de Geus, B. W. J. H. Penninx, D. I. Boomsma, and P. A. C. ’t Hoen, Consistency of biological networks inferred from microarray and sequencing data, BMC Bioinf. 17, 254 (2016).
- A. Maccione, M. Gandolfo, M. Tedesco, T. Nieus, K. Imfeld, S. Martinoia, and B. Luca, Experimental investigation on spontaneously active hippocampal cultures recorded by means of high-density MEAs: Analysis of the spatial resolution effects, Front. Neuroeng. 3, 1294 (2010).
- J. R. Manning, R. Ranganath, K. A. Norman, and D. M. Blei, Topographic factor analysis: A Bayesian model for inferring brain networks from neural fata, PLoS One 9, e94914 (2014).
- H. F. Po, A. M. Houben, A.-C. Haeb, D. R. Jenkins, E. J. Hill, H. R. Parri, J. Soriano, and D. Saad, Inferring structure of cortical neuronal networks from firing data: A statistical physics approach, arXiv:2402.18788.
- Braunstein Alfredo, Ingrosso Alessandro, and Muntoni Anna Paola, Network reconstruction from infection cascades, J. R. Soc. Interface 16, 20180844 (2019).
- L. Peel, T. P. Peixoto, and M. De Domenico, Statistical inference links data and theory in network science, Nat. Commun. 13, 6794 (2022).
- J. Sun, D. Taylor, and E. M. Bollt, Causal network inference by optimal causation entropy, SIAM J. Appl. Dyn. Syst. 14, 73 (2015).
- J. G. Orlandi, O. Stetter, J. Soriano, T. Geisel, and D. Battaglia, Transfer entropy reconstruction and labeling of neuronal connections from simulated calcium imaging, PLoS One 9, e98842 (2014).
- E. Bullmore and O. Sporns, Complex brain networks: Graph theoretical analysis of structural and functional systems, Nat. Rev. Neurosci. 10, 186 (2009).
- V. L. Zogopoulos, G. Saxami, A. Malatras, K. Papadopoulos, I. Tsotra, V. A. Iconomidou, and I. Michalopoulos, Approaches in gene coexpression analysis in eukaryotes, Biology 11, 1019 (2022).
- G. T. Cantwell, Y. Liu, B. F. Maier, A. C. Schwarze, C. A. Serván, J. Snyder, and G. St-Onge, Thresholding normally distributed data creates complex networks, Phys. Rev. E 101, 062302 (2020).
- T. Hastie, R. Tibshirani, and M. Wainwright, Statistical Learning with Sparsity: The Lasso and Generalizations (CRC Press, Boca Raton, FL, 2015).
- R. Tibshirani, Regression shrinkage and selection via the Lasso, J. R. Stat. Soc. Ser. B 58, 267 (1996).
- N. Meinshausen and P. Bühlmann, High-dimensional graphs and variable selection with the Lasso, Ann. Stat. 34, 1436 (2006).
- M. Yuan and Y. Lin, Model selection and estimation in the Gaussian graphical model, Biometrika 94, 19 (2007).
- O. Banerjee, L. E. Ghaoui, and A. d’Aspremont, Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data, J. Mach. Learn. Res. 9, 485 (2008).
- L. Augugliaro, A. Abbruzzo, and V. Vinciotti,
-penalized censored Gaussian graphical model, Biostatistics 21, e1 (2020). - J. Friedman, T. Hastie, and R. Tibshirani, Sparse inverse covariance estimation with the graphical Lasso, Biostatistics 9, 432 (2008).
- G. Bresler, E. Mossel, and A. Sly, Reconstruction of Markov random fields from samples: Some observations and allgorithms, Approximation, Randomization and Combinatorial Optimization. Algorithms and Techniques, Lecture Notes in Computer Science (Springer, Berlin, 2008), pp. 343–356.
- G. Bresler, E. Mossel, and A. Sly, Reconstruction of Markov random fields from samples: Some easy observations and algorithms, arXiv:0712.1402.
- P. Ravikumar, M. J. Wainwright, and J. D. Lafferty, High-dimensional Ising model selection using
-regularized logistic regression, Ann. Stat. 38, 1287 (2010). - G. Bresler, Efficiently learning Ising models on arbitrary graphs, in Proceedings of the Forty-Seventh Annual ACM Symposium on Theory of Computing (STOC ’15) (Association for Computing Machinery, New York, 2015), pp. 771–782.
- H. C. Nguyen, R. Zecchina, and J. Berg, Inverse statistical problems: From the inverse Ising problem to data science, Adv. Phys. 66, 197 (2017).
- M. Vuffray, S. Misra, A. Lokhov, and M. Chertkov, Interaction screening: Efficient and sample-optimal learning of Ising models, Advances in Neural Information Processing Systems, Vol. 29 (Curran Associates, Inc., Red Hook, NY, 2016).
- A. Y. Lokhov, M. Vuffray, S. Misra, and M. Chertkov, Optimal structure and parameter learning of Ising models, Sci. Adv. 4, e1700791 (2018).
- A. J. Rothman, P. J. Bickel, E. Levina, and J. Zhu, Sparse permutation invariant covariance estimation, Electron. J. Stat. 2, 494 (2008).
- H. Liu, K. Roeder, and L. Wasserman, Stability approach to regularization selection (StARS) for high dimensional graphical models, in Proceedings of the 23rd International Conference on Neural Information Processing Systems (NIPS’10), Vol. 2 (Curran Associates Inc., Red Hook, NY, 2010), pp. 1432–1440.
- E. Aurell and M. Ekeberg, Inverse Ising inference using all the data, Phys. Rev. Lett. 108, 090201 (2012).
- A. Decelle and F. Ricci-Tersenghi, Pseudolikelihood decimation algorithm improving the inference of the interaction network in a general class of Ising models, Phys. Rev. Lett. 112, 070603 (2014).
- S. Franz, F. Ricci-Tersenghi, and J. Rocchi, A fast and accurate algorithm for inferring sparse Ising models via parameters activation to maximize the pseudo-likelihood, arXiv:1901.11325.
- R. Foygel and M. Drton, Extended Bayesian information criteria for Gaussian graphical models, in Advances in Neural Information Processing Systems, Vol. 23 (Curran Associates, Inc., Red Hook, NY, 2010).
- H. Zou, The adaptive Lasso and its Oracle properties, J. Am. Stat. Assoc. 101, 1418 (2006).
- N. Meinshausen, Relaxed Lasso, Computational Statistics and Data Analysis 52, 374 (2007).
- A. Beck and M. Teboulle, A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM J. Imaging Sci. 2, 183 (2009).
- C.-H. Zhang, Nearly unbiased variable selection under minimax concave penalty, Ann. Stat. 38, 894 (2010).
- J. Fan and R. Li, Variable selection via nonconcave penalized likelihood and its Oracle properties, J. Am. Stat. Assoc. 96, 1348 (2001).
- H. Wang, Bayesian graphical Lasso models and efficient posterior computation, Bayesian Anal. 7, 867 (2012).
- A. Mohammadi and E. C. Wit, Bayesian structure learning in sparse Gaussian graphical models, Bayesian Anal. 10, 109 (2015).
- L. Vogels, R. Mohammadi, M. Schoonhoven, and S. I. Birbil, Bayesian structure learning in undirected Gaussian graphical models: Literature review with empirical comparison, arXiv:2307.02603.
- Y. Ni, V. Baladandayuthapani, M. Vannucci, and F. C. Stingo, Bayesian graphical models for modern biological applications, Stat. Methods Appl. 31, 197 (2022).
- A. Dobra, A. Lenkoski, and A. Rodriguez, Bayesian inference for general Gaussian graphical models with application to multivariate lattice data, J. Am. Stat. Assoc. 106, 1418 (2011).
- H. Wang, Scaling it up: Stochastic search structure learning in graphical models, Bayesian Anal. 10, 351 (2015).
- Z. Richard Li, T. H. McCormick, and S. J. Clark, Bayesian joint spike-and-slab graphical Lasso, Proc. Mach. Learn. Res. 97, 3877 (2019).
- C. M. Carvalho, N. G. Polson, and J. G. Scott, Handling sparsity via the horseshoe, in Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics (PMLR, 2009), pp. 73–80.
- Y. Li, B. A. Craig, and A. Bhadra, The graphical horseshoe estimator for inverse covariance matrices, J. Comput. Graph. Stat. 28, 747 (2019).
- Z. R. Li, T. H. McComick, and S. J. Clark, Using Bayesian latent Gaussian graphical models to infer symptom associations in verbal autopsies, Bayesian Anal. 15, 781 (2020).
- J. Rissanen, Information and Complexity in Statistical Modeling, 1st ed. (Springer, New York, 2010).
- P. D. Grünwald, The Minimum Description Length Principle (The MIT Press, Cambridge, MA, 2007).
- T. P. Peixoto, Scalable network reconstruction in subquadratic time, arXiv:2401.01404.
- A. N. Tikhonov, On the stability of inverse problems, Proc. USSR Acad. Sci. 39, 195 (1943).
- W. W. Zachary, An information flow model for conflict and fission in small groups, J. Anthropol. Res. 33, 452 (1977).
Erroneously interpreting the log of a probability density
as information could not only lead to meaningless negative values when , but also its value would change arbitrarily under an exchange of variables , with bijective, leading to a new probability density , and a new value arbitrarily smaller or larger than , even though it refers to a fully equivalent model. It is in fact easy to consider other choices for
that eliminate residual shrinkage. A simple one is to sample the weight categories uniformly from the set of real numbers that are representable with (e.g., using the IEEE 754 standard for floating-point arithmetic [60]), leading to . This gives virtually indistinguishable results to the choice of Eq. (25) in most examples we investigated, except, as mentioned in the main text, in situations where the weight categories would tend to diverge with , where we found Eq. (25) to yield a more convenient model as its residual shrinkage prevents this from happening in these edge cases. - IEEE standard for floating-point arithmetic, IEEE Std 754–2008, 1 (2008).
- S. J. Wright, Coordinate descent algorithms, Math. Program. 151, 3 (2015).
- W. Dong, C. Moses, and K. Li, Efficient
-nearest neighbor graph construction for generic similarity measures, in Proceedings of the 20th International Conference on World Wide Web (WWW ’11) (Association for Computing Machinery, New York, 2011), pp. 577–586. Random bisection search will eventually succeed in finding the global minimum of a multimodal objective, hence it should in general be preferred over deterministic bisection, unless the objective function is convex.
- T. P. Peixoto, Merge-split Markov chain Monte Carlo for community detection, Phys. Rev. E 102, 012305 (2020).
- T. P. Peixoto, The graph-tool Python library, figshare, 10.6084/m9.figshare.1164194, 2014, available at https://graph-tool.skewed.de.
To sample from the equilibrium Ising model we use MCMC with the Metropolis-Hastings criterion [67]. To determine equilibration we computed the
of parallel chains [68], as well as the effective sample size (ESS). - N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller, Equation of state calculations by fast computing machines, J. Chem. Phys. 21, 1087 (1953).
- A. Vehtari, A. Gelman, D. Simpson, B. Carpenter, and P.-C. Bürkner, Rank-normalization, folding, and localization: An improved
for assessing convergence of MCMC (with discussion), Bayesian Anal. 16, 667 (2021). - J. Moody, Peer influence groups: Identifying dense clusters in large networks, Soc. Networks 23, 261 (2001).
- M. Girvan and M. E. J. Newman, Community structure in social and biological networks, Proc. Natl. Acad. Sci. U.S.A. 99, 7821 (2002).
- T. P. Peixoto, Network reconstruction and community detection from dynamics, Phys. Rev. Lett. 123, 128301 (2019).
- B. Karrer and M. E. J. Newman, Stochastic blockmodels and community structure in networks, Phys. Rev. E 83, 016107 (2011).
- T. P. Peixoto, Nonparametric Bayesian inference of the microcanonical stochastic block model, Phys. Rev. E 95, 012317 (2017).
- T. P. Peixoto, Hierarchical block structures and high-resolution model selection in large networks, Phys. Rev. X 4, 011047 (2014).
Both methods also lead to inferences of isolated nodes. These correspond to deputies who are mostly absent from voting sessions, and more typically abstain.
- Z. D. Kurtz, C. L. Müller, E. R. Miraldi, D. R. Littman, M. J. Blaser, and R. A. Bonneau, Sparse and compositionally robust inference of microbial ecological networks, PLoS Comput. Biol. 11, e1004226 (2015).
- N. Connor, A. Barberán, and A. Clauset, Using null models to infer microbial co-occurrence networks, PLoS One 12, e0176751 (2017).
- M. Layeghifard, D. M. Hwang, and D. S. Guttman, Disentangling interactions in the microbiome: A network perspective, Trends Microbiol. 25, 217 (2017).
- J. Lloyd-Price, A. Mahurkar, G. Rahnavard, J. Crabtree, J. Orvis, A. B. Hall, A. Brady, H. H. Creasy, C. McCracken, M. G. Giglio, D. McDonald, E. A. Franzosa, R. Knight, O. White, and C. Huttenhower, Strains, functions and dynamics in the expanded Human Microbiome Project, Nature (London) 550, 61 (2017).
- J. A. Gilbert, F. Meyer, J. Jansson, J. Gordon, N. Pace, J. Tiedje, R. Ley, N. Fierer, D. Field, N. Kyrpides, F.-O. Glöckner, H.-P. Klenk, K. E. Wommack, E. Glass, K. Docherty, R. Gallery, R. Stevens, and R. Knight, The Earth Microbiome Project: Meeting report of the “1st EMP meeting on sample selection and acquisition” at Argonne National Laboratory October 6th 2010, Stand. Genomic Sci. 3, 249 (2010).
- R. T. Paine, A note on trophic complexity and community stability, Am. Nat. 103, 91 (1969).
- L. S. Mills, M. E. Soulé, and D. F. Doak, The keystone-species concept in ecology and conservation, BioScience 43, 219 (1993).
- M. Mezard and A. Montanari, Information, Physics, and Computation (Oxford University Press, New York, 2009).
- V. Dakos, B. Matthews, A. P. Hendry, J. Levine, N. Loeuille, J. Norberg, P. Nosil, M. Scheffer, and L. De Meester, Ecosystem tipping points in an evolving world, Nat. Ecol. Evol. 3, 355 (2019).
- V. Vinciotti, E. Wit, and F. Richter, Random graphical model of microbiome interactions in related environments, J. Agric. Biol. Environ. Stat. 29, 1 (2024).
- T. P. Peixoto, Bayesian stochastic blockmodeling, in Advances in Network Clustering and Blockmodeling (John Wiley & Sons, Ltd, New York, 2019), pp. 289–332.
- J. Besag, Spatial interaction and the statistical analysis of lattice systems, J. R. Stat. Soc. Ser. B 36, 192 (1974).
- A. Mozeika, O. Dikmen, and J. Piili, Consistent inference of a general model using the pseudolikelihood method, Phys. Rev. E 90, 010101(R) (2014).