
Journal of Statistical Mechanics: Theory and Experiment


On sampling and modeling complex systems

Matteo Marsili, Iacopo Mastromatteo and Yasser Roudi


Paper

The study of complex systems is limited by the fact that only a few variables are accessible for modeling and sampling, and these are not necessarily the most relevant ones for explaining the system's behavior. In addition, empirical data typically undersample the space of possible states. We study a generic framework in which a complex system is seen as a system of many interacting degrees of freedom, known only in part, that optimize a given function. We show that the underlying distribution with respect to the known variables has the Boltzmann form, with a temperature that depends on the number of unknown variables. In particular, when the influence of the unknown degrees of freedom on the known variables is not too irregular, the temperature decreases as the number of variables increases. This suggests that models can be predictable only when the number of relevant variables is less than a critical threshold. Concerning sampling, we argue that the information that a sample contains about the behavior of the system is quantified by the entropy of the frequency with which different states occur. This allows us to characterize the properties of maximally informative samples: within a simple approximation, the most informative frequency size distributions have power law behavior, and Zipf's law emerges at the crossover between the undersampled regime and the regime where the sample contains enough statistics to make inferences about the behavior of the system. These ideas are illustrated in some applications, showing that they can be used to identify relevant variables or to select the most informative representations of data, e.g. in data clustering.
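The sampling claim above can be made concrete with a minimal sketch. It rests on one reading of "the entropy of the frequency with which different states occur", which is an assumption here since the abstract gives no formula: if state s is observed k_s times in a sample of size M, and m_k is the number of distinct states observed exactly k times, then the entropy of the states is H[s] = -Σ_s (k_s/M) log(k_s/M) and the entropy of the frequencies is H[K] = -Σ_k (k m_k/M) log(k m_k/M). The function name `frequency_entropies` is hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log


def frequency_entropies(sample):
    """Return (H_s, H_K) in nats for a sequence of observed states.

    Assumed definitions (not stated in the abstract itself):
      H_s = -sum_s (k_s / M) * log(k_s / M)
      H_K = -sum_k (k * m_k / M) * log(k * m_k / M)
    """
    M = len(sample)
    if M == 0:
        raise ValueError("sample must be non-empty")

    k_s = Counter(sample)        # k_s: how many times each state occurs
    m_k = Counter(k_s.values())  # m_k: how many states occur exactly k times

    H_s = -sum((k / M) * log(k / M) for k in k_s.values())
    H_K = -sum((k * m / M) * log(k * m / M) for k, m in m_k.items())
    return H_s, H_K


if __name__ == "__main__":
    # Every state seen once: H_s is maximal (log M), but H_K is zero,
    # because all states share the same frequency.
    print(frequency_entropies(["a", "b", "c", "d"]))
    # A sample with a spread of frequencies carries nonzero H_K.
    print(frequency_entropies(["a", "a", "a", "b", "b", "c"]))
```

Under this reading, a sample in which every state appears exactly once has the largest possible H[s] but H[K] = 0, which matches the abstract's description of a deeply undersampled regime that supports no inference about the system.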


Keywords

statistical inference

clustering techniques

critical phenomena of socio-economic systems

Protein function and design (Theory)

 

E-print Number: 1301.3622


Dates

Issue 09 (September 2013)

Received 18 April 2013, accepted for publication 3 August 2013

Published 6 September 2013





    Matteo Marsili et al J. Stat. Mech. (2013) P09003
