Social influence cannot be identified from purely observational data on social networks, because such influence is generically confounded with latent homophily, i.e., with a node’s network partners being informative about the node’s attributes and therefore its behavior. If the network grows according to either a latent community (stochastic block) model, or a continuous latent space model, then latent homophilous attributes can be consistently estimated from the global pattern of social ties. We show that, for common versions of those two network models, these estimates are so informative that controlling for estimated attributes allows for asymptotically unbiased and consistent estimation of social-influence effects in linear models. In particular, the bias shrinks at a rate that directly reflects how much information the network provides about the latent attributes. These are the first results on the consistent non-experimental estimation of social-influence effects in the presence of latent homophily, and we discuss the prospects for generalizing them.
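As a rough illustration of the idea summarized above, the following Python sketch simulates a two-community stochastic block model with no true social influence, then compares a naive regression on neighbors' past behavior with one that also controls for estimated community membership. The two-block setup, the parameter values, the sign-split spectral estimator, and the names `simulate`, `estimate_communities`, and `ols_influence` are all assumptions made for illustration; they are not the paper's estimators or its simulation design.

```python
import numpy as np

def simulate(n=600, p_in=0.20, p_out=0.02, beta=0.0, seed=0):
    """Two-block SBM with a latent community that drives behavior (illustrative)."""
    rng = np.random.default_rng(seed)
    c = rng.integers(0, 2, size=n)                  # latent community label
    same = c[:, None] == c[None, :]
    probs = np.where(same, p_in, p_out)             # homophilous tie probabilities
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    A = (upper | upper.T).astype(float)             # symmetric adjacency, no self-loops
    y0 = 2.0 * c + rng.normal(size=n)               # earlier behavior depends on c
    deg = np.maximum(A.sum(axis=1), 1.0)
    nbr_mean = A @ y0 / deg                         # mean earlier behavior of neighbors
    # beta is the true social-influence effect; it is 0 by default
    y1 = 0.5 * y0 + beta * nbr_mean + 2.0 * c + rng.normal(size=n)
    return A, c, y0, nbr_mean, y1

def estimate_communities(A):
    """Assumed estimator: split nodes by the sign of the eigenvector
    belonging to the second-largest eigenvalue of the adjacency matrix."""
    _, vecs = np.linalg.eigh(A)                     # eigenvalues in ascending order
    return (vecs[:, -2] > 0).astype(float)

def ols_influence(y1, y0, nbr_mean, control=None):
    """Least-squares coefficient on neighbors' mean earlier behavior."""
    cols = [np.ones_like(y0), y0, nbr_mean]
    if control is not None:
        cols.append(control)
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, y1, rcond=None)
    return coef[2]

A, c, y0, nbr_mean, y1 = simulate(seed=1)
c_hat = estimate_communities(A)
naive = ols_influence(y1, y0, nbr_mean)                      # ignores latent homophily
adjusted = ols_influence(y1, y0, nbr_mean, control=c_hat)    # controls for estimated community
print(f"naive estimate: {naive:.3f}, adjusted estimate: {adjusted:.3f}, true effect: 0")
```

Because the true effect is set to zero here, any nonzero naive estimate reflects confounding by the latent community. The adjusted estimate should sit much closer to zero, though in a network of this size it remains noisy.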
Disclaimer
As a service to authors and researchers we are providing this version of an accepted manuscript (AM). Copyediting, typesetting, and review of the resulting proofs will be undertaken on this manuscript before final publication of the Version of Record (VoR). During production and pre-press, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal relate to these versions also.
Acknowledgments
We thank Andrew C. Thomas, David S. Choi, and Veronica Marotta for many valuable discussions on these and related ideas over the years. We thank Dena Asta and Hannah Worrall for sharing Asta [2015] and Worrall [2014], respectively; Chao Gao, Zongming Ma, Anderson Y. Zhang, and Harrison H. Zhou for sharing code related to Gao et al. [2017]; Oleg Sofrygin for assistance with simulations using Sofrygin et al. [2017]; and Max Kaplan for related programming assistance. CRS was supported during this work by grants from the NSF (DMS1207759 and DMS1418124) and the Institute for New Economic Thinking (INO1400020); EM was supported during this work by a grant from Facebook (Computational Social Science Methodology Research Awards).
Notes
1. A turn of phrase gratefully borrowed from Ben Hansen.
2. We do not mean to take sides in the dispute between the partisans of graphical causal models and those of the potential-outcomes formalism. The expressive power of the latter is strictly weaker than that of suitably augmented graphical models [Richardson and Robins, 2013], but we could write everything here in terms of potential outcomes, albeit at some cost in space and notation.
3. Latent space modeling of dynamic networks is still in its infancy. For some preliminary efforts, see, e.g., DuBois et al. [2013], Ghasemian et al. [2015] for block models, and Sarkar and Moore [2006] for continuous-space models.
4. Some of the theory we rely on below allows the number of communities to grow with the size of the network, though at a rate posited to be known a priori, and not too fast. We leave dealing with this complication to future work.
5. An isometry is a transformation of a metric space which preserves distances between points. These transformations naturally form groups, and the properties of these groups control, or encode, the geometry of the metric space [Brannan et al., 1999].
6. The GMZZ conditions also include a parameter to control the differences across communities of the within- and between-community connection probabilities. We omit this parameter as the within- and between-community connection probabilities are both constant across communities in our simulations. This restricted parameter space is discussed in Gao et al. [2017] as .