• Institution: UNIVERSITY OF BATH

Accurate and scalable social recommendation using mixed-membership stochastic block models

  1. Marta Sales-Pardoa
  1. aDepartament d’Enginyeria Química, Universitat Rovira i Virgili, 43007 Tarragona, Catalonia, Spain;
  2. bInstitució Catalana de Recerca i Estudis Avançats, 08010 Barcelona, Catalonia, Spain;
  3. cSanta Fe Institute, Santa Fe, NM 87501
  1. Edited by Edoardo Airoldi, Harvard University, and accepted by Editorial Board Member Adrian E. Raftery October 18, 2016 (received for review April 21, 2016)

Significance

Recommendation systems are designed to predict users’ preferences and provide them with recommendations for items such as books or movies that suit their needs. Recent developments show that some probabilistic models for user preferences yield better predictions than latent feature models such as matrix factorization. However, it has not been possible to use them in real-world datasets because they are not computationally efficient. We have developed a rigorous probabilistic model that outperforms leading approaches for recommendation and whose parameters can be fitted efficiently with an algorithm whose running time scales linearly with the size of the dataset. This model and inference algorithm open the door to more approaches to recommendation and to other problems where matrix factorization is currently used.

Abstract

With increasing amounts of information available, modeling and predicting user preferences—for books or articles, for example—are becoming more important. We present a collaborative filtering model, with an associated scalable algorithm, that makes accurate predictions of users’ ratings. Like previous approaches, we assume that there are groups of users and of items and that the rating a user gives an item is determined by their respective group memberships. However, we allow each user and each item to belong simultaneously to mixtures of different groups and, unlike many popular approaches such as matrix factorization, we do not assume that users in each group prefer a single group of items. In particular, we do not assume that ratings depend linearly on a measure of similarity, but allow probability distributions of ratings to depend freely on the user’s and item’s groups. The resulting overlapping groups and predicted ratings can be inferred with an expectation-maximization algorithm whose running time scales linearly with the number of observed ratings. Our approach enables us to predict user preferences in large datasets and is considerably more accurate than the current algorithms for such large datasets.

Footnotes

  • 1To whom correspondence should be addressed. Email: antonia.godoy{at}urv.cat.
  • Author contributions: A.G.-L., R.G., C.M., and M.S.-P. designed research, performed research, contributed new reagents/analytic tools, analyzed data, and wrote the paper.

  • The authors declare no conflict of interest.

  • This article is a PNAS Direct Submission. E.A. is a Guest Editor invited by the Editorial Board.

  • This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1606316113/-/DCSupplemental.

Freely available online through the PNAS open access option.

Online Impact