Award Abstract # 1713082
Collaborative Research: Inference for Network Models with Covariates: Leveraging Local Information for Statistically and Computationally Efficient Estimation of Global Parameters

NSF Org: DMS
Division Of Mathematical Sciences
Recipient: UNIVERSITY OF TEXAS AT AUSTIN
Initial Amendment Date: April 25, 2017
Latest Amendment Date: April 25, 2017
Award Number: 1713082
Award Instrument: Standard Grant
Program Manager: Pena Edsel
DMS
 Division Of Mathematical Sciences
MPS
 Directorate for Mathematical and Physical Sciences
Start Date: July 1, 2017
End Date: June 30, 2021 (Estimated)
Total Intended Award Amount: $160,001.00
Total Awarded Amount to Date: $160,001.00
Funds Obligated to Date: FY 2017 = $160,001.00
History of Investigator:
  • Purnamrita Sarkar (Principal Investigator)
    purna.sarkar@austin.utexas.edu
Recipient Sponsored Research Office: University of Texas at Austin
110 INNER CAMPUS DR
AUSTIN
TX  US  78712-1139
(512)471-6424
Sponsor Congressional District: 25
Primary Place of Performance: The University of Texas at Austin
2317 Speedway, Stop D900
Austin
TX  US  78712-1532
Primary Place of Performance
Congressional District:
25
Unique Entity Identifier (UEI): V6AFQPN18437
Parent UEI:
NSF Program(s): STATISTICS
Primary Program Source: 01001718DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s):
Program Element Code(s): 126900
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.049

ABSTRACT

Large datasets, which are naturally modeled as a network or graph, arise in almost every field of human endeavor. For example, Facebook is a social network, where nodes are users, with edges corresponding to friendships. In gene networks, nodes represent genes with connections corresponding to their co-expression. In ecological networks, the nodes are animal species, with edges determined according to who eats whom. A major focus of research for network or graph data has been on identifying community membership of the nodes. However, what is often more important for scientific purposes is examining the nature and evolution of edge and membership probabilities, for instance changes in gene features of individuals as a function of some unknown factor, like a disease. The focus on using other measured features of nodes and edges could add, in decisive ways, to the information available from observed edges or interactions between nodes. These could be disease symptoms or test results, or demographic information of users in social networks. Statistical inference in such models, despite its importance, has only just begun to be studied. There are both theoretical and computational challenges, due both to the complexity of models fitted, and the size of data sets. The research will lead to the development of algorithms for fitting models and statistical measures of confidence, with potential applications to many fields.

The research is focused on block models for graphs, when node or edge covariates are present. When formulated, these models are no longer block models, but models whose membership probabilities depend upon covariates and whose connection probabilities depend both on block membership and individual covariates. Fitting algorithms involve alternating between fitting block and covariate parameters. Variational (mean field) approaches which effectively lead to semi-parametric model fitting with nK membership "nuisance" parameters, with n representing the number of nodes and K the number of communities, are examined. As these approaches have been found by the PIs to be unstable for large n, the PIs have already begun to investigate the theoretical and practical aspects of divide and conquer algorithms where many subgraphs are independently fit. The PIs will study the statistical properties, both asymptotically and through simulations, and develop practicable and computationally stable methods for large, relatively sparse graphs.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Note:  When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Li, Tianxi and Lei, Lihua and Bhattacharyya, Sharmodeep and Van den Berge, Koen and Sarkar, Purnamrita and Bickel, Peter J. and Levina, Elizaveta "Hierarchical Community Detection by Recursive Partitioning" Journal of the American Statistical Association , 2020 https://doi.org/10.1080/01621459.2020.1833888 Citation Details
Mukherjee, S. and Sarkar, P. and Lin, L "On clustering network-valued data" Advances in neural information processing systems , 2017 Citation Details
Mukherjee, Soumendu Sundar and Sarkar, Purnamrita and Bickel, Peter J. "Two provably consistent divide-and-conquer clustering algorithms for large networks" Proceedings of the National Academy of Sciences , v.118 , 2021 https://doi.org/10.1073/pnas.2100482118 Citation Details
Purnamrita Sarkar, Y. X. "When random initializations help: a study of variational inference for community detection" Journal of machine learning research , 2021 Citation Details
Soumendu Sundar Mukherjee, Purnamrita Sarkar "Mean Field for the Stochastic Blockmodel: Optimization Landscape and Convergence Issues" Advances in neural information processing systems , 2018 Citation Details
Xueyu Mao, Purnamrita Sarkar "Overlapping Clustering Models, and One (class) SVM to Bind Them All" Advances in neural information processing systems , 2018 Citation Details
Yan, B. and Sarkar, P. and Cheng, X. "Provable Estimation of the Number of Blocks in Block Models" Proceedings of the International Conference on Artificial Intelligence and Statistics , 2018 Citation Details
Yan, B. and Yin, M. and Sarkar, P. "Convergence of Gradient EM on Multi-component Mixture of Gaussians" Advances in neural information processing systems , 2017 Citation Details
Yan, Bowei and Sarkar, Purnamrita "Covariate Regularized Community Detection in Sparse Graphs" Journal of the American Statistical Association , v.116 , 2021 https://doi.org/10.1080/01621459.2019.1706541 Citation Details

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

This award resulted into several publications in the area of statistical inference of networks, and in general clustering problems. In the area of clustering and community detection, this work ranged from designing provable divide and conquer algorithms for community detection in large networks (PNAS 2021), provable algorithms for tuning hyperparameters for general clustering problems (ICML 2020), a spotlight paper on the general framework for inference in mixed membership type clustering models (NeurIPS 2018), a hierarchical community detection algorithm based on recursive partitioning (JASA 2020), and network community detection in the presence of covariates (JASA 2020).  In addition to this, the grant led to work on clustering of network valued data where each datapoint is a network (NeurIPS 2017),  study of the theoretical properties of iterative optimization algorithms for non-convex objective functions like Expectation Maximization for mixtures of Gaussians (NeurIPS 2017), and theoretical examination of Bayesian variational inference for community detection (JMLR 2021, AISTATS 2020, NeurIPS 2018). 

These papers involved several PhD students at UT Austin, four from the Department of Statistics and Data Sciences (two of whom were from underrepresented in STEM fields), and one student from the Computer Science department. Out of these, two students were the PI's graduate students, who have since graduated and one is co-advised by the PI. Based on the research topics developed during the duration of this grant, the PI has given talks in a number of venues, including the Joint Statistical Meetings, the Neyman Seminar at the Statistics department of U. C. Berkeley, the IEEE Data Science Workshop, the Peter Hall conference at U. C. Davis, Networks 2021 (A joint Sunbelt and Netsci conference), the Department of Mathematics, University of Maryland, College Park, etc. 

 


Last Modified: 03/21/2022
Modified by: Purnamrita Sarkar

Please report errors in award information by writing to: awardsearch@nsf.gov.

Print this page

Back to Top of page