
NSF Org: |
DMS Division Of Mathematical Sciences |
Recipient: |
|
Initial Amendment Date: | April 25, 2017 |
Latest Amendment Date: | April 25, 2017 |
Award Number: | 1713082 |
Award Instrument: | Standard Grant |
Program Manager: |
Pena Edsel
DMS Division Of Mathematical Sciences MPS Directorate for Mathematical and Physical Sciences |
Start Date: | July 1, 2017 |
End Date: | June 30, 2021 (Estimated) |
Total Intended Award Amount: | $160,001.00 |
Total Awarded Amount to Date: | $160,001.00 |
Funds Obligated to Date: |
|
History of Investigator: |
|
Recipient Sponsored Research Office: |
110 INNER CAMPUS DR AUSTIN TX US 78712-1139 (512)471-6424 |
Sponsor Congressional District: |
|
Primary Place of Performance: |
2317 Speedway, Stop D900 Austin TX US 78712-1532 |
Primary Place of
Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | STATISTICS |
Primary Program Source: |
|
Program Reference Code(s): | |
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.049 |
ABSTRACT
Large datasets, which are naturally modeled as a network or graph, arise in almost every field of human endeavor. For example, Facebook is a social network, where nodes are users, with edges corresponding to friendships. In gene networks, nodes represent genes with connections corresponding to their co-expression. In ecological networks, the nodes are animal species, with edges determined according to who eats whom. A major focus of research for network or graph data has been on identifying community membership of the nodes. However, what is often more important for scientific purposes is examining the nature and evolution of edge and membership probabilities, for instance changes in gene features of individuals as a function of some unknown factor, like a disease. The focus on using other measured features of nodes and edges could add, in decisive ways, to the information available from observed edges or interactions between nodes. These could be disease symptoms or test results, or demographic information of users in social networks. Statistical inference in such models, despite its importance, has only just begun to be studied. There are both theoretical and computational challenges, due both to the complexity of models fitted, and the size of data sets. The research will lead to the development of algorithms for fitting models and statistical measures of confidence, with potential applications to many fields.
The research is focused on block models for graphs, when node or edge covariates are present. When formulated, these models are no longer block models, but models whose membership probabilities depend upon covariates and whose connection probabilities depend both on block membership and individual covariates. Fitting algorithms involve alternating between fitting block and covariate parameters. Variational (mean field) approaches which effectively lead to semi-parametric model fitting with nK membership "nuisance" parameters, with n representing the number of nodes and K the number of communities, are examined. As these approaches have been found by the PIs to be unstable for large n, the PIs have already begun to investigate the theoretical and practical aspects of divide and conquer algorithms where many subgraphs are independently fit. The PIs will study the statistical properties, both asymptotically and through simulations, and develop practicable and computationally stable methods for large, relatively sparse graphs.
PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH
Note:
When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external
site maintained by the publisher. Some full text articles may not yet be available without a
charge during the embargo (administrative interval).
Some links on this page may take you to non-federal websites. Their policies may differ from
this site.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
This award resulted into several publications in the area of statistical inference of networks, and in general clustering problems. In the area of clustering and community detection, this work ranged from designing provable divide and conquer algorithms for community detection in large networks (PNAS 2021), provable algorithms for tuning hyperparameters for general clustering problems (ICML 2020), a spotlight paper on the general framework for inference in mixed membership type clustering models (NeurIPS 2018), a hierarchical community detection algorithm based on recursive partitioning (JASA 2020), and network community detection in the presence of covariates (JASA 2020). In addition to this, the grant led to work on clustering of network valued data where each datapoint is a network (NeurIPS 2017), study of the theoretical properties of iterative optimization algorithms for non-convex objective functions like Expectation Maximization for mixtures of Gaussians (NeurIPS 2017), and theoretical examination of Bayesian variational inference for community detection (JMLR 2021, AISTATS 2020, NeurIPS 2018).
These papers involved several PhD students at UT Austin, four from the Department of Statistics and Data Sciences (two of whom were from underrepresented in STEM fields), and one student from the Computer Science department. Out of these, two students were the PI's graduate students, who have since graduated and one is co-advised by the PI. Based on the research topics developed during the duration of this grant, the PI has given talks in a number of venues, including the Joint Statistical Meetings, the Neyman Seminar at the Statistics department of U. C. Berkeley, the IEEE Data Science Workshop, the Peter Hall conference at U. C. Davis, Networks 2021 (A joint Sunbelt and Netsci conference), the Department of Mathematics, University of Maryland, College Park, etc.
Last Modified: 03/21/2022
Modified by: Purnamrita Sarkar
Please report errors in award information by writing to: awardsearch@nsf.gov.