Award Abstract # 1651995
CAREER: Gaussian Graphical Models: Theory, Computation, and Applications

NSF Org: DMS
Division Of Mathematical Sciences
Recipient: MASSACHUSETTS INSTITUTE OF TECHNOLOGY
Initial Amendment Date: February 23, 2017
Latest Amendment Date: June 21, 2021
Award Number: 1651995
Award Instrument: Continuing Grant
Program Manager: Yong Zeng
yzeng@nsf.gov
(703)292-7299
DMS Division Of Mathematical Sciences
MPS Directorate for Mathematical and Physical Sciences
Start Date: July 1, 2017
End Date: June 30, 2023 (Estimated)
Total Intended Award Amount: $400,000.00
Total Awarded Amount to Date: $400,000.00
Funds Obligated to Date: FY 2017 = $75,459.00
FY 2018 = $77,558.00
FY 2019 = $80,206.00
FY 2020 = $81,982.00
FY 2021 = $84,795.00
History of Investigator:
  • Caroline Uhler (Principal Investigator)
    cuhler@mit.edu
Recipient Sponsored Research Office: Massachusetts Institute of Technology
77 MASSACHUSETTS AVE
CAMBRIDGE
MA  US  02139-4301
(617)253-1000
Sponsor Congressional District: 07
Primary Place of Performance: Massachusetts Institute of Technology
77 Massachusetts Avenue
Cambridge
MA  US  02139-4301
Primary Place of Performance Congressional District: 07
Unique Entity Identifier (UEI): E2NYLCDML6V1
Parent UEI: E2NYLCDML6V1
NSF Program(s): STATISTICS; Division Co-Funding: CAREER
Primary Program Source: 01001718DB NSF RESEARCH & RELATED ACTIVIT
01001819DB NSF RESEARCH & RELATED ACTIVIT
01001920DB NSF RESEARCH & RELATED ACTIVIT
01002021DB NSF RESEARCH & RELATED ACTIVIT
01002122DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 1045
Program Element Code(s): 126900, 804800
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.049

ABSTRACT

Technological advances and the information era allow the collection of massive amounts of data at unprecedented resolution. Making use of these data to gain insight into complex phenomena requires characterizing the relationships among a large number of variables. Graphical models explicitly capture the statistical relationships between the variables of interest in the form of a network. Such a representation, in addition to enhancing the interpretability of the model, enables computationally efficient inference. The investigator develops methodology to infer undirected and directed networks among a large number of variables from observational data. This research has broad societal impact, as it affects application domains ranging from weather forecasting to phylogenetics and personalized medicine. In addition, the PI is one of the initial faculty hires in a new MIT-wide effort in statistics. As such, the PI plays a major role in creating new undergraduate and PhD programs in statistics to train the next generation in big data analytics, which is crucial for taking on challenging roles in this data-rich world.

The goal of this project is to study probabilistic graphical models using an integrated approach that combines ideas from applied algebraic geometry, convex optimization, mathematical statistics, and machine learning, and to apply these models to novel, scientifically important problems. The research agenda is structured into three projects. In the first project, the investigator develops methods to infer causal relationships between variables from observational data using the framework of directed Gaussian graphical models combined with tools from optimization and algebraic geometry. The end goal is to apply this new methodology to learn tissue- and person-specific gene regulatory networks from gene expression data such as those from the Genotype-Tissue Expression (GTEx) project. In the second project, the investigator develops scalable methods for maximum likelihood estimation in Gaussian models with linear constraints on the covariance matrix or its inverse. Such models are important for inference of phylogenetic trees or cellular differentiation trees. The third project is an application of graphical models to weather forecasting: the investigator develops new parametric methods based on Gaussian copulas, as well as non-parametric methods, for the post-processing of numerical weather prediction models that take into account the complicated dependence structure of weather variables in space and time.
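As a minimal illustration of the undirected Gaussian graphical models discussed above (a sketch under stated assumptions, not the project's methodology): in a multivariate Gaussian, a zero entry in the inverse covariance (precision) matrix corresponds to a missing edge, i.e. conditional independence of the two variables given all others. The 4-variable chain graph, the matrix K, the helper names, and the thresholding rule below are all hypothetical choices made for this example.

```python
import numpy as np

# Hypothetical precision matrix for the chain graph 0 - 1 - 2 - 3.
# Zero entries encode missing edges (conditional independences).
K = np.array([
    [2.0, -0.8, 0.0, 0.0],
    [-0.8, 2.0, -0.8, 0.0],
    [0.0, -0.8, 2.0, -0.8],
    [0.0, 0.0, -0.8, 2.0],
])


def partial_correlations(precision):
    """Partial correlation of i and j given the rest: -K_ij / sqrt(K_ii * K_jj)."""
    d = np.sqrt(np.diag(precision))
    rho = -precision / np.outer(d, d)
    np.fill_diagonal(rho, 1.0)
    return rho


def edges_from_precision(precision, tol=1e-8):
    """Undirected edges (i, j) with i < j whose precision entry exceeds tol in magnitude."""
    p = precision.shape[0]
    return [(i, j) for i in range(p) for j in range(i + 1, p)
            if abs(precision[i, j]) > tol]


# The covariance matrix is dense even though the graph is sparse:
# variables 0 and 3 are correlated, but only through 1 and 2.
Sigma = np.linalg.inv(K)
true_edges = edges_from_precision(K)

# With many samples, the inverse sample covariance approximates K.
# The threshold 0.2 is a crude, illustrative choice, not a recommended estimator.
rng = np.random.default_rng(0)
X = rng.multivariate_normal(np.zeros(4), Sigma, size=20000)
K_hat = np.linalg.inv(np.cov(X, rowvar=False))
est_edges = edges_from_precision(K_hat, tol=0.2)

print("true edges:     ", true_edges)
print("estimated edges:", est_edges)
```

Inverting the sample covariance only works when the number of samples greatly exceeds the number of variables; the large-scale settings described in the abstract call for constrained or regularized estimators instead.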

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

(Showing: 1 - 10 of 57)
Agrawal, Raj and Roy, Uma and Uhler, Caroline "Covariance Matrix Estimation under Total Positivity for Portfolio Selection" Journal of Financial Econometrics, 2020 https://doi.org/10.1093/jjfinec/nbaa018
Agrawal, Raj and Squires, Chandler and Prasad, Neha and Uhler, Caroline "The DeCAMFounder: nonlinear causal discovery in the presence of hidden variables" Journal of the Royal Statistical Society Series B: Statistical Methodology, v.85, 2023 https://doi.org/10.1093/jrsssb/qkad071
Agrawal, R. and Broderick, T. and Uhler, C. "Minimal I-MAP MCMC for scalable structure discovery in causal DAG models" Proceedings of Machine Learning Research, v.80, 2018
Agrawal, R. and Squires, C. and Yang, K.D. and Shanmugam, K. and Uhler, C. "ABCD-Strategy: Budgeted experimental design for targeted causal structure discovery" Proceedings of Machine Learning Research, v.89, 2019
Belyaeva, Anastasiya and Cammarata, Louis and Radhakrishnan, Adityanarayanan and Squires, Chandler and Yang, Karren Dai and Shivashankar, G. V. and Uhler, Caroline "Causal network models of SARS-CoV-2 expression and aging to identify candidates for drug repurposing" Nature Communications, v.12, 2021 https://doi.org/10.1038/s41467-021-21056-z
Belyaeva, Anastasiya and Kubjas, Kaie and Sun, Lawrence J. and Uhler, Caroline "Identifying 3D Genome Organization in Diploid Organisms via Euclidean Distance Geometry" SIAM Journal on Mathematics of Data Science, v.4, 2022 https://doi.org/10.1137/21M1390372
Belyaeva, Anastasiya and Squires, Chandler and Uhler, Caroline "DCI: learning causal differences between gene regulatory networks" Bioinformatics, 2021 https://doi.org/10.1093/bioinformatics/btab167
Belyaeva, Anastasiya and Venkatachalapathy, Saradha and Nagarajan, Mallika and Shivashankar, G. V. and Uhler, Caroline "Network analysis identifies chromosome intermingling regions as regulatory hotspots for transcription" Proceedings of the National Academy of Sciences, v.114, 2017 https://doi.org/10.1073/pnas.1708028115
Bernstein, D. I. and Saeed, B. and Squires, C. and Uhler, C. "Ordering-based causal structure learning in the presence of latent variables" Proceedings of Machine Learning Research, 2020
Casanellas, Marta and Petrović, Sonja and Uhler, Caroline "Algebraic Statistics in Practice: Applications to Networks" Annual Review of Statistics and Its Application, v.7, 2020 https://doi.org/10.1146/annurev-statistics-031017-100053
Katz-Rogozhnikov, D. and Shanmugam, K. and Squires, C. and Uhler, C. "Size of interventional Markov equivalence classes in random DAG models" Proceedings of Machine Learning Research, v.89, 2019

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

A central problem in biology and for biomedical discovery is the inference of gene regulatory networks, causal networks that allow predicting the effect of any intervention in the system. Key outcomes of this project were the development of theory, algorithms, and computational methods for: 1) causal structure discovery, i.e., learning the underlying cause-and-effect relationships such as which gene up- or down-regulates which other gene, from a mix of observational and interventional data; 2) optimal experimental design of interventions, a key problem given that the space of possible perturbations that can be performed in biology (e.g. all combinations of genetic perturbations, any combination of drugs) is huge and cannot be fully explored experimentally; 3) predicting the effect of untested intervention-context pairs, such as a drug in a new disease context.
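To make the distinction between observational and interventional data concrete, the following is a toy sketch and not one of the project's algorithms. It assumes a hypothetical two-gene linear Gaussian model in which gene A regulates gene B with effect size 0.9; the variable names, effect size, and intervention values are all invented for illustration. Forcing A to a fixed value shifts B, whereas forcing B leaves A unchanged, and this asymmetry is what reveals the direction of the edge.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50000
effect = 0.9  # hypothetical causal effect of gene A on gene B


def sample(do_a=None, do_b=None):
    """Draw n samples from the toy model A -> B, optionally under a perfect intervention.

    do_a / do_b fix the corresponding variable to a constant, which removes
    the influence of its usual causes.
    """
    a = rng.normal(size=n) if do_a is None else np.full(n, do_a)
    b = effect * a + rng.normal(size=n) if do_b is None else np.full(n, do_b)
    return a, b


a_obs, b_obs = sample()             # observational data
_, b_do_a = sample(do_a=2.0)        # intervene on A, measure B
a_do_b, _ = sample(do_b=2.0)        # intervene on B, measure A

# Intervening on the cause moves the effect (by roughly effect * 2.0) ...
shift_in_b = b_do_a.mean() - b_obs.mean()
# ... while intervening on the effect leaves the cause essentially unchanged.
shift_in_a = a_do_b.mean() - a_obs.mean()

print(f"shift in B under do(A=2): {shift_in_b:.2f}")
print(f"shift in A under do(B=2): {shift_in_a:.2f}")
```

In this Gaussian setting, observational data alone cannot distinguish A -> B from B -> A, since both directions can produce the same joint distribution. That limitation is one reason interventional data and the careful choice of interventions matter.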

With respect to causal structure discovery, we developed methods that address key issues that make this problem challenging in practice, including: the presence of latent (unmeasured) confounders; off-target intervention effects (knock-outs may also target other genes with similar sequences); measurement error in the data collection process (single-cell RNA-seq data are highly zero-inflated); and data arising from unknown disease subtypes, and hence from a mixture of causal models. In particular, we developed methods for causal structure discovery that are provably consistent under strictly weaker assumptions than previous algorithms and that scale to the large graph sizes needed for applications to gene regulation.

With respect to experimental design, we developed methods for identifying interventions that are optimal under different criteria: the amount of information they carry about the underlying causal graph (e.g. the gene regulatory network), and their ability to move the distribution from a given state to a desired state.

With respect to predicting the effect of untested interventions from a set of tested interventions, we viewed this causal transportability problem as a tensor completion problem and developed novel algorithms based on infinitely wide neural networks that are fast, flexible, and effective. We also benchmarked these algorithms on the problem of virtual drug screening.

Throughout the project, the principal investigator (PI) has actively engaged in activities related to education and research by building a diverse research group that attracts talent from underrepresented groups, including women, and by training graduate students, undergraduate students, and postdoctoral fellows at the intersection of statistics, machine learning, and the biomedical sciences. The training spanned theory, method development, and applications to important biological and medical problems. To help build the research area and community at large, the PI initiated and organized various conferences, including introducing a new machine learning conference focused solely on causal inference, "Causal Learning and Reasoning (CLeaR)", and co-organizing the semester-long program on causality at the Simons Institute at UC Berkeley.

The research resulting from this project has been widely disseminated. Several open-source software packages have been developed; in particular, all causal inference algorithms resulting from this project are implemented and freely available in the group's causaldag Python library, which can be found in the group's GitHub repository (https://github.com/uhlerlab). In addition, the PI disseminated the research results through keynote presentations at various conferences, conference presentations by the involved PhD students, and many publications, which were made freely available before publication on the arXiv or bioRxiv preprint repositories to accelerate scientific discovery. Finally, material developed as part of this project has been integrated into two courses that the PI teaches at MIT.


Last Modified: 09/28/2023
Modified by: Caroline Uhler
