NSF Award Search: Award # 1660648

Award Abstract # 1660648

Collaborative Proposal: ABI Innovation: A Graph Based Approach for the Genome Wide Prediction of Conditionally Essential Genes

NSF Org:	DBI Division of Biological Infrastructure
Recipient:	YALE UNIV
Initial Amendment Date:	July 19, 2017
Latest Amendment Date:	July 19, 2017
Award Number:	1660648
Award Instrument:	Standard Grant
Program Manager:	Peter McCartney DBI Division of Biological Infrastructure BIO Directorate for Biological Sciences
Start Date:	September 15, 2017
End Date:	August 31, 2022 (Estimated)
Total Intended Award Amount:	$1,203,514.00
Total Awarded Amount to Date:	$1,203,514.00
Funds Obligated to Date:	FY 2017 = $1,203,514.00
History of Investigator:	Mark Gerstein (Principal Investigator) Mark.Gerstein@yale.edu
Recipient Sponsored Research Office:	Yale University 150 MUNSON ST NEW HAVEN CT US 06511-3572 (203)785-4689
Sponsor Congressional District:	03
Primary Place of Performance:	Yale University 266 Whitney Avenue New Haven CT US 06520-8114
Primary Place of Performance Congressional District:	03
Unique Entity Identifier (UEI):	FL6GV84CKN57
Parent UEI:	FL6GV84CKN57
NSF Program(s):	ADVANCES IN BIO INFORMATICS
Primary Program Source:	01001718DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s):
Program Element Code(s):	116500
Award Agency Code:	4900
Fund Agency Code:	4900
Assistance Listing Number(s):	47.074

ABSTRACT

How does one identify, and characterize at the genome scale, the set of genes that is essential for an organism to grow and thrive under particular conditions? Predicting such sets of genes is a fundamental goal in bioinformatics; this project aims to create methods and tools for making accurate lists of such functional genes. The approach combines phenotype prediction with knowledge about the functional biological networks in cells to infer new knowledge. The network analysis methods developed here can be readily transferred and applied to a large variety of datasets to answer a wide range of questions, from inferring gene-phenotype associations to detecting communities on social networks, extensions highly relevant to the network science community. Moreover, the project's state-of-the-art analysis of temporal gene expression data using state-space models and dimensionality reduction techniques is applicable to any group of genes, for example tissue-specific versus universally expressed genes. In addition to advancing functional genomics knowledge in the study organism, yeast, the tools will have an impact on research in fields like personal genomics by providing a large-scale, system-level identification and molecular characterization of phenotypes. Finally, this project provides new and innovative tools for education in bioinformatics.

In more technical terms, this project's major goal is to develop new mathematical models and methods that, given a set of genes or an entire genome, can infer their phenotypes and suggest whether or not these genes are necessary for the organism's survival. Specifically, information will be integrated on two levels: phenotypic and molecular. At the phenotypic level, the structure of biological networks will be used to assign phenotypic attributes to genes and identify sets of genes that share similar essential phenotypes. At the molecular level, the resulting phenotype predictions will be refined by identifying groups of essential genes governed by similar activity patterns. The integration of the information on these two levels will result in a comprehensive gene-phenotype characterization and a refined group of conditionally essential genes. The resulting predictions will be validated experimentally in two yeast systems. All the tools and datasets associated with this project will be made freely available through genopheno.gersteinlab.org.
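To make the phenotypic-level idea concrete, the following is a minimal sketch of one simple way network structure can be used to assign phenotypic attributes to genes: a "guilt by association" neighbor vote. It is not the project's method, which the abstract does not specify. The function name `neighbor_vote`, the gene names, and the phenotype labels are all hypothetical and chosen only for illustration.

```python
def neighbor_vote(network, annotations):
    """Score each unannotated gene by the fraction of its network
    neighbours that carry each phenotype (guilt by association).

    network:     dict mapping gene -> set of neighbouring genes
    annotations: dict mapping gene -> set of known phenotype labels
    """
    predictions = {}
    for gene, neighbours in network.items():
        # Skip genes that are already annotated or have no neighbours.
        if gene in annotations or not neighbours:
            continue
        counts = {}
        for nb in neighbours:
            for phenotype in annotations.get(nb, ()):
                counts[phenotype] = counts.get(phenotype, 0) + 1
        predictions[gene] = {p: c / len(neighbours) for p, c in counts.items()}
    return predictions


# Hypothetical toy network: geneX is unannotated, its neighbours are not.
network = {
    "geneA": {"geneB", "geneC", "geneX"},
    "geneB": {"geneA", "geneX"},
    "geneC": {"geneA", "geneX"},
    "geneX": {"geneA", "geneB", "geneC"},
}
annotations = {
    "geneA": {"heat_sensitive"},
    "geneB": {"heat_sensitive"},
    "geneC": {"slow_growth"},
}
predictions = neighbor_vote(network, annotations)
print(predictions["geneX"])
```

In this toy case two of the three neighbors of geneX carry the hypothetical heat_sensitive label, so that phenotype receives the higher score. A real analysis would presumably weight edges and combine this with the molecular-level evidence described above.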

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Galeano, Diego and Li, Shantao and Gerstein, Mark and Paccanaro, Alberto "Predicting the frequencies of drug side effects" Nature Communications, v.11, 2020 https://doi.org/10.1038/s41467-020-18305-y

Li, Bian and Yang, Yucheng T. and Capra, John A. and Gerstein, Mark B. "Predicting changes in protein thermodynamic stability upon point mutation with deep 3D convolutional neural networks" PLOS Computational Biology, v.16, 2020 https://doi.org/10.1371/journal.pcbi.1008291

Mohsen, Hussein and Gunasekharan, Vignesh and Qing, Tao and Seay, Montrell and Surovtseva, Yulia and Negahban, Sahand and Szallasi, Zoltan and Pusztai, Lajos and Gerstein, Mark B. "Network propagation-based prioritization of long tail genes in 17 cancer types" Genome Biology, v.22, 2021 https://doi.org/10.1186/s13059-021-02504-x

Spakowicz, Daniel and Lou, Shaoke and Barron, Brian and Gomez, Jose L. and Li, Tianxiao and Liu, Qing and Grant, Nicole and Yan, Xiting and Hoyd, Rebecca and Weinstock, George and Chupp, Geoffrey L. and Gerstein, Mark "Approaches for integrating heterogeneous RNA-seq data reveal cross-talk between microbes and genes in asthmatic patients" Genome Biology, v.21, 2020 https://doi.org/10.1186/s13059-020-02033-z

Warrell, Jonathan and Gerstein, Mark "Cyclic and multilevel causation in evolutionary processes" Biology & Philosophy, v.35, 2020 https://doi.org/10.1007/s10539-020-09753-3

Yan, Koon-Kiu and Wang, Daifeng and Xiong, Kun and Gerstein, Mark "Comparing Technological Development and Biological Evolution from a Network Perspective" Cell Systems, v.10, 2020 https://doi.org/10.1016/j.cels.2020.02.004

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

This award has led to several key publications on prioritizing gene importance, understanding the impact of mutations, and analyzing gene networks. It also led to the development of practical machine-learning methods.

(1) We developed a general theoretical framework that uses an extended version of Pearl's do-calculus to incorporate cyclic causal interactions and multilevel causation. To analyze causal information dynamics in our framework, we also developed the information-theoretic notions necessary to introduce a causal generalization of the Partial Information Decomposition framework. Our causal framework helps to clarify conceptual issues in the context of complex trait analysis and to assign variation in an observed trait to genetic, epigenetic, and environmental factors, including mutations. This work has been published in Biology & Philosophy (2020).

(2) We developed a machine learning model to predict the biological effects of genetic variants using deep 3D convolutional neural networks. This work has been published in PLOS Computational Biology (2020).

(3) In addition to predicting the impact of genetic variants in the human genome, we developed a pipeline that integrates dimensionality reduction and Latent Dirichlet Allocation (LDA) with single-cell RNA-seq data to predict potential interactions between microbes and human genes in patients. This work has been published in Genome Biology (2020).

(4) We presented a machine learning framework to learn the latent signatures of small molecules and their deleterious effects. We demonstrated that our model is informative in relating the molecule signatures to distinct anatomical categories. This work has been published in Nature Communications (2020).

(5) Finally, at the network level, we compared the "patterns of mutation" in biological and technological networks, and based on network propagation methods, we developed computational models that can predict genes associated with a disease. Integrating network information increases the statistical power of our models and thus can identify many previously overlooked causal genes. These works have been published in Cell Systems (2020) and Genome Biology (2021).
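As an illustration of what "network propagation" can mean in item (5), here is a minimal sketch of one common variant, a random walk with restart from known disease genes over a gene network. It is not the published model. The function name `propagate`, the restart probability, and the toy gene names are assumptions made for this example.

```python
def propagate(adjacency, seeds, restart=0.3, iterations=100, tol=1e-9):
    """Random walk with restart on an undirected graph.

    adjacency: dict mapping node -> set of neighbouring nodes
    seeds:     set of nodes with a known disease association
    Returns a dict mapping node -> propagated score.
    """
    nodes = sorted(adjacency)
    # Initial distribution: uniform over the seed genes.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    scores = dict(p0)
    for _ in range(iterations):
        updated = {}
        for n in nodes:
            # Each neighbour m spreads its score evenly over its own neighbours.
            inflow = sum(scores[m] / len(adjacency[m]) for m in adjacency[n])
            updated[n] = (1 - restart) * inflow + restart * p0[n]
        delta = max(abs(updated[n] - scores[n]) for n in nodes)
        scores = updated
        if delta < tol:
            break
    return scores


# Hypothetical toy network: a chain g1 - g2 - g3 - g4 - g5 with g1 as the seed.
toy_network = {
    "g1": {"g2"},
    "g2": {"g1", "g3"},
    "g3": {"g2", "g4"},
    "g4": {"g3", "g5"},
    "g5": {"g4"},
}
scores = propagate(toy_network, seeds={"g1"})
# Rank the non-seed genes by propagated score, highest first.
ranked = sorted((g for g in scores if g != "g1"), key=scores.get, reverse=True)
print(ranked)
```

Genes closer to the seed in the network receive higher scores, which is the intuition behind using network context to surface candidate genes that a per-gene statistic alone might miss.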

Last Modified: 11/27/2022
Modified by: Mark B Gerstein

Please report errors in award information by writing to: awardsearch@nsf.gov.
