
NSF Org: | DBI Division of Biological Infrastructure |
Recipient: |
|
Initial Amendment Date: | July 19, 2017 |
Latest Amendment Date: | July 19, 2017 |
Award Number: | 1660648 |
Award Instrument: | Standard Grant |
Program Manager: | Peter McCartney, DBI Division of Biological Infrastructure, BIO Directorate for Biological Sciences |
Start Date: | September 15, 2017 |
End Date: | August 31, 2022 (Estimated) |
Total Intended Award Amount: | $1,203,514.00 |
Total Awarded Amount to Date: | $1,203,514.00 |
Funds Obligated to Date: |
|
History of Investigator: |
|
Recipient Sponsored Research Office: | 150 MUNSON ST NEW HAVEN CT US 06511-3572 (203)785-4689 |
Sponsor Congressional District: |
|
Primary Place of Performance: | 266 Whitney Avenue New Haven CT US 06520-8114 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | ADVANCES IN BIO INFORMATICS |
Primary Program Source: |
|
Program Reference Code(s): | |
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.074 |
ABSTRACT
How does one identify, and characterize at the genome scale, the set of genes that is essential for an organism to grow and thrive under particular conditions? Predicting such sets of genes is a fundamental goal in bioinformatics; this project aims to create methods and tools for making accurate lists of such functional genes. The approach combines phenotype prediction with knowledge about the functional biological networks in cells to infer new knowledge. The network analysis methods developed here can be readily transferred to a wide variety of datasets to answer questions ranging from inferring gene-phenotype associations to detecting communities in social networks, extensions that are highly relevant to the network science community. Moreover, the project's state-of-the-art analysis of temporal gene expression data, using state-space models and dimensionality reduction techniques, is applicable to any group of genes, e.g., tissue-specific vs. universally expressed genes. In addition to advancing functional genomics knowledge in the study organism, yeast, the tools will have an impact on research in fields like personal genomics by providing a large-scale, system-level identification and molecular characterization of phenotypes. Finally, this project provides new and innovative tools for education in bioinformatics.
In more technical terms, this project's major goal is to develop new mathematical models and methods that, given a set of genes or an entire genome, can infer their phenotypes and suggest whether or not these genes are necessary for the organism's survival. Specifically, information will be integrated on two levels: phenotypic and molecular. At the phenotypic level, the structure of biological networks will be used to assign phenotypic attributes to genes and identify sets of genes that share similar essential phenotypes. At the molecular level, the resulting phenotype predictions will be refined by identifying groups of essential genes governed by similar activity patterns. The integration of the information on these two levels will result in a comprehensive gene-phenotype characterization and a refined group of conditionally essential genes. The resulting predictions will be validated experimentally in two yeast systems. All the tools and datasets associated with this project will be made freely available through genopheno.gersteinlab.org.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
This award has led to several key publications on prioritizing gene importance, understanding the impact of mutations, and analyzing gene networks. It also led to the development of practical machine-learning methods.
(1) We developed a general theoretical framework that extends Pearl's do-calculus to incorporate cyclic causal interactions and multilevel causation. To analyze causal information dynamics in our framework, we also developed the information-theoretic notions needed to introduce a causal generalization of the Partial Information Decomposition framework. Our causal framework helps to clarify conceptual issues in complex trait analysis and to assign variation in an observed trait to genetic, epigenetic, and environmental factors, including mutations. This work has been published in Biology & Philosophy (2020).
(2) We developed a machine learning model that uses deep 3D convolutional neural networks to predict the biological effects of genetic variants. This work has been published in PLoS Computational Biology (2020).
(3) In addition to predicting the impact of genetic variants in the human genome, we developed a pipeline that integrates dimensionality reduction and Latent Dirichlet Allocation (LDA) with single-cell RNA-seq data to predict potential interactions between microbes and human genes in patients. This work has been published in Genome Biology (2020).
(4) We presented a machine learning framework to learn the latent signatures of small molecules and their deleterious effects. We demonstrated that our model is informative in relating the molecule signatures to distinct anatomical categories. This work has been published in Nature Communications (2020).
(5) Finally, at the network level, we compared the "patterns of mutation" in biological and technological networks and, based on network propagation methods, developed computational models that can predict genes associated with a disease. Integrating network information increases the statistical power of our models and can thus identify many previously overlooked causal genes. These studies have been published in Cell Systems (2020) and Genome Biology (2021).
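As an illustration of the network propagation idea mentioned in item (5), the following is a minimal sketch of one common variant, random walk with restart, on a toy gene network. The report does not specify which propagation algorithm the published models use, so the algorithm choice, the function name `propagate`, the gene labels G1 to G4, and the restart probability are all assumptions made for illustration only.

```python
def propagate(adjacency, seeds, restart=0.5, tol=1e-9, max_iter=1000):
    """Random walk with restart on an undirected graph (illustrative sketch).

    adjacency: dict mapping each node to the set of its neighbours.
    seeds: nodes with a known disease association (hypothetical input).
    Returns a dict mapping each node to its steady-state score.
    """
    nodes = sorted(adjacency)
    seeds = set(seeds)
    # Restart distribution: uniform over the seed genes.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # With probability `restart`, the walker jumps back to a seed.
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            deg = len(adjacency[u])
            if deg == 0:
                # Isolated node: the walker stays in place.
                nxt[u] += (1 - restart) * p[u]
                continue
            # Otherwise it moves to a uniformly chosen neighbour.
            share = (1 - restart) * p[u] / deg
            for v in adjacency[u]:
                nxt[v] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Toy path network G1 - G2 - G3 - G4 with G1 as the only known disease gene.
toy_network = {
    "G1": {"G2"},
    "G2": {"G1", "G3"},
    "G3": {"G2", "G4"},
    "G4": {"G3"},
}
scores = propagate(toy_network, seeds=["G1"])
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this sketch, genes closer to the seed in the network receive higher scores, which is the intuition behind using network context to prioritize candidate disease genes that a gene-by-gene test might overlook.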
Last Modified: 11/27/2022
Modified by: Mark B Gerstein