
NSF Org: | IIS Division of Information & Intelligent Systems |
Recipient: |
|
Initial Amendment Date: | June 4, 2018 |
Latest Amendment Date: | June 4, 2018 |
Award Number: | 1755836 |
Award Instrument: | Standard Grant |
Program Manager: | Sylvia Spengler, sspengle@nsf.gov, (703)292-7347, IIS Division of Information & Intelligent Systems, CSE Directorate for Computer and Information Science and Engineering |
Start Date: | June 15, 2018 |
End Date: | May 31, 2022 (Estimated) |
Total Intended Award Amount: | $174,831.00 |
Total Awarded Amount to Date: | $174,831.00 |
Funds Obligated to Date: |
|
History of Investigator: |
|
Recipient Sponsored Research Office: | 107 S INDIANA AVE BLOOMINGTON IN US 47405-7000 (317)278-3473 |
Sponsor Congressional District: |
|
Primary Place of Performance: | 980 Indiana Ave Lockefield 2232 Indianapolis IN US 46202-2915 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | Smart and Connected Health |
Primary Program Source: |
|
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
Recent advances in high-throughput technologies have led to a substantial increase in multi-omic data characterizing various levels of molecular change in the progression of disease, including the genome, transcriptome, proteome and metabolome. Computational methods powerful enough to handle the high dimensionality and heterogeneity of multi-omic data remain very limited. In addition, major findings from current -omics studies have been largely restricted to relatively simple patterns, e.g., individual biomarkers with few, if any, functional interactions, which makes it difficult to validate these findings and relate them to downstream biology. By coupling multi-omic data with systems biology networks, this project will develop novel computational methods to explore the functional network modules associated with disease quantitative traits. By enabling both strategic and efficient knowledge extraction from the vast biological landscape represented by multi-omic data, this research may lead to unprecedented discovery of disease mechanisms and suggest surrogate biomarkers for therapeutic trials.
This work will develop new computational methods to enable the integration of large-scale heterogeneous multi-omic data with rich domain knowledge for better biomarker and association discovery. Two interrelated tasks will be performed: 1) Develop a novel biological-knowledge-guided structured sparse learning model, together with large-scale optimization methods, to integrate -omic data and biological networks from multiple sources and discover -omic modules involving heterogeneous biomarkers for accurately predicting outcomes of interest; and 2) Couple multi-task learning with structured sparse association models to jointly learn, for multiple groups, the bi-multivariate associations between imaging phenotypes and -omic features with dense functional connections. The project will contribute a new solution framework spanning the areas of machine learning, data mining and network science, and will also provide novel perspectives on how to effectively integrate large-scale and heterogeneous -omic data for the systems biology of complex diseases.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
Our research findings are highly relevant to the current national goal of fully utilizing high-throughput multi-omics data to improve public health. This project produced several important outcomes.
1. We developed a modularity-constrained Lasso model to jointly analyze genotype, gene expression and protein expression data for the discovery of functionally connected multi-omic disease biomarkers. Given a prior network capturing the functional relationships among SNPs, genes and proteins, the newly introduced penalty term maximizes the global modularity of the subnetwork formed by the selected markers and thereby encourages the selection of multi-omic markers with dense functional connectivity rather than isolated individual markers. When applied to the ROS/MAP cohort data, the model identified a functionally connected subnetwork of 276 SNPs, genes and proteins with predictive power. Within this subnetwork, multiple trans-omic paths from SNPs to genes and then to proteins were observed. This suggests that the deterioration of cognitive performance in AD patients may result from genetic variations through their cascading effects on the downstream transcriptome and proteome.
2. We developed a modularity-constrained logistic regression model to mine the association between disease status and a group of functionally connected SNPs, genes and proteins. This new method helped identify a group of densely connected SNPs, genes and proteins predictive of Alzheimer's disease status. These SNPs are mostly eQTLs in the frontal region, where the expression data were collected. These genes and proteins were also found to be associated with various phenotypes of the frontal regions. Taken together, these results suggest a potential pathway underlying the development of Alzheimer's disease, from SNPs to gene expression, protein expression and ultimately functional and structural changes in the brain.
3. We developed a joint graphical lasso model to investigate differential gene co-expression patterns during disease stage progression. Assuming that transcriptional coupling is continuously disrupted as the disease progresses, we modified the existing joint graphical lasso to model similarity only between consecutive disease stages. Our results showed that this joint graphical lasso estimated gene co-expression patterns with a much lower false positive rate, even without multiple testing correction, which usually requires a substantial number of permutation tests. We identified
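The two penalty ideas in the outcomes above can be illustrated with a minimal sketch. It is not the project's implementation: the function names, the use of the Newman modularity matrix, the quadratic form on the absolute coefficients, and the element-wise L1 difference between consecutive stages are all assumptions chosen for illustration, and the published models may define these terms differently.

```python
import numpy as np


def modularity_matrix(adj):
    """Newman modularity matrix B = A - d d^T / (2m) of an undirected graph."""
    adj = np.asarray(adj, dtype=float)
    degrees = adj.sum(axis=1)
    two_m = degrees.sum()  # sum of degrees equals twice the edge count
    return adj - np.outer(degrees, degrees) / two_m


def modularity_penalized_loss(X, y, w, adj, lam_sparse, lam_mod):
    """Squared error + L1 sparsity - a reward for selecting connected markers.

    ASSUMPTION: the term |w|^T B |w| stands in for the modularity penalty
    described in the report; the exact published form may differ.
    """
    B = modularity_matrix(adj)
    residual = y - X @ w
    a = np.abs(w)
    fit = 0.5 * (residual @ residual)
    return fit + lam_sparse * a.sum() - lam_mod * (a @ B @ a)


def consecutive_stage_penalty(precisions, lam_fuse):
    """Fused penalty that links only neighbouring disease stages k and k+1.

    ASSUMPTION: element-wise L1 difference between consecutive precision
    matrices; non-adjacent stages are deliberately left uncoupled.
    """
    return lam_fuse * sum(
        np.abs(precisions[k + 1] - precisions[k]).sum()
        for k in range(len(precisions) - 1)
    )
```

Under these assumptions, a coefficient vector that selects two markers joined by an edge in the prior network receives a lower loss than one that selects two unconnected markers with the same fit and sparsity, which is the behaviour items 1 and 2 describe. The stage penalty grows only when the estimated networks of adjacent stages differ, matching the modification described in item 3.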
We published six full-length papers related to this project in peer-reviewed conference proceedings and journals. This project provided research topics for one Ph.D. student and four master's students at Indiana University Purdue University Indianapolis. Three master's students have graduated and taken jobs as bioinformaticians at top universities and companies. The Ph.D. student is graduating soon and is looking for a postdoctoral position. The research materials produced in this project laid the foundation for an annual summer workshop targeting high school students. They also provided teaching materials for several graduate courses at Indiana University Purdue University Indianapolis.
Last Modified: 08/05/2022
Modified by: Jingwen Yan