Award Abstract # 1755836
CRII:SCH:Computational Methods to Mine Multi-omic Data for Systems Biology of Complex Diseases

NSF Org: IIS
Division of Information & Intelligent Systems
Recipient: TRUSTEES OF INDIANA UNIVERSITY
Initial Amendment Date: June 4, 2018
Latest Amendment Date: June 4, 2018
Award Number: 1755836
Award Instrument: Standard Grant
Program Manager: Sylvia Spengler
sspengle@nsf.gov
 (703)292-7347
CSE
 Directorate for Computer and Information Science and Engineering
Start Date: June 15, 2018
End Date: May 31, 2022 (Estimated)
Total Intended Award Amount: $174,831.00
Total Awarded Amount to Date: $174,831.00
Funds Obligated to Date: FY 2018 = $174,831.00
History of Investigator:
  • Jingwen Yan (Principal Investigator)
    jingyan@iupui.edu
Recipient Sponsored Research Office: Indiana University
107 S INDIANA AVE
BLOOMINGTON
IN  US  47405-7000
(317)278-3473
Sponsor Congressional District: 09
Primary Place of Performance: Indiana University-Purdue University
980 Indiana Ave Lockefield 2232
Indianapolis
IN  US  46202-2915
Primary Place of Performance Congressional District: 07
Unique Entity Identifier (UEI): YH86RTW2YVJ4
Parent UEI:
NSF Program(s): Smart and Connected Health
Primary Program Source: 01001819DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 8018, 8228, 9102
Program Element Code(s): 801800
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Recent advances in high-throughput technologies have led to a substantial increase in multi-omic data characterizing various levels of molecular change in the progression of disease, including the genome, transcriptome, proteome and metabolome. Computational methods powerful enough to handle the high dimensionality and heterogeneity of multi-omic data are still very limited. In addition, major findings from current -omics studies have been largely restricted to relatively simple patterns, e.g., individual biomarkers, possibly with few functional interactions, which makes it difficult to validate these findings and relate them to downstream biology. This project, by coupling multi-omic data with systems biology networks, will develop novel computational methods to explore the functional network modules associated with disease quantitative traits. By enabling both strategic and efficient knowledge extraction from the vast biological landscape represented by multi-omic data, this research may lead to unprecedented discovery of disease mechanisms and suggest surrogate biomarkers for therapeutic trials.

This work will develop new computational methods to enable the integration of large scale heterogeneous multi-omic data with rich domain knowledge for better biomarker and association discovery. Two interrelated tasks will be performed: 1) Develop a novel biological knowledge guided structured sparse learning model together with large-scale optimization methods to integrate -omic data and biological networks from multiple sources and discover -omic modules involving heterogeneous biomarkers for accurately predicting outcomes of interest; and 2) Couple multi-task learning with structured sparse association models to jointly learn the bi-multivariate associations between imaging phenotypes and -omic features with dense functional connections for multiple groups. The project will contribute to a new solution framework spanning the areas of machine learning, data mining and network science, and also provide novel perspectives as to how to effectively integrate the large-scale and heterogeneous -omic data for a systems biology of complex diseases.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

He, Bing and Gorijala, Priyanka and Xie, Linhui and Cao, Sha and Yan, Jingwen "Gene co-expression changes underlying the functional connectomic alterations in Alzheimer's disease" BMC Medical Genomics, v.15, 2022 https://doi.org/10.1186/s12920-022-01244-6
Upadhyaya, Yurika and Xie, Linhui and Salama, Paul and Cao, Sha and Nho, Kwangsik and Saykin, Andrew J. and Yan, Jingwen and Alzheimers Disease Neuroimaging In, for the "Differential co-expression analysis reveals early stage transcriptomic decoupling in Alzheimer's disease" BMC Medical Genomics, v.13, 2020 https://doi.org/10.1186/s12920-020-0689-y
Upadhyaya, Yurika J. and Xie, Linhui and Salama, Paul and Nho, Kwangsik and Saykin, Andrew and Yan, Jingwen "Disruption of gene co-expression network along the progression of Alzheimer's disease" 2019 IEEE EMBS International Conference on Biomedical & Health Informatics (BHI), 2019 https://doi.org/10.1109/BHI.2019.8834551
Varathan, Pradeep and Gorijala, Priyanka and Jacobson, Tanner and Chasioti, Danai and Nho, Kwangsik and Risacher, Shannon L. and Saykin, Andrew J. and Yan, Jingwen "Integrative analysis of eQTL and GWAS summary statistics reveals transcriptomic alteration in Alzheimer brains" BMC Medical Genomics, v.15, 2022 https://doi.org/10.1186/s12920-022-01245-5
Xie, Linhui and He, Bing and Varathan, Pradeep and Nho, Kwangsik and Risacher, Shannon L. and Saykin, Andrew J. and Salama, Paul and Yan, Jingwen "Integrative-omics for discovery of network-level disease biomarkers: a case study in Alzheimer's disease" Briefings in Bioinformatics, v.22, 2021 https://doi.org/10.1093/bib/bbab121
Xie, Linhui and Varathan, Pradeep and Nho, Kwangsik and Saykin, Andrew J. and Salama, Paul and Yan, Jingwen "Identification of functionally connected multi-omic biomarkers for Alzheimer's disease using modularity-constrained Lasso" PLOS ONE, v.15, 2020 https://doi.org/10.1371/journal.pone.0234748

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

Our research findings are highly relevant to the current national goal of fully utilizing high-throughput multi-omics data to improve public health. This project produced several important outcomes.

 1. We developed a modularity-constrained Lasso model to jointly analyze genotype, gene expression and protein expression data for the discovery of functionally connected multi-omic disease biomarkers. Given a prior network capturing the functional relationships among SNPs, genes and proteins, the newly introduced penalty term maximizes the global modularity of the subnetwork formed by the selected markers and encourages the selection of multi-omic markers with dense functional connectivity rather than isolated individual markers. When applied to the ROS/MAP cohort data, the model identified a functionally connected subnetwork of 276 SNPs, genes and proteins that bears predictive power. Within this subnetwork, multiple trans-omic paths from SNPs to genes and then to proteins were observed. This suggests that the deterioration of cognitive performance in Alzheimer's disease (AD) patients may potentially result from genetic variations through their cascading effects on the downstream transcriptome and proteome.

 2. We developed a modularity-constrained logistic regression model to mine the association between disease status and a group of functionally connected SNPs, genes and proteins. This new method helped identify a group of densely connected SNPs, genes and proteins predictive of Alzheimer's disease status. These SNPs are mostly eQTLs in the frontal region, where the expression data were collected. These genes and proteins were also found to be associated with various phenotypes of frontal regions. Taken together, these results suggest a potential pathway underlying the development of Alzheimer's disease, from SNPs to gene expression, protein expression and ultimately functional and structural changes in the brain.

 3. We developed a joint graphical lasso model to investigate differential gene co-expression patterns during disease stage progression. Assuming that transcriptional coupling is continuously disrupted as the disease progresses, we modified the existing joint graphical lasso to model similarity only between consecutive disease stages. Our results showed that the joint graphical lasso estimated gene co-expression patterns with a much lower false positive rate, even without the multiple-testing correction that usually requires a substantial number of permutation tests. We identified
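
The modularity idea in outcomes 1 and 2 can be illustrated with a small sketch. It assumes the Newman modularity matrix B = A - k k^T / (2m) and a hypothetical objective of squared loss plus an L1 term minus a modularity reward; the function names, the toy network, and the exact form of the penalty are assumptions made for illustration and are not taken from the report or the published model.

```python
import numpy as np

def modularity_matrix(adj):
    """Newman modularity matrix B = A - k k^T / (2m) for an undirected graph."""
    adj = np.asarray(adj, dtype=float)
    degrees = adj.sum(axis=1)
    two_m = degrees.sum()  # equals twice the total edge weight
    return adj - np.outer(degrees, degrees) / two_m

def modularity_score(w, B):
    """Quadratic form |w|^T B |w|; larger when selected features are linked."""
    a = np.abs(np.asarray(w, dtype=float))
    return float(a @ B @ a)

def penalized_objective(X, y, w, B, lam_sparse, lam_mod):
    """Hypothetical objective: squared loss + L1 penalty - modularity reward."""
    residual = y - X @ w
    loss = 0.5 * float(residual @ residual) / X.shape[0]
    sparsity = lam_sparse * float(np.abs(w).sum())
    return loss + sparsity - lam_mod * modularity_score(w, B)

# Toy prior network (an assumption): two triangles joined by one edge.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
B = modularity_matrix(A)

# Two candidate selections with the same number of features.
connected = np.array([1.0, 1.0, 0.0, 0.0, 0.0, 0.0])  # nodes 0 and 1 share an edge
scattered = np.array([1.0, 0.0, 0.0, 0.0, 1.0, 0.0])  # nodes 0 and 4 do not

print("connected score:", modularity_score(connected, B))
print("scattered score:", modularity_score(scattered, B))
```

In this toy case the connected pair receives a higher modularity score than the scattered pair, so with equal fit and equal sparsity the hypothetical objective favors the connected selection.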

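The consecutive-stage modification in outcome 3 can be written down as a sketch, assuming the fused form of the joint graphical lasso. The notation is introduced here for illustration and does not come from the report: K ordered disease stages, n_k samples and sample covariance S^{(k)} at stage k, precision matrices \Theta^{(k)} with entries \theta^{(k)}_{ij}, and tuning parameters \lambda_1 and \lambda_2.

```latex
\[
\max_{\{\Theta^{(k)} \succ 0\}}\;
\sum_{k=1}^{K} n_k \left[ \log\det \Theta^{(k)} - \operatorname{tr}\!\left( S^{(k)} \Theta^{(k)} \right) \right]
\;-\; \lambda_1 \sum_{k=1}^{K} \sum_{i \neq j} \left| \theta^{(k)}_{ij} \right|
\;-\; \lambda_2 \sum_{k=1}^{K-1} \sum_{i,j} \left| \theta^{(k)}_{ij} - \theta^{(k+1)}_{ij} \right|
\]
```

The standard fused penalty sums the last term over all pairs of groups k < k'. Restricting it to pairs (k, k+1) is one way to express similarity only between consecutive stages, as described above.
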
 We published six full-length papers related to this project in peer-reviewed conference proceedings and journals. This project provided research topics for one Ph.D. student and four master's students at Indiana University-Purdue University Indianapolis. Three master's students have graduated and taken jobs as bioinformaticians at top universities and companies. The Ph.D. student is graduating soon and is looking for a postdoctoral position. The research materials produced in this project laid the foundation for an annual summer workshop targeting high school students. They also provided teaching materials for several graduate courses at Indiana University-Purdue University Indianapolis.

Last Modified: 08/05/2022
Modified by: Jingwen Yan

Please report errors in award information by writing to: awardsearch@nsf.gov.
