Award Abstract # 1945971
EAGER: IIBR Informatics: A reinforced imputation framework for accurate gene expression recovery from single-cell RNA-seq data

NSF Org: DBI
Division of Biological Infrastructure
Recipient: OHIO STATE UNIVERSITY, THE
Initial Amendment Date: February 16, 2021
Latest Amendment Date: May 19, 2021
Award Number: 1945971
Award Instrument: Standard Grant
Program Manager: Jen Weller
DBI Division of Biological Infrastructure
BIO Directorate for Biological Sciences
Start Date: March 1, 2021
End Date: February 29, 2024 (Estimated)
Total Intended Award Amount: $299,999.00
Total Awarded Amount to Date: $299,999.00
Funds Obligated to Date: FY 2021 = $299,999.00
History of Investigator:
  • Qin Ma (Principal Investigator)
    qin.ma@osumc.edu
Recipient Sponsored Research Office: Ohio State University
1960 KENNY RD
COLUMBUS
OH  US  43210-1016
(614)688-8735
Sponsor Congressional District: 03
Primary Place of Performance: Ohio State University
1960 Kenny Road
Columbus
OH  US  43210-1016
Primary Place of Performance Congressional District: 03
Unique Entity Identifier (UEI): DLWBSLWAJWR1
Parent UEI: MN4MDDMN8529
NSF Program(s): Innovation: Bioinformatics
Primary Program Source: 01002122DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 1165
Program Element Code(s): 164Y00
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.074

ABSTRACT

Single-cell RNA-Seq (scRNA-Seq) analyses have revolutionized the ways in which researchers can investigate specific cell types within tissue samples. While single-cell sequencing technologies have opened a new frontier for researchers, they also come with a complex set of problems. One of these problems concerns the quality of gene expression estimates, which are used in numerous downstream analyses, from predicting cell types and trajectories to determining differentially expressed genes between cell types or tissues. Low coverage and sequencing inefficiencies can affect up to 90% of gene expression estimates in scRNA-Seq studies and are therefore challenging to overcome. Current methods that attempt to address this problem have two critical shortcomings: (1) inadequate use of bulk data to compensate for low-expression genes and (2) under-utilization of iterative procedures to optimize the highly connected steps involved in imputing gene expression estimates.

This project will develop a novel computational framework that integrates bulk RNA-Seq data into scRNA-Seq data modeling and analysis, aiming at accurate gene expression estimates from sparse scRNA-Seq data and at high-quality, reliable, and precise downstream analyses. The aim is to model particular features of the heterogeneous gene expression patterns among various cell types. First, bulk RNA-Seq data will be integrated through deconvolution to develop heterogeneous compensation distributions and probabilities. Second, the gamma distribution will be used to determine an empirical distribution for single-cell gene expression estimates, which will improve the baseline expression in a specific cell type and identify estimates of interest despite the high level of noise in sequencing data; these estimates will then be combined with compensation information from bulk RNA-Seq data to correct biases in the noisy scRNA-Seq data. Finally, the updated expression estimates will be fed back through the process iteratively to improve the results of each stage. The outcome will be a novel imputation framework that should enable accurate scRNA-Seq expression estimates through the integration of these three new characteristics.
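The three ingredients described above (a gamma baseline, bulk compensation, and iteration) can be illustrated with a minimal sketch. This is not the project's framework: the function names `fit_gamma_moments` and `impute_gene`, the `weight` blending parameter, the treatment of every zero as a dropout, and the assumption that a deconvolved bulk estimate is already available are all hypothetical simplifications chosen for illustration.

```python
import numpy as np


def fit_gamma_moments(values):
    """Method-of-moments gamma fit; returns (shape, scale)."""
    values = np.asarray(values, dtype=float)
    mean = values.mean()
    var = values.var()
    if mean <= 0 or var <= 0:
        raise ValueError("need positive mean and variance to fit a gamma")
    return mean ** 2 / var, var / mean


def impute_gene(expr, bulk_estimate, weight=0.5, n_iter=3):
    """Toy imputation of one gene across the cells of one cell type.

    expr          : 1-D array of expression values; zeros are treated
                    as dropouts (a hypothetical simplification).
    bulk_estimate : bulk-derived expression for this gene and cell type,
                    assumed to come from a separate deconvolution step.
    weight        : hypothetical blending weight given to the bulk value.
    """
    expr = np.asarray(expr, dtype=float)
    dropout = expr == 0
    imputed = expr.copy()
    fit_values = expr[~dropout]  # first pass: fit on observed values only
    for _ in range(n_iter):
        shape, scale = fit_gamma_moments(fit_values)
        baseline = shape * scale  # mean of the fitted gamma
        # Blend the single-cell baseline with the bulk compensation value.
        imputed[dropout] = weight * bulk_estimate + (1 - weight) * baseline
        fit_values = imputed  # later passes: refit on the updated estimates
    return imputed, (shape, scale)


if __name__ == "__main__":
    values, params = impute_gene([0, 2, 4, 0, 6], bulk_estimate=5.0, n_iter=1)
    print(values, params)
```

Observed (non-zero) values are left untouched; only entries flagged as dropouts are filled. With a moment-matched fit the gamma mean equals the sample mean, so a fuller model would use the fitted shape and scale for more than the baseline, for example to score how plausible each zero is as a dropout.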

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Chang, Yuzhou and He, Fei and Wang, Juexin and Chen, Shuo and Li, Jingyi and Liu, Jixin and Yu, Yang and Su, Li and Ma, Anjun and Allen, Carter and Lin, Yu and Sun, Shaoli and Liu, Bingqiang and Javier Otero, José and Chung, Dongjun and Fu, Hongjun and Li "Define and visualize pathological architectures of human tissues from spatially resolved transcriptomics using deep learning" Computational and Structural Biotechnology Journal, v.20, 2022 https://doi.org/10.1016/j.csbj.2022.08.029
Chen, Junyi and Wang, Xiaoying and Ma, Anjun and Wang, Qi-En and Liu, Bingqiang and Li, Lang and Xu, Dong and Ma, Qin "Deep transfer learning of cancer drug responses by integrating bulk and single-cell RNA-seq data" Nature Communications, v.13, 2022 https://doi.org/10.1038/s41467-022-34277-7
Ma, Anjun and Wang, Juexin and Xu, Dong and Ma, Qin "Deep learning analysis of single-cell data in empowering clinical implementation" Clinical and Translational Medicine, v.12, 2022 https://doi.org/10.1002/ctm2.950
Ma, Anjun and Xin, Gang and Ma, Qin "The use of single-cell multi-omics in immuno-oncology" Nature Communications, v.13, 2022 https://doi.org/10.1038/s41467-022-30549-4
Wang, Juexin and Ma, Anjun and Chang, Yuzhou and Gong, Jianting and Jiang, Yuexu and Qi, Ren and Wang, Cankun and Fu, Hongjun and Ma, Qin and Xu, Dong "scGNN is a novel graph neural network framework for single-cell RNA-Seq analyses" Nature Communications, v.12, 2021 https://doi.org/10.1038/s41467-021-22197-x

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

Project Outcomes Report

The overarching goal of the project is to mathematically model transcriptional regulatory signals and their associated co-regulated gene modules from the transcriptomic profiles of single cells. The project also seeks to identify and annotate the gene signatures for each signal and to estimate the level of each signal in independent tissue data. Another major goal is to enhance the accuracy, efficiency, and versatility of gene imputation, cell clustering, and cell-cell interaction interpretation by integrating information from both bulk and single-cell transcriptomic data.

Several significant activities were undertaken during the project period. First, the development and enhancement of scGNN, a specialized tool for single-cell RNA-Seq analysis, proved pivotal in overcoming challenges related to sequencing sparsity and provided insights into diverse biological systems. This was followed by scGNN 2.0, which expanded the capabilities for imputation and clustering while adding advanced visualization features and optimizations for faster performance. The primary objective of scGNN 2.0 was to refine gene expression estimates from scRNA-Seq analyses using bulk RNA-Seq data, which was achieved by developing an iterative graph neural network (GNN)-based scRNA-Seq imputation model. We also developed scGNN 3.0, which incorporates a large language model to be fine-tuned for question-and-answer guidelines, automatic code generation and debugging, and interactive result interpretation. Additionally, the R package IRIS-FGM provides efficient identification of functional gene modules, is compatible with existing pipelines such as Seurat, and overcomes limitations of previous implementations, thus advancing scRNA-Seq analysis capabilities.

Lastly, we delivered scDEAL, a deep transfer learning framework that predicts cancer drug responses at the single-cell level by integrating large-scale bulk cell-line data. The key step in scDEAL is harmonizing drug-related bulk RNA-Seq data with scRNA-Seq data and transferring the model trained on bulk RNA-Seq data to predict drug responses in scRNA-Seq data. scDEAL also uses integrated gradients for feature interpretation to infer the signature genes of drug resistance mechanisms. Overall, the collective impact of these tools on understanding diverse biological systems and complex diseases could lead to significant advancements in fundamental biological mechanisms and human health.
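The integrated gradient feature interpretation mentioned above can be illustrated on a toy model. The sketch below is not scDEAL's code: it swaps the deep network for a hypothetical logistic "drug response" model with made-up weights `w` and bias `b`, and approximates the path integral with a midpoint Riemann sum. All function names are assumptions introduced for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def predict(x, w, b):
    """Toy response probability from a logistic model (stand-in for a deep net)."""
    return sigmoid(x @ w + b)


def gradient(x, w, b):
    """Analytic gradient of the logistic prediction with respect to the input x."""
    p = predict(x, w, b)
    return p * (1.0 - p) * w


def integrated_gradients(x, baseline, w, b, steps=200):
    """Attribute predict(x) - predict(baseline) to the individual input genes.

    Gradients are averaged along the straight path from baseline to x
    (midpoint rule) and scaled by the input difference.
    """
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.array(
        [gradient(baseline + a * (x - baseline), w, b) for a in alphas]
    )
    return (x - baseline) * grads.mean(axis=0)


if __name__ == "__main__":
    x = np.array([1.0, 0.0, 2.0])     # expression of three hypothetical genes
    w = np.array([0.5, -1.0, 0.25])   # hypothetical model weights
    b = -0.2
    baseline = np.zeros_like(x)       # all-zero expression as the reference
    print(integrated_gradients(x, baseline, w, b))
```

A useful sanity check is the completeness property: the attributions should sum, up to numerical error, to the difference between the prediction at the input and at the baseline. Genes whose expression equals the baseline receive zero attribution.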

 


Last Modified: 04/09/2024
Modified by: Qin Ma

Please report errors in award information by writing to: awardsearch@nsf.gov.
