
NSF Org: |
DBI Division of Biological Infrastructure |
Recipient: |
|
Initial Amendment Date: | February 16, 2021 |
Latest Amendment Date: | May 19, 2021 |
Award Number: | 1945971 |
Award Instrument: | Standard Grant |
Program Manager: | Jen Weller, DBI Division of Biological Infrastructure, BIO Directorate for Biological Sciences |
Start Date: | March 1, 2021 |
End Date: | February 29, 2024 (Estimated) |
Total Intended Award Amount: | $299,999.00 |
Total Awarded Amount to Date: | $299,999.00 |
Funds Obligated to Date: |
|
History of Investigator: |
|
Recipient Sponsored Research Office: |
1960 KENNY RD COLUMBUS OH US 43210-1016 (614)688-8735 |
Sponsor Congressional District: |
|
Primary Place of Performance: |
1960 Kenny Road Columbus OH US 43210-1016 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | Innovation: Bioinformatics |
Primary Program Source: |
|
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.074 |
ABSTRACT
Single-cell RNA-Seq (scRNA-Seq) analyses have revolutionized the ways in which researchers can investigate tissue samples at the level of specific cell types. While single-cell sequencing technologies have opened a new frontier for researchers, they also bring a complex set of problems. One of these problems concerns the quality of gene expression estimates. These estimates feed numerous downstream analyses, from predicting cell types and trajectories to identifying genes that are differentially expressed between cell types or tissues. Low coverage and sequencing inefficiencies can affect up to 90% of gene expression estimates in scRNA-Seq studies, so these problems are challenging to overcome. However, current methods that attempt to address this issue have two critical shortcomings: (1) inadequate use of bulk data to compensate for lowly expressed genes and (2) under-utilization of iterative procedures to optimize the highly connected steps involved in imputing gene expression estimates.
This project will develop a novel computational framework that integrates bulk RNA-Seq data into scRNA-Seq data modeling and analyses. Its goals are to obtain accurate gene expression estimates from sparse scRNA-Seq data and to improve the quality, reliability, and precision of downstream analyses. The framework will model particular features of the heterogeneous gene expression patterns among various cell types. Bulk RNA-Seq data will be integrated through deconvolution to develop heterogeneous compensation distributions and probabilities. The gamma distribution will be used to model the empirical distribution of single-cell gene expression estimates. This will improve baseline expression estimates for a specific cell type and help distinguish estimates of interest from the high level of noise in sequencing data. These estimates will then be combined with compensation information from bulk RNA-Seq data to correct biases in the noisy scRNA-Seq data. Finally, the updated expression estimates will be fed back iteratively through the process to improve the results of each stage. The outcome will be a novel imputation framework that should enable more accurate scRNA-Seq expression estimates by integrating these three new features: bulk-data compensation through deconvolution, gamma-based modeling of single-cell expression, and iterative refinement.
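The gamma-modeling and bulk-compensation ideas in the abstract can be illustrated with a small, stdlib-only Python sketch. It is not the project's framework. The sketch assumes raw counts for one gene across cells of one cell type. It treats each count as Poisson with a gamma-distributed rate, a standard empirical-Bayes model swapped in here for illustration. A hypothetical `bulk_level` (for example, a cell-type-specific level obtained by deconvolving bulk RNA-Seq data) and a hypothetical `weight` pull the prior mean toward the bulk data. All function names are hypothetical. The deconvolution itself and the iterative refinement loop are omitted.

```python
import math


def fit_gamma_prior(counts):
    """Method-of-moments fit of a Gamma(shape, scale) prior on Poisson rates.

    If x ~ Poisson(lam) and lam ~ Gamma(shape, scale), the counts follow a
    negative binomial with E[x] = shape * scale and
    Var[x] = shape * scale * (1 + scale).
    """
    n = len(counts)
    m = math.fsum(counts) / n
    v = math.fsum((x - m) ** 2 for x in counts) / n
    if m <= 0 or v <= m:
        raise ValueError("need a positive mean and overdispersed counts (var > mean)")
    scale = (v - m) / m  # from Var[x] - E[x] = shape * scale**2
    shape = m / scale
    return shape, scale


def bulk_compensated_prior(shape, scale, bulk_level, weight):
    """Hypothetical compensation step: move the prior mean toward a
    bulk-derived expression level while keeping the single-cell shape fixed.
    Assumes 0 <= weight <= 1 and bulk_level >= 0."""
    target = weight * bulk_level + (1.0 - weight) * shape * scale
    return shape, target / shape


def posterior_means(counts, shape, scale):
    """Gamma-Poisson conjugacy: lam | x ~ Gamma(shape + x, scale / (1 + scale)).
    Each posterior mean is a weighted average of the prior mean and x."""
    post_scale = scale / (1.0 + scale)
    return [(shape + x) * post_scale for x in counts]


if __name__ == "__main__":
    # Hypothetical counts for one gene across ten cells of one cell type.
    counts = [0, 0, 3, 1, 0, 7, 2, 0, 5, 0]
    shape, scale = fit_gamma_prior(counts)
    # Hypothetical bulk-derived level and weight.
    shape, scale = bulk_compensated_prior(shape, scale, bulk_level=3.0, weight=0.5)
    for x, est in zip(counts, posterior_means(counts, shape, scale)):
        print(f"observed={x}  denoised={est:.3f}")
```

In this sketch, zero entries receive small positive estimates instead of staying at zero. Large counts are shrunk toward the bulk-informed prior mean. How strongly each cell is shrunk depends on the dispersion estimated from the single-cell data.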
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
The overarching goal of the project is to mathematically model transcriptional regulatory signals and their associated co-regulated gene modules using single-cell transcriptomic profiles. The project also seeks to identify and annotate the gene signatures of each signal and to estimate the level of each signal in independent tissue data. Another major goal is to enhance the accuracy, efficiency, and versatility of gene imputation, cell clustering, and interpretation of cell-cell interactions by integrating information from both bulk and single-cell transcriptomic data.
Throughout the project period, several significant activities were undertaken. First, the development and enhancement of scGNN, a specialized tool for single-cell RNA-Seq analysis, proved pivotal in overcoming challenges related to sequencing sparsity and provided insights into diverse biological systems. This was followed by scGNN 2.0, which expanded imputation and clustering capabilities and added advanced visualization features and optimizations for faster performance. The primary objective of scGNN 2.0 was to refine gene expression estimates from scRNA-Seq analyses using bulk RNA-Seq data. This was achieved by developing an iterative scRNA-Seq imputation model based on a graph neural network (GNN). We also developed scGNN 3.0, which implements a large language model to be fine-tuned for question-and-answer guidance, automatic code generation and debugging, and interactive interpretation of results. Additionally, our R package IRIS-FGM provides efficient identification of functional gene modules, is compatible with existing pipelines such as Seurat, and overcomes limitations of previous implementations, significantly advancing scRNA-Seq analysis capabilities. Lastly, we delivered scDEAL, a deep transfer learning framework that predicts cancer drug responses at the single-cell level by integrating large-scale bulk cell-line data.
A highlight of scDEAL is that it harmonizes drug-related bulk RNA-Seq data with scRNA-Seq data. It then transfers a model trained on bulk RNA-Seq data to predict drug responses in scRNA-Seq data. scDEAL also uses integrated-gradients feature interpretation to infer the signature genes of drug resistance mechanisms. Overall, these tools deepen the understanding of diverse biological systems and complex diseases. Collectively, they could lead to significant advances in knowledge of fundamental biological mechanisms and in human health.
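Integrated gradients attributes a model's output to its input features. It does this by integrating the gradient of the output along a straight path from a baseline input to the actual input. The stdlib-only Python sketch below illustrates the technique under stated assumptions:

- A hypothetical logistic "drug-response model" with made-up weights stands in for scDEAL's trained network.
- Gradients are computed in closed form rather than by automatic differentiation.
- The baseline is a zero-expression profile.

It is not scDEAL's implementation, and all names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def toy_response_model(x, weights, bias):
    """Hypothetical stand-in for a trained drug-response classifier:
    logistic regression on a cell's gene expression vector."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def toy_gradient(x, weights, bias):
    """Closed-form gradient of the logistic model with respect to each gene."""
    p = toy_response_model(x, weights, bias)
    return [p * (1.0 - p) * w for w in weights]


def integrated_gradients(x, baseline, weights, bias, steps=200):
    """Midpoint Riemann-sum approximation of integrated gradients:
    IG_i = (x_i - b_i) * integral_0^1 dF/dx_i(b + a * (x - b)) da
    """
    totals = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(toy_gradient(point, weights, bias)):
            totals[i] += g
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, totals)]


if __name__ == "__main__":
    genes = ["GENE_A", "GENE_B", "GENE_C"]  # hypothetical gene names
    weights, bias = [1.5, -0.5, 0.0], -1.0  # hypothetical model parameters
    cell = [2.0, 1.0, 3.0]                  # hypothetical expression profile
    baseline = [0.0, 0.0, 0.0]              # zero-expression reference
    attr = integrated_gradients(cell, baseline, weights, bias)
    gap = toy_response_model(cell, weights, bias) - toy_response_model(baseline, weights, bias)
    print("sum of attributions:", round(sum(attr), 4), "output gap:", round(gap, 4))
    # Rank genes by absolute attribution as candidate signature genes.
    for j in sorted(range(len(genes)), key=lambda j: -abs(attr[j])):
        print(genes[j], round(attr[j], 4))
```

Integrating the gradient along the path recovers the change in output. The attributions should therefore sum approximately to F(cell) minus F(baseline), which is the completeness property. Sorting genes by absolute attribution then gives a candidate ranking of signature genes. A gene with zero model weight receives zero attribution.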
Last Modified: 04/09/2024
Modified by: Qin Ma