
NSF Org: | IIS Division of Information & Intelligent Systems |
Recipient: |
|
Initial Amendment Date: | February 20, 2024 |
Latest Amendment Date: | January 21, 2025 |
Award Number: | 2336469 |
Award Instrument: | Continuing Grant |
Program Manager: | Sorin Draghici, sdraghic@nsf.gov, (703)292-2232, IIS Division of Information & Intelligent Systems, CSE Directorate for Computer and Information Science and Engineering |
Start Date: | March 1, 2024 |
End Date: | February 28, 2029 (Estimated) |
Total Intended Award Amount: | $600,000.00 |
Total Awarded Amount to Date: | $348,000.00 |
Funds Obligated to Date: | FY 2025 = $150,000.00 |
History of Investigator: |
|
Recipient Sponsored Research Office: | 9500 GILMAN DR LA JOLLA CA US 92093-0021 (858)534-4896 |
Sponsor Congressional District: |
|
Primary Place of Performance: | 9500 GILMAN DRIVE LA JOLLA CA US 92093-0021 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | Info Integration & Informatics |
Primary Program Source: | 01002526DB NSF RESEARCH & RELATED ACTIVIT; 01002627DB NSF RESEARCH & RELATED ACTIVIT; 01002728DB NSF RESEARCH & RELATED ACTIVIT; 01002829DB NSF RESEARCH & RELATED ACTIVIT |
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
DNA mutations have a profound effect on how genes work, but it's still not well understood which mutations affect which genes. Currently, our knowledge is limited due to challenges in analyzing genomics data, such as bias arising from an overrepresentation of European study participants and simplistic statistical models that do not sufficiently capture the data. This project overcomes these challenges across three main scientific goals, in which innovative statistical models map DNA mutations to their target genes, and two educational goals, in which scientific training and diversity are simultaneously cultivated. First, the investigators will improve the fidelity of mapping mutations to target genes for groups of individuals that are not well-studied, such as minority populations. Second, the investigators will develop a new method to connect mutations to genes by considering how genes interact with each other in genome-wide networks, suggesting functional effects for many uncharacterized mutations. Third, the investigators will characterize the specific cells in which mutations exert their effects using scalable models that reflect the natural distribution of data from single cell genomic assays. This research advances the fields of bioinformatics and human genetics by introducing new, robust statistical models that link mutations to their target genes. This project also enhances equity and diversity in biomedical discoveries, while simultaneously enhancing diversity within research environments. Toward the latter, the investigators initiate a multi-week on-campus research program for high school students from under-resourced communities, as well as genetics training courses for undergraduate and graduate students, supplying quantitative interdisciplinary skills coveted by industry and academia alike.
This award will generate extensive datasets, open-source statistical models and genomics tools, high-impact publications, and course materials, thereby engaging the scientific community and encouraging it to take part in and advance related research.
This project focuses on developing new genetic models to understand how specific genetic variations influence gene expression. These models overcome current limitations in characterizing the function of genetic variation, which often has the subsequent goal of implicating target genes in the regulation of human phenotypes such as height and cancer risk. Challenges of existing algorithms include statistical issues due to finite sample sizes (especially for understudied minority populations), multiple hypothesis burdens restricting the knowledge gained from genome-wide analysis, and model misspecification, especially for new data types of growing popularity, such as single cell genomics. The investigators address these challenges across three main objectives. First, the investigators link genetic variation to changes in gene expression in understudied minority populations by jointly modeling genetic associations across globally diverse datasets. Second, the investigators develop a comprehensive approach to map genome-wide genetic variants to changes in gene expression, using a priori knowledge of gene regulatory networks and advanced machine learning algorithms to reduce the burden of multiple testing. Third, the investigators design a new statistical model to characterize the cell-type-specificity of gene expression regulation at high resolution; this model leverages the natural distribution of single cell data, resolving model misspecification of state-of-the-art methods and reducing measurement noise by modeling millions of single cell measurements across donors. This award supports the generation of open-source genomics software and data repositories characterizing the function of genetic variants, while also creating educational and training opportunities for under-resourced high school students and motivated undergraduate and graduate students.
The research and educational components intertwine in a symbiotic relationship that is expected to enhance both the diversity in research environments and the diversity in research cohorts.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.