
NSF Org: | DBI Division of Biological Infrastructure |
Recipient: |
|
Initial Amendment Date: | July 13, 2018 |
Latest Amendment Date: | February 7, 2020 |
Award Number: | 1759462 |
Award Instrument: | Standard Grant |
Program Manager: | Reed Beaman, rsbeaman@nsf.gov, (703)292-7163, DBI Division of Biological Infrastructure, BIO Directorate for Biological Sciences |
Start Date: | July 15, 2018 |
End Date: | June 30, 2023 (Estimated) |
Total Intended Award Amount: | $283,181.00 |
Total Awarded Amount to Date: | $283,181.00 |
Funds Obligated to Date: |
|
History of Investigator: |
|
Recipient Sponsored Research Office: | 2935 RODEO PARK DR E SANTA FE NM US 87505-6303 (505)982-7840 |
Sponsor Congressional District: |
|
Primary Place of Performance: | 2935 Rodeo Park Drive East Santa Fe NM US 87505-6303 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | ADVANCES IN BIO INFORMATICS |
Primary Program Source: |
|
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.074 |
ABSTRACT
This project develops new software tools for pangenomic analysis, a relatively new area of genomic research that studies large numbers of genome sequences from multiple organisms to understand how organisms adapt their genomes to their environments. As the cost of DNA sequencing continues to decrease, it is now routine for multiple genomes per species to be available for analysis, giving much more information about the species. The approach makes use of a graph-based representation of a pangenome and exploits this representation to efficiently find both shared and unique regions of interest across genomes. Each individual's genomic sequence corresponds to a path in a graph data structure called a De Bruijn graph; these graphs are large and can have millions of nodes and edges. The tools being developed are based on finding frequented regions (FRs) in De Bruijn graphs; these regions are hotspots that often represent features of interest in one or more genomes. Algorithms and software tools will be made available to the greater scientific community to facilitate new pangenomics research. The project will provide support and training for a postdoc and an incoming PhD student at Montana State University (MSU). It will also support a summer intern in the last two years at the National Center for Genome Resources (NCGR). Aspects of the project will be incorporated into undergraduate and graduate courses at MSU and integrated into several outreach and training activities at NCGR. In addition, MSU has several programs in place to serve American Indian students, and the PIs will actively recruit from and engage this community.
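The graph representation described above can be illustrated with a minimal sketch. It is not the project's software or its frequented-region algorithm; it is a simplified stand-in that assumes a node-centric De Bruijn graph (each k-mer is a node), treats each genome as the path of its consecutive k-mers, and marks a node as shared when a chosen fraction of genomes pass through it. The function names, the toy sequences, and the `min_fraction` threshold are all hypothetical.

```python
from collections import defaultdict


def kmer_path(seq, k):
    """Return a sequence's path through the graph: its consecutive k-mers."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_graph(genomes, k):
    """Build a node-centric De Bruijn graph from named sequences.

    Returns (edges, support):
      edges   maps each k-mer to the set of k-mers that follow it in some genome
      support maps each k-mer to the set of genome names whose path visits it
    """
    edges = defaultdict(set)
    support = defaultdict(set)
    for name, seq in genomes.items():
        path = kmer_path(seq, k)
        for node in path:
            support[node].add(name)
        for a, b in zip(path, path[1:]):
            edges[a].add(b)
    return edges, support


def shared_nodes(support, n_genomes, min_fraction):
    """Nodes visited by at least min_fraction of the genomes (assumed threshold)."""
    return {
        node
        for node, names in support.items()
        if len(names) / n_genomes >= min_fraction
    }


# Toy data: g2 differs from g1 and g3 by a single base.
genomes = {"g1": "ACGTACGA", "g2": "ACGTTCGA", "g3": "ACGTACGA"}
edges, support = build_graph(genomes, k=3)
core = shared_nodes(support, len(genomes), min_fraction=1.0)
unique_to_g2 = {node for node, names in support.items() if names == {"g2"}}
```

In this toy case the single-base difference in `g2` produces a short branch of nodes that only `g2` traverses, while the flanking nodes are traversed by all three paths. Real pangenome graphs use much longer k-mers, and regions of interest span many connected nodes rather than single ones.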
The current trajectory of next-generation sequencing improvements, including falling costs and increased read lengths and throughput, ensures that multiple genomes per species will be routine within the next decade. This project initiates work on a next generation of bioinformatics software that can exploit the increased information content available from multiple accessions and intelligently use the data for unbiased, species-wide analyses. The proposed work will refine algorithms and develop software to address important problems in each of the identified areas. The research team has complementary expertise spanning molecular biology, algorithms, machine learning, and genomics research. Pangenomic biology will be advanced through automatic identification of candidate regions of interest in a pangenome. Methods will be developed to discover regions that are conserved across evolutionary space, regions that are novel, and regions that have diverged due to positive selection. Machine learning techniques will be used to search for interesting genomic regions. Lastly, this work will complement the work being done on the model plant Medicago truncatula, contributing to research on its symbiotic relationships. Results of the project can be found at: www.cs.montana.edu/pangenomics.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
Improvements in DNA sequencing technologies have allowed us to look at genes and genomes not only in many different species but also across many individuals in the same species. Our tools provide a way to efficiently compare full sets of genetic information across many related individuals to identify genes that are shared by all of them or specific to one or a few individuals. Shared genes are important because they are required for the species' survival. Novel genes are important because they help individuals to adapt to specific environments or climates. Understanding how genes help organisms adapt is especially important as we try to understand how organisms can adapt to climate change and resist expanding disease pressure, helping us to improve our food security and preserve species facing extinction. Our tools work across the tree of life for any organism with genes coded by DNA or RNA. Our project has trained a recent PhD graduate and multiple graduate, undergraduate, and even high school students. We have created educational curricula that have extended our reach, including creating and teaching pangenomics workshops to graduate students and researchers and generating a case study for high school and undergraduate students that we have used with multiple student groups. These curricula help students to understand how pangenomics (concurrent analysis of multiple genomes from related individuals) can help ask and answer important questions about genetic diversity, climate change, and species' survival.
Last Modified: 08/18/2023
Modified by: Joann Mudge