Award Abstract # 1759462
Collaborative Research: Innovation: Pioneering New Approaches to Explore Pangenomic Space at Scale

NSF Org: DBI
Division of Biological Infrastructure
Recipient: NATIONAL CENTER FOR GENOME RESOURCES
Initial Amendment Date: July 13, 2018
Latest Amendment Date: February 7, 2020
Award Number: 1759462
Award Instrument: Standard Grant
Program Manager: Reed Beaman
rsbeaman@nsf.gov
 (703)292-7163
DBI
 Division of Biological Infrastructure
BIO
 Directorate for Biological Sciences
Start Date: July 15, 2018
End Date: June 30, 2023 (Estimated)
Total Intended Award Amount: $283,181.00
Total Awarded Amount to Date: $283,181.00
Funds Obligated to Date: FY 2018 = $283,181.00
History of Investigator:
  • Joann Mudge (Principal Investigator)
    jm@ncgr.org
  • Thiruvarangan Ramaraj (Co-Principal Investigator)
Recipient Sponsored Research Office: National Center for Genome Resources
2935 RODEO PARK DR E
SANTA FE
NM  US  87505-6303
(505)982-7840
Sponsor Congressional District: 03
Primary Place of Performance: National Center for Genome Resources
2935 Rodeo Park Drive East
Santa Fe
NM  US  87505-6303
Primary Place of Performance Congressional District: 03
Unique Entity Identifier (UEI): ET8RBMXCF117
Parent UEI:
NSF Program(s): ADVANCES IN BIO INFORMATICS
Primary Program Source: 01001819DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 9150
Program Element Code(s): 116500
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.074

ABSTRACT

This project develops new software tools for pangenomic analysis, which is a relatively new area of genomic research that studies large numbers of genome sequences from multiple organisms to understand how organisms adapt their genomes to their environments. As the cost of DNA sequencing continues to decrease, it is now routine for multiple genomes per species to be available for analysis, giving much more information about the species. The approach makes use of a graph-based representation of a pangenome and exploits this representation to efficiently find both shared and unique regions of interest across genomes. Each individual's genomic sequence corresponds to a path in a graph data structure called a De Bruijn graph; these graphs are large and can have millions of nodes and edges. The tools being developed are based on finding frequented regions (FRs) in De Bruijn graphs; these regions are hotspots that often represent features of interest in one or more genomes. Algorithms and software tools will be made available to the greater scientific community to facilitate new pangenomics research. The project will provide support and training for a postdoc and an incoming PhD student at Montana State University. It will also support a summer intern in the last two years at the National Center for Genome Resources. Aspects of the project will be incorporated into undergraduate and graduate courses at MSU, as well as integrated into several outreach and training activities at NCGR. In addition, MSU has several programs in place to serve American Indian students and the PIs will actively recruit from and engage this community.
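
As a rough illustration of the graph representation described above, the following minimal Python sketch builds a k-mer De Bruijn graph from a few toy sequences, records each sequence as a path through the graph, and reports which nodes are visited by every sequence and which by only one. It is not the project's frequented-region algorithm; it only counts per-node path support. The function names, the toy sequences, and the choice of k = 3 are all assumptions made for this example.

```python
from collections import defaultdict


def build_debruijn_paths(genomes, k):
    """Return each genome's path of k-mer nodes, per-node genome support, and edges.

    Assumption for this sketch: nodes are k-mers, and an edge joins each
    pair of consecutive k-mers in a sequence (they overlap by k-1 bases).
    """
    paths = {}
    support = defaultdict(set)  # node -> names of genomes whose path visits it
    edges = set()
    for name, seq in genomes.items():
        path = [seq[i:i + k] for i in range(len(seq) - k + 1)]
        paths[name] = path
        for node in path:
            support[node].add(name)
        edges.update(zip(path, path[1:]))
    return paths, support, edges


def shared_and_unique_nodes(support, n_genomes):
    """Split nodes into those visited by all genomes and those visited by exactly one."""
    shared = {node for node, names in support.items() if len(names) == n_genomes}
    unique = {node: next(iter(names))
              for node, names in support.items() if len(names) == 1}
    return shared, unique


# Toy sequences invented for illustration; B differs from A and C at one base.
genomes = {
    "A": "ACGTACGGA",
    "B": "ACGTTCGGA",
    "C": "ACGTACGGA",
}
paths, support, edges = build_debruijn_paths(genomes, k=3)
shared, unique = shared_and_unique_nodes(support, len(genomes))
print("shared nodes:", sorted(shared))
print("nodes unique to one genome:", unique)
```

In this toy case the single-base difference in sequence B produces a short run of nodes that only B's path visits, while the flanking nodes are shared by all three paths. Real pangenome graphs have millions of nodes, so a practical tool would need far more compact data structures than the Python sets used here.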



The current trajectory of next generation sequencing improvements, including falling costs and increased read lengths and throughput, ensures that multiple genomes per species will be routine within the next decade. This project initiates work on a next generation of bioinformatics software that can exploit the increased information content available from multiple accessions and intelligently use the data for unbiased, species-wide analyses. The proposed work will refine algorithms and develop software to address important problems in each of the identified areas. The research team has complementary expertise spanning molecular biology, algorithms, machine learning, and genomics research. Pangenomic biology will be advanced through automatic identification of candidate regions of interest in a pangenome. Methods will be developed to discover regions that are conserved across evolutionary space, regions that are novel, and regions that have diverged due to positive selection. Machine learning techniques will be used to search for interesting genomic regions. Lastly, this work will complement the work being done on the model plant, Medicago truncatula, contributing to research on its symbiotic relationships. Results of the project can be found at: www.cs.montana.edu/pangenomics.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH


Cleary, Alan and Ramaraj, Thiruvarangan and Kahanda, Indika and Mudge, Joann and Mumey, Brendan "Exploring Frequented Regions in Pan-Genomic Graphs" IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2019. https://doi.org/10.1109/TCBB.2018.2864564
Manuweera, Buwani and Mudge, Joann and Kahanda, Indika and Mumey, Brendan and Ramaraj, Thiruvarangan and Cleary, Alan "Pangenome-Wide Association Studies with Frequented Regions", 2019. https://doi.org/10.1145/3307339.3343478

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

Improvements in DNA sequencing technologies have allowed us to look at genes and genomes not only in many different species but also across many individuals in the same species. Our tools provide a way to efficiently compare full sets of genetic information across many related individuals to identify genes that are shared by all or specific to one or a few individuals. Shared genes are important because they are required for the species' survival. Novel genes are important because they help individuals to adapt to specific environments or climates. Understanding how genes help organisms adapt is especially important as we try to understand how organisms can adapt to climate change and resist expanding disease pressure, helping us to improve our food security and preserve species facing extinction. Our tools work across the tree of life for any organism with genes coded by DNA or RNA. Our project has trained a recent PhD graduate and multiple graduate, undergraduate, and even high school students. We have created educational curricula that have extended our reach, including creating and teaching pangenomics workshops to graduate students and researchers and generating a case study for high school and undergraduate students that we have used with multiple student groups. These curricula help students to understand how pangenomics (concurrent analysis of multiple genomes from related individuals) can help ask and answer important questions about genetic diversity, climate change, and species' survival.

 


Last Modified: 08/18/2023
Modified by: Joann Mudge

Please report errors in award information by writing to: awardsearch@nsf.gov.
