
NSF Org: |
DBI Division of Biological Infrastructure |
Recipient: |
|
Initial Amendment Date: | July 8, 2014 |
Latest Amendment Date: | December 5, 2017 |
Award Number: | 1356288 |
Award Instrument: | Continuing Grant |
Program Manager: |
Jen Weller
DBI Division of Biological Infrastructure BIO Directorate for Biological Sciences |
Start Date: | July 1, 2014 |
End Date: | December 31, 2018 (Estimated) |
Total Intended Award Amount: | $823,371.00 |
Total Awarded Amount to Date: | $823,371.00 |
Funds Obligated to Date: |
FY 2015 = $313,723.00 FY 2016 = $270,522.00 |
History of Investigator: |
|
Recipient Sponsored Research Office: |
926 DALNEY ST NW ATLANTA GA US 30318-6395 (404)894-4819 |
Sponsor Congressional District: |
|
Primary Place of Performance: |
505 10th Street, NW Atlanta GA US 30332-0002 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | ADVANCES IN BIO INFORMATICS |
Primary Program Source: |
01001516DB NSF RESEARCH & RELATED ACTIVIT 01001617DB NSF RESEARCH & RELATED ACTIVIT |
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.074 |
ABSTRACT
The genetic diversity of bacteria and archaea (the prokaryotes) is by far the largest among all living organisms. Whether in soils, waters, human guts, or the atmosphere, prokaryotes affect, if not control, all life-sustaining processes on Earth, but how these microbes interact with and change their environment is not fully understood. This incomplete understanding is due, at least in part, to the fact that the great majority of microorganisms resist cultivation in the laboratory, i.e., they represent the uncultivable majority, and thus cannot be studied efficiently. In the past few years, there has been an explosion of culture-independent genomic techniques (a.k.a. metagenomics), which allow the analysis of microorganisms and their communities in their natural habitat by sequencing their entire genomes or transcriptomes, bypassing the need for lab cultivation. However, the development of computational tools and algorithms to analyze metagenomic data is lagging behind developments in sequencing technologies. To advance the understanding of the uncultivable majority of microorganisms, and take full advantage of the investment of society in genomic technologies, new quantitative approaches are needed. The goals of this project are: 1) to develop new computational tools that fulfill critical research needs and thus help scientists understand the composition, functions and values of the microbial communities, and 2) to train faculty from undergraduate colleges, including community colleges, in new metagenomics techniques, which are positioned at the interface of microbiology, genomics, bioinformatics, and computational biology, a pivotal area of contemporary research and education that is inadequately covered in traditional curricula. These activities are therefore expected to provide important infrastructure for training the future workforce and to facilitate contemporary research.
The small subunit ribosomal RNA gene (SSU rRNA) has been successfully used to catalogue and study the diversity of microorganisms for the last two decades. This work has been facilitated by the development of dedicated resources (databases and tool repositories) such as the Ribosomal Database Project (RDP; http://rdp.cme.msu.edu). However, rRNA gene-based studies have important limitations that techniques based on genome sequences do not share. For instance, genomic techniques can better resolve microbial communities at the levels where the SSU rRNA gene provides inadequate resolution, namely the species and finer levels, and can catalogue whole-genome diversity and fluidity, which are relevant for nutrient cycling, bioremediation efforts, and the emergence of microbial antibiotic resistance. This project seeks to develop tools that overcome several of the limitations of the rRNA gene-based approaches and allow the efficient analysis of microbiomes. The project will provide robust implementations of both well-accepted existing methods, such as genome-aggregate average nucleotide identity (gANI) for delineating closely related species and strains, and newer methods, including the recently developed Nonpareil method for estimating the coverage of a microbial community obtained by a metagenomic dataset and the MyTaxa method for examining horizontal gene transfer events between microbial lineages. The overarching objective is to develop the genome equivalent of the RDP that will enable the scientific community to perform classification and diversity studies at the genome level.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
Intellectual merit:
The small subunit ribosomal RNA gene (16S rRNA) has been successfully used to catalogue and study the diversity of microbial species and their communities to date. Accordingly, several 16S rRNA gene-based websites and tools are available. Nonetheless, several aspects of rRNA gene-based studies remain problematic. The most important open questions are how to better resolve microbial communities at levels where the 16S rRNA gene provides inadequate resolution, namely the species and finer levels, and how to best catalogue whole-genome diversity and fluidity. Additionally, an explosion in the use of culture-independent genomic approaches (a.k.a. metagenomics) has recently occurred. However, the tools to analyze metagenomic data are lagging behind the developments in sequencing technologies (and data) and are typically limited to the assembly and gene annotation of the metagenomic sequences. To advance the study of microbial species and their communities and take full advantage of the capabilities provided by metagenomics, quantitative whole-genome approaches are needed. Such approaches must also scale to high volumes of data in order to accommodate the geometrically increasing number of genomes and metagenomes that become available.
To address these challenges, we developed and released to the public (November 2016) a webserver called the Microbial Genomes Atlas (MiGA; available at www.microbial-genomes.org). MiGA compares complete or partial query genomes against a reference database of genomes of all isolated and classified microorganisms using the genome-aggregate average nucleotide identity (ANI) and average amino-acid identity (AAI) concepts. MiGA therefore allows external users to perform classification and diversity studies at the genome level and represents the “genome equivalent” of rRNA webservers. The number of new registered users of the MiGA webserver has grown from a couple per month in 2016, when the server was first launched, to more than 50 new users per month currently (>500 registered users in total), while the total number of search queries processed by the webserver has exceeded 8,000, which indicates that MiGA fulfills a critical need of contemporary research. Furthermore, as part of this project we developed new algorithms for big data analysis and optimized previously developed ones. The resulting tools enable several important analyses: assessing the extent of species diversity and the amount of sequencing required to cover the diversity in an environmental sample (Nonpareil 3), calculating ANI with a k-mer-based high-throughput algorithm (FastANI), and detecting target genomes (imGLAD) or reads encoding a gene of interest (ROCker and Xander) in complex metagenomes. Finally, we applied these tools to answer questions about several important microbial systems, such as what the microbiome of tick arthropods provides to its host, and how soil microbial communities respond to climate perturbations and agricultural activities.
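As a rough illustration of how k-mer content can serve as a proxy for nucleotide identity between two genomes, the sketch below compares the k-mer sets of two sequences and converts their Jaccard similarity into an identity estimate. It is a toy example under stated assumptions, not the FastANI or MiGA implementation: the function names, the default k, and the use of the Mash distance formula are choices made here for illustration only.

```python
import math


def kmers(seq, k):
    """Return the set of all k-mers in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def estimated_identity(seq_a, seq_b, k=5):
    """Estimate nucleotide identity from shared k-mers.

    Assumption: uses the Mash distance D = -ln(2j / (1 + j)) / k, where j
    is the Jaccard similarity of the two k-mer sets, and reports 1 - D.
    Real tools work on whole genomes with much larger k and sketching.
    """
    j = jaccard(kmers(seq_a, k), kmers(seq_b, k))
    if j == 0:
        return 0.0  # no shared k-mers: treat as unrelated
    distance = -math.log(2 * j / (1 + j)) / k
    return max(0.0, 1.0 - distance)  # clamp: very small j gives D > 1


if __name__ == "__main__":
    reference = "ACGTTGCAAGGCTTAACCGGTATCG"
    query = "ACGTTGCAAGGCGTAACCGGTATCG"  # one substitution vs. reference
    print(f"estimated identity: {estimated_identity(reference, query):.3f}")
```

On real data the same idea would be applied to whole genomes with longer k-mers, and the resulting identity would be compared against the reference database to place the query genome.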
Our bioinformatics approaches and findings were published in 13 articles (Wang 2015, Rodriguez-R 2016, Konstantinidis 2017, Orellana 2017, Orellana 2017, Castro 2018, Jain 2018, Pena-Gonzalez 2018, Rodriguez 2018, Rodriguez 2018, Rodriguez 2018, Tsementzi 2018, Johnston 2019).
Broader impacts:
Our work provided long-needed tools for high-throughput analysis of microbial genomes and metagenomes, and an associated webserver that makes these tools freely available for online analysis. The tools will help microbial scientists to significantly advance our understanding of the diversity and function of microbial communities, and are applicable to a variety of microbiome studies across the fields of ecology, systematics, evolution, engineering and medicine. A lecture-based workshop with at least 100 participating faculty from undergraduate colleges, including community colleges, was organized during the 2018 American Society for Microbiology’s Conference for Undergraduate Educators (ASMCUE); it disseminated our tools and knowledge of metagenomics to non-experts and undergraduates. Further, we held workshops to train graduate students and their professors on MiGA usage in Atlanta (GA), East Lansing (MI), Puerto Rico, Germany, Greece, China and Brazil. The workshops were met with great success based on participant exit interviews. The project trained 2 post-doctoral associates, 5 Ph.D. students, 4 Master's students, and 7 undergraduate computer science and engineering majors; about half of our students were female. All former students and post-docs have won awards and distinctions for their work, such as three Sigma Xi best PhD thesis awards in different years (only 8 to 10 such awards are given by the Sigma Xi chapter of Georgia Tech per year among all PhD theses published within the year), and have secured positions at institutions such as Emory University, Oak Ridge National Laboratory, and the Max Planck Institute in Bremen, Germany. This project therefore provided multifaceted learning experiences to both national and international undergraduate and graduate students at the interface of microbiology, evolution, genomics, bioinformatics, and computational biology, a pivotal area of contemporary research and education.
Last Modified: 01/18/2019
Modified by: Konstantinos T Konstantinidis