Award Abstract # 1565057
Collaborative Research: ABI Development: Extensible, reproducible and documentation-driven microbiome data science

NSF Org: DBI
Division of Biological Infrastructure
Recipient: UNIVERSITY OF CALIFORNIA, SAN DIEGO
Initial Amendment Date: March 29, 2016
Latest Amendment Date: March 29, 2016
Award Number: 1565057
Award Instrument: Standard Grant
Program Manager: Peter McCartney
DBI: Division of Biological Infrastructure
BIO: Directorate for Biological Sciences
Start Date: May 1, 2016
End Date: April 30, 2019 (Estimated)
Total Intended Award Amount: $329,606.00
Total Awarded Amount to Date: $329,606.00
Funds Obligated to Date: FY 2016 = $329,606.00
History of Investigator:
  • Rob Knight (Principal Investigator)
    robknight@ucsd.edu
Recipient Sponsored Research Office: University of California-San Diego
9500 GILMAN DR
LA JOLLA
CA  US  92093-0021
(858)534-4896
Sponsor Congressional District: 50
Primary Place of Performance: University of California-San Diego
CA  US  92093-0934
Primary Place of Performance Congressional District: 50
Unique Entity Identifier (UEI): UYTTZT6G9DT1
Parent UEI:
NSF Program(s): ADVANCES IN BIO INFORMATICS
Primary Program Source: 01001617DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s):
Program Element Code(s): 116500
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.074

ABSTRACT

Single-celled organisms (microbes) represent a vast component of the diversity of life on Earth and perform an amazing array of biological functions. They rarely live or act alone and instead exist in complex communities composed of many interacting species that make up the microbiome. This award supports the development of the next generation of Quantitative Insights Into Microbial Ecology (QIIME, pronounced "chime"), a free and open source software platform for analyzing microbiomes based on DNA sequencing data. Microbiome science is transforming from a descriptive and technically challenging field into one that is hypothesis-driven, actionable, and technically straightforward, in part enabled by QIIME. We now know that the traditional approach for studying microbial communities, which relied on culturing microbes in the lab, is insufficient because we don't know the conditions required for the growth of most microbes. Recent advances link microbiomes to functional processes via 'culture independent' techniques, such as sequencing fragments of microbial genomes and then using those fragments as 'molecular fingerprints' to profile the microbiome. The bottleneck in microbiome analysis is not DNA sequencing but interpreting the large quantities of sequence data generated. QIIME 2 will advance knowledge of microbiomes by helping users derive insight through interactive exploratory analysis capabilities, understand the underlying methods, and report their results in ways accessible to end users from outside of the field, including physicians, engineers and policymakers who urgently need access to conclusions drawn from studies of complex microbial ecosystems. Societal benefits range from global to personal: from understanding the cycling of biologically essential nutrients, such as carbon and nitrogen, in the environment to curing diseases, including obesity and cancer. QIIME has been cited over 4,000 times and has active user and developer communities. Educational workshops on QIIME are taught approximately monthly in the USA and around the world.

At its core, QIIME 2 will provide a stable application programming interface (API) relying on existing community standards for documentation, coding style, and testing. It will have a novel 'documentation-driven' graphical user interface that will make QIIME accessible to users without requiring advanced computational skills. At the same time, it will help users improve their computational skills through exposure to the underlying bioinformatics methods. QIIME 2 will have fully integrated provenance tracking, which will simplify reporting and the reproducibility of bioinformatics workflows. A first-class plugin system will decentralize development by allowing outside developers to add new methods to the QIIME 2 platform. The API will also support improved integration of QIIME as a component of other widely used systems, such as Illumina BaseSpace® and Qiita, and an automatically generated command line interface will be provided for power users. QIIME 2 will have a completely redeveloped parallel framework, which will support deployment on diverse high-performance computing resources, from locally owned and operated computer clusters to commercially available cloud computing platforms. All stages of QIIME 2 development will be driven by user community input through the QIIME Forum (currently over 2500 active users) and our public GitHub repository. Further details on this project are on the QIIME website (www.qiime.org).

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Amir A, McDonald D, Navas-Molina JA, Kopylova E, Morton JT, Zech Xu Z, Kightley EP, Thompson LR, Hyde ER, Gonzalez A, Knight R. "Deblur Rapidly Resolves Single-Nucleotide Community Sequence Patterns" mSystems, 2017
Blaser MJ, Cardon ZG, Cho MK, Dangl JL, Donohue TJ, Green JL, Knight R, Maxon ME, Northen TR, Pollard KS, Brodie EL. "Toward a Predictive Understanding of Earth's Microbiomes to Address 21st Century Challenges" mBio, 2016
Bokulich NA, Kaehler BD, Rideout JR, Dillon M, Bolyen E, Knight R, Huttley GA, Caporaso JG. "taxonomic classification of marker gene sequences" Microbiome, 2018
Bolyen E, Rideout JR, Dillon MR, Bokulich NA, Abnet CC, Al-Ghalith GA, Alexander H, Alm EJ, Arumugam M, Asnicar F, Bai Y, Bisanz JE, Bittinger K, Brejnrod A, Brislawn CJ, Brown CT, Callahan BJ, Caraballo-Rodríguez AM, Chase J, Cope EK, Da Silva R, Diener "Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2" Nat Biotech, 2019
Gonzalez A, Navas-Molina JA, Kosciolek T, McDonald D, Vázquez-Baeza Y, Ackermann G, DeReus J, Janssen S, Swafford AD, Orchanian SB, Sanders JG, Shorenstein J, Holste H, Petrus S, Robbins-Pianka A, Brislawn CJ, Wang M, Rideout JR, Bolyen E, Dillon M, Capor "Qiita: rapid, web-enabled microbiome meta-analysis" Nat Meth, 2018
Janssen S, McDonald D, Gonzalez A, Navas-Molina JA, Jiang L, Xu ZZ, Winker K, Kado DM, Orwoll E, Manary M, Mirarab S, Knight R. "Phylogenetic Placement of Exact Amplicon Sequences Improves Associations with Clinical Information" mSystems, 2018
Jiang L, Amir A, Morton JT, Heller R, Arias-Castro E, Knight R. "Discrete False-Discovery Rate Improves Identification of Differentially Abundant Microbes" mSystems, 2018
Morton JT, Marotz C, Washburne A, Silverman J, Zaramela LS, Edlund A, Zengler K, Knight R. "Establishing microbial composition measurement standards with reference frames" Nat Communications, 2019
Rideout JR, Chase JH, Bolyen E, Ackermann G, González A, Knight R, Caporaso JG. "Keemei: cloud-based validation of tabular bioinformatics file formats in Google Sheets" Gigascience, 2016

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

In this collaborative project, we developed a graphical user interface for microbiome research and coupled it to a massive database of existing studies. The UCSD component of this research focused on developing the database and web interface and on integrating them with the software components developed at Northern Arizona University. The result is a web interface through which users can analyze microbiome data and integrate their own studies with hundreds of thousands of microbiome samples deposited by other researchers.

The Intellectual Merit of this work was to provide a database called Qiita, available at http://qiita.ucsd.edu/, that integrates the QIIME2 software as a processing component and allows users to upload their microbiome data, process it on the web, combine it with other datasets, and perform statistical tests and generate visualizations using the UCSD cluster. This process makes microbiome analyses much easier to perform than the previous method of running them from the command line. We also developed several new statistical methods that substantially improve our ability to understand which specific microbes or patterns of microbes contribute to differences among samples, such as those that separate samples from healthy and diseased subjects. In addition, UCSD personnel contributed to software development, code review, and testing of the QIIME2 software itself, to teaching workshops, and to preparing documentation. Of particular note, we introduced three methods: software called "deblur", which allows higher-resolution use of a particular kind of microbiome data called amplicon data; an improved method for calculating the false discovery rate of microbiome studies; and an improved method for compositional data analysis in microbiome studies. All three were published in scientific journals, and we also contributed to the main scientific articles on QIIME2 and Qiita. These main papers were published in highly selective scientific journals (Nature Biotechnology and Nature Methods, respectively), underscoring the rigor of the peer review process and the importance of these methods for the field.

The Broader Impacts of this project included training hundreds of students, postdocs and faculty in how to use the software through workshops held at UCSD and around the world; enabling industry partners in the food and pharmaceutical industries to use the software as part of their own workflows and/or products; improving provenance tracking so that scientists can more effectively reproduce each other's studies; and supporting hundreds of scientific articles uncovering the role of the microbiome in studies spanning geology, oceanography, medicine, food, and ecology. Consequently, the software has enabled a large community of different kinds of scientists to explore the microbial aspects of their projects using a consistent, reproducible workflow that facilitates data exchange and re-use, allows scientists to verify each other's results after publication, and lets them combine their datasets with one another rather than performing their research in isolation.


Last Modified: 02/24/2020
Modified by: Rob Knight
