
NSF Org: |
CCF Division of Computing and Communication Foundations |
Recipient: |
|
Initial Amendment Date: | July 14, 2017 |
Latest Amendment Date: | June 22, 2022 |
Award Number: | 1718479 |
Award Instrument: | Standard Grant |
Program Manager: |
Almadena Chtchelkanova
achtchel@nsf.gov (703)292-7498 CCF Division of Computing and Communication Foundations CSE Directorate for Computer and Information Science and Engineering |
Start Date: | July 15, 2017 |
End Date: | June 30, 2023 (Estimated) |
Total Intended Award Amount: | $499,984.00 |
Total Awarded Amount to Date: | $499,984.00 |
Funds Obligated to Date: |
|
History of Investigator: |
|
Recipient Sponsored Research Office: |
926 DALNEY ST NW ATLANTA GA US 30318-6395 (404)894-4819 |
Sponsor Congressional District: |
|
Primary Place of Performance: |
225 North Avenue Atlanta GA US 30332-0002 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): |
Information Technology Researc, Software & Hardware Foundation |
Primary Program Source: |
|
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
Next-generation sequencing refers to a collection of high-throughput DNA sequencing technologies that originated about a decade ago and are now the de facto equipment underpinning all modern genomics studies, owing to their cost-effectiveness, ubiquity, and versatility. This project is conducting comprehensive reproducibility and assessment experiments to characterize the state of the art in the field and to make the findings publicly visible and accessible. The project results are expected to become a valuable resource for practitioners, researchers, and the large community of users of next-generation sequencing bioinformatics. The project involves several undergraduate students and raises awareness of research integrity and reproducibility issues among young researchers.
The project is establishing benchmark datasets to evaluate bioinformatics software for multiple next-generation sequencers, multiple types of biological organisms, in multiple application contexts, and at multiple problem scales. The research spans assessment of software products for read error correction, read mapping to target genomes and reference databases, and assembly of genomes and transcriptomes. Reproducibility experiments are conducted to independently verify results of important software products based on results and datasets published in the literature. The software products are also evaluated on a range of metrics: quality of results, robustness and sensitivity to parameter values, run-time performance, memory usage, and ability to process real-world datasets. The project work will result in comprehensive recommendations for practitioners and will establish the state of the art so that future research efforts can be channeled appropriately.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
The project supported research and training in scientific reproducibility, an area that is gaining increasing prominence. The goal of the project is to conduct reproducibility experiments and comprehensive assessment of bioinformatics software that operates on next-generation sequencing data, specifically for the problems of error correction, read mapping, gene expression, gene network construction, and multimodal data integration. The types of data studied under the project include both short- and long-read sequencing, and both DNA sequencing to study genomes and RNA sequencing to study gene expression. Software for several research tasks is analyzed in the context of applications drawn from pangenomics, systems biology, and single-cell biology.
Intellectual Merits: For each problem area studied, work carried out under the project established benchmark datasets, and the software was evaluated on a range of metrics including reproducibility, quality of results, robustness and sensitivity to parameter values, run-time performance, memory usage, and ability to process real-world datasets. These results will inform practitioners of the capabilities, limitations, and appropriate ways to use the various software programs on which the studies were conducted. They also show researchers in the respective areas where future efforts are needed.
Publications resulting from this work themselves earned reproducibility badges, now adopted as a feature by some important conferences and journals. A publication resulting from the project was a finalist for the Best Reproducibility Advancement Award at the Supercomputing 2021 conference, and was selected for the Student Cluster Competition Reproducibility Challenge.
Research into comprehensive assessment of bioinformatics software, and the resulting understanding of its limitations, naturally led the project team itself to develop new approaches, algorithms, and software to overcome current limitations and bottlenecks.
Broader Impacts: The project led to peer-reviewed publications in conferences and journals, the establishment of benchmark datasets, and open-source software both for evaluation and for the new methods developed under the project. Software products are made available on GitHub and datasets are made available on Zenodo.
The project supported the training of many undergraduate and graduate students on the important topic of scientific reproducibility. It contributed to the Reproducibility Challenge of the Student Cluster Competition, in which student teams representing universities from around the world compete.
Last Modified: 05/12/2024
Modified by: Srinivas Aluru