
NSF Org: | IIS Division of Information & Intelligent Systems |
Recipient: |
|
Initial Amendment Date: | August 5, 2014 |
Latest Amendment Date: | July 19, 2015 |
Award Number: | 1421908 |
Award Instrument: | Continuing Grant |
Program Manager: | Sylvia Spengler, sspengle@nsf.gov, (703)292-7347, IIS Division of Information & Intelligent Systems, CSE Directorate for Computer and Information Science and Engineering |
Start Date: | September 1, 2014 |
End Date: | May 31, 2017 (Estimated) |
Total Intended Award Amount: | $500,000.00 |
Total Awarded Amount to Date: | $500,000.00 |
Funds Obligated to Date: | FY 2015 = $0.00 |
History of Investigator: |
|
Recipient Sponsored Research Office: | 201 OLD MAIN, UNIVERSITY PARK, PA, US 16802-1503, (814)865-1372 |
Sponsor Congressional District: |
|
Primary Place of Performance: | 342B IST Bldg, State College, PA, US 16802-7000 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | ADVANCES IN BIO INFORMATICS, Information Technology Researc, Cross-BIO Activities, Info Integration & Informatics |
Primary Program Source: | 01001516DB NSF RESEARCH & RELATED ACTIVIT |
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
Next-generation sequencing (NGS), which samples millions of short DNA sequences from a genome, has revolutionized genomics. One area of particular importance is the reconstruction of genomes (haplotypes) from a viral population, a fundamental problem in virology, evolutionary biology, and human health. Although several methods have been developed to take advantage of NGS data, they are limited to populations for which a reference genome is available. This excludes many important cases, such as RNA viruses or certain HIV/HCV viral populations, in which the haplotypes are so divergent that a reference is meaningless. Moreover, most algorithms are not robust to recombination, which is common in many viral populations.
Achieving this project's aims will allow the full potential of NGS data to be realized in virology. In particular, it will advance the understanding of viral population dynamics and give biologists powerful tools to understand disease progression and to enable novel treatment and prevention strategies. The algorithms and software developed will be made freely available through software sharing platforms such as GitHub or Galaxy.
The PIs will offer a strong educational component, including (a) graduate and undergraduate classes that use the output of the proposed research and (b) development of a seminar series. The PIs will also (a) train future generations of scientists and engineers to enhance and use bioinformatic and genomic cyber resources and (b) facilitate creative, cyber-enabled, boundary-crossing collaborations, including those with industry and international dimensions, to advance the frontiers of science and engineering and broaden participation in STEM fields.
This project's aim is to develop probabilistic De Bruijn graphs, and network flow methods on such graphs, for the reconstruction of viral populations when a reference is not available. Given NGS data, the algorithms should determine the number, sequences, and relative frequencies of the haplotypes. The proposed algorithms are based on a unique combination of established techniques (e.g., maximum likelihood, expectation-maximization, clustering, Lander-Waterman statistics) with novel propositions for probabilistic De Bruijn graphs, machine learning, and network flows that are of interest in other applications. The PI and Co-PIs have complementary backgrounds in virology, machine learning, network flow, and genome reconstruction problems.