Award Abstract # 1421908
III: Small: Reconstructing viral population without using a reference genome

NSF Org: IIS
Division of Information & Intelligent Systems
Recipient: THE PENNSYLVANIA STATE UNIVERSITY
Initial Amendment Date: August 5, 2014
Latest Amendment Date: July 19, 2015
Award Number: 1421908
Award Instrument: Continuing Grant
Program Manager: Sylvia Spengler
sspengle@nsf.gov
 (703)292-7347
IIS
 Division of Information & Intelligent Systems
CSE
 Directorate for Computer and Information Science and Engineering
Start Date: September 1, 2014
End Date: May 31, 2017 (Estimated)
Total Intended Award Amount: $500,000.00
Total Awarded Amount to Date: $500,000.00
Funds Obligated to Date: FY 2014 = $71,799.00
FY 2015 = $0.00
History of Investigator:
  • Raj Acharya (Principal Investigator)
  • Mary Poss (Co-Principal Investigator)
  • Paul Medvedev (Co-Principal Investigator)
Recipient Sponsored Research Office: Pennsylvania State Univ University Park
201 OLD MAIN
UNIVERSITY PARK
PA  US  16802-1503
(814)865-1372
Sponsor Congressional District: 15
Primary Place of Performance: The Pennsylvania State University
342B IST Bldg
State College
PA  US  16802-7000
Primary Place of Performance
Congressional District:
Unique Entity Identifier (UEI): NPM2J7MSCF61
Parent UEI:
NSF Program(s): ADVANCES IN BIO INFORMATICS,
Information Technology Researc,
Cross-BIO Activities,
Info Integration & Informatics
Primary Program Source: 01001415DB NSF RESEARCH & RELATED ACTIVIT
01001516DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 7364, 7923, 8750
Program Element Code(s): 116500, 164000, 727500, 736400
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Next-generation sequencing (NGS), which allows sampling millions of short DNA sequences from a genome, has revolutionized the field of genomics. One area of particular importance is the reconstruction of genomes (haplotypes) from a viral population, which is a fundamental problem in virology, evolutionary biology, and human health. Though there have been several methods developed to take advantage of NGS data, those are limited to populations for which a reference genome is available. This excludes many important cases, such as RNA viruses or certain HIV/HCV viral populations. In such situations, the haplotypes are sufficiently divergent as to render the reference meaningless. Moreover, most algorithms are not robust in the presence of recombination, which is a common occurrence in many viral populations. The achievement of this project's aims will allow for the full potential of NGS data to be realized in the field of virology. In particular, it will help to propel the understanding of viral population dynamics and give biologists powerful tools to understand disease progression and enable novel treatment and prevention strategies. The algorithms and software developed will be made freely available for use through software sharing platforms like GitHub or Galaxy. The PIs will offer a strong educational component including (a) graduate and undergraduate classes that use the output of the proposed research, and (b) development of a seminar series. The PIs will (a) train future generations of scientists and engineers to enhance and use bioinformatic/genomic cyber resources; (b) facilitate creative, cyber-enabled boundary-crossing collaborations, including those with industry and international dimensions, to advance the frontiers of science and engineering and broaden participation in STEM fields.

This project?s aim is to develop probabilistic De Bruijn graphs and network flow on such graphs for the reconstruction of viral population when a reference is not available. Given NGS data, the algorithms should determine the number, sequences, and relative frequencies of the haplotypes. This project's proposed algorithms are based on a unique combination of established techniques (e.g. maximum likelihood, expectation-maximization, clustering, Lander Waterman statistics) with novel propositions for probabilistic De Bruijn graphs, machine learning, and network flows that are of interest in other applications. The PI and Co-PIs have complementary backgrounds in virology, machine learning, network flow, and genome reconstruction problems.

Please report errors in award information by writing to: awardsearch@nsf.gov.

Print this page

Back to Top of page