Award Abstract # 0812111
III-CXT-Small: Graphs to Diversity: extracting genomic variation from sequence graphs

NSF Org: IIS, Division of Information & Intelligent Systems
Recipient: UNIVERSITY OF MARYLAND, COLLEGE PARK
Initial Amendment Date: August 20, 2008
Latest Amendment Date: September 14, 2011
Award Number: 0812111
Award Instrument: Continuing Grant
Program Manager: Sylvia Spengler
sspengle@nsf.gov
(703)292-7347
IIS Division of Information & Intelligent Systems
CSE Directorate for Computer and Information Science and Engineering
Start Date: September 1, 2008
End Date: August 31, 2013 (Estimated)
Total Intended Award Amount: $445,359.00
Total Awarded Amount to Date: $1,390,291.00
Funds Obligated to Date: FY 2008 = $293,074.00
FY 2009 = $364,992.00
FY 2010 = $465,950.00
FY 2011 = $266,275.00
History of Investigator:
  • Mihai Pop (Principal Investigator)
    mpop@umd.edu
  • Carleton Kingsford (Co-Principal Investigator)
Recipient Sponsored Research Office: University of Maryland, College Park
3112 LEE BUILDING
COLLEGE PARK
MD  US  20742-5100
(301)405-6269
Sponsor Congressional District: 04
Primary Place of Performance: University of Maryland, College Park
3112 LEE BUILDING
COLLEGE PARK
MD  US  20742-5100
Primary Place of Performance Congressional District: 04
Unique Entity Identifier (UEI): NPU8ULVAAS23
Parent UEI: NPU8ULVAAS23
NSF Program(s): Info Integration & Informatics
Primary Program Source: 01000809DB NSF RESEARCH & RELATED ACTIVIT
01000910RB NSF RESEARCH & RELATED ACTIVIT
01001011DB NSF RESEARCH & RELATED ACTIVIT
01001011RB NSF RESEARCH & RELATED ACTIVIT
01001112RB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 170E, 7364, 9215, 9216, HPCC
Program Element Code(s): 736400
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Recent advances in genome sequencing technologies have enabled the
sequencing of bacteria directly from the environment, providing a
broader outlook on the diversity of bacteria than ever before
possible. Recent studies of environmental samples have revealed
complex communities containing many previously unknown species, and
uncovered a large amount of genetic variation and diversity even among
closely related strains. Characterizing this genomic variation is
critical in studies of microbial ecology and evolution, yet currently
available computational tools, originally developed for the study of
single organisms, are ill-suited for this task.

This proposal aims to develop the theoretical and computational
infrastructure for the study of genomic variation within mixtures of
organisms. The proposed research relies on both theoretical and
empirical analyses of the structure of genome assembly graphs in order
to characterize graph signatures that are correlated with intra- and
inter-species polymorphisms. A particular focus is placed on
understanding and using the information provided by next-generation
sequencing technologies as well as other high-throughput experimental
techniques. The proposed work provides critical analysis tools
to help biologists explore the genetic variation within the
environment.

Additional information about this project is available at
http://www.cbcb.umd.edu/research/Genomic_Variation.shtml.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

(Showing: 1 - 10 of 15)
Kingsford, C.; Schatz, M. C.; Pop, M. "Assembly complexity of prokaryotic genomes using short reads." BMC Bioinformatics, v.11, 2010, p.21
Lin, H. C.; Goldstein, S.; Mendelowitz, L.; Zhou, S.; Wetzel, J.; Schwartz, D. C.; Pop, M. "AGORA: Assembly Guided by Optical Restriction Alignment." BMC Bioinformatics, v.13, 2012, p.189
Marcais, G.; Kingsford, C. "A fast, lock-free approach for efficient parallel counting of occurrences of k-mers." Bioinformatics, v.27, 2011, p.764-70
Nagarajan, N.; Cook, C.; Di Bonaventura, M.; Ge, H.; Richards, A.; Bishop-Lilly, K. A.; DeSalle, R.; Read, T. D.; Pop, M. "Finishing genomes with limited resources: lessons from an ensemble of microbial genomes." BMC Genomics, v.11, 2010, p.242
Nagarajan, N.; Kingsford, C. "GiRaF: robust, computational identification of influenza reassortments via graph mining." Nucleic Acids Research, v.39, 2011, p.e34
Nagarajan, N.; Pop, M. "Parametric complexity of sequence assembly: theory and applications to next generation sequencing." J Comput Biol, v.16, 2009, p.897-908
Navlakha, S.; Schatz, M. C.; Kingsford, C. "Revealing Biological Modules via Graph Summarization." Journal of Computational Biology, v.16, 2009, p.253-264
Navlakha, S.; Kingsford, C. "Network archaeology: uncovering ancient networks from present-day interactions." PLoS Computational Biology, v.7, 2011, p.e1001119
Navlakha, S.; Kingsford, C. "The power of protein interaction networks for associating genes with diseases." Bioinformatics, v.26, 2010, p.1057-63
Pop, M. "Genome assembly reborn: recent computational challenges." Brief Bioinform, v.10, 2009, p.354-66

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

This project covered the development of algorithms for analyzing genome assembly graphs with the goal of uncovering signatures of genomic variation, as found, for example, in a mixture of organisms, some of which contain specific genes important for their adaptation to the environment.

The project has resulted in over 15 publications in peer-reviewed journals and conferences, and the related research has contributed to a better understanding of genome assembly and its limitations. In addition, several software packages were developed and made available as open source to the community, including metAMOS, a novel metagenomic assembly pipeline and the only tool that can actually discover genomic variation in metagenomic data.

This project has also directly and indirectly contributed to the training of six graduate students, several of whom have graduated and pursued academic and industry positions. In addition, this award allowed us to initiate a summer internship program, which is still ongoing and which has trained over 20 undergraduate and high school students.

To summarize, our project has had a direct impact on our field, both through the development of new ideas, algorithms, and software and through the tools we made available to biologists. In addition, our work has had a significant impact on the training of the next generation of scientists at several levels of their academic careers, from high school to postgraduate studies.


Last Modified: 11/01/2013
Modified by: Mihai Pop
