Award Abstract # 1350041
CAREER: Algorithms for single molecule sequence analysis
NSF Org: DBI, Division of Biological Infrastructure
Recipient: COLD SPRING HARBOR LABORATORY
Initial Amendment Date: June 7, 2014
Latest Amendment Date: May 26, 2015
Award Number: 1350041
Award Instrument: Continuing Grant
Program Manager: Jen Weller, DBI Division of Biological Infrastructure, BIO Directorate for Biological Sciences
Start Date: June 1, 2014
End Date: April 30, 2016 (Estimated)
Total Intended Award Amount: $1,534,349.00
Total Awarded Amount to Date: $599,600.00
Funds Obligated to Date: FY 2014 = $293,574.00; FY 2015 = $49,920.00
History of Investigator: Michael Schatz (Principal Investigator), michael.schatz@gmail.com
Recipient Sponsored Research Office: Cold Spring Harbor Laboratory, 1 BUNGTOWN RD, COLD SPG HBR, NY, US 11724-2202, (516)367-8307
Sponsor Congressional District: 03
Primary Place of Performance: Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY, US 11724-2209
Primary Place of Performance Congressional District: 03
Unique Entity Identifier (UEI): GV31TMFLPY88
Parent UEI:
NSF Program(s): ADVANCES IN BIO INFORMATICS
Primary Program Source: 01001415DB NSF RESEARCH & RELATED ACTIVIT; 01001516DB NSF RESEARCH & RELATED ACTIVIT; 01001617DB NSF RESEARCH & RELATED ACTIVIT; 01001718DB NSF RESEARCH & RELATED ACTIVIT; 01001819DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 1045, 9179, 9251
Program Element Code(s): 116500
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.074
ABSTRACT

The Cold Spring Harbor Laboratory is awarded a CAREER grant for the PI Michael Schatz to develop new computational methods for processing DNA sequencing data from the latest high-throughput sequencing technologies. DNA sequencing costs and throughput have improved by orders of magnitude over the last three decades, although many questions remain unanswered, especially because of the short sequence lengths currently available. Emerging "third generation" sequencing technologies from Pacific Biosciences, Moleculo, Oxford Nanopore, and other companies are poised to revolutionize genomics by enabling the sequencing of long, individual molecules of DNA and RNA. The sequence lengths with these technologies can reach tens of thousands of nucleotides; however, few or no analysis packages are capable of dealing with these types of genetic sequence data. This project will overcome these limitations by developing several novel analysis algorithms specifically for long-read single molecule sequencing and its associated complex error models. The outcomes will help answer biological questions of profound significance to all of society, such as: What were the genetic implications of the domestication of rice? What genes and regulatory elements give rise to the incredible regenerative properties of the flatworm? What can be understood from assembling reference genomes of sugarcane and pineapple towards breeding more robust plant crops and biofuels?
Specific objectives of the research include working towards assembling entire plant and animal chromosomes into complete, haplotype-phased sequences; identifying fusion genes and complex alternative splicing patterns responsible for diseases or adaptability; and searching for structural variations associated with improved crop yield or human diseases such as cancer or autism. Even if some future technology is capable of directly reading entire transcripts or entire genomes, this research will remain necessary to examine the higher-level relationships across populations of genomes and to measure the dynamics of gene expression and splicing.
This project will tightly integrate research and education, promoting opportunities from the high school through postdoctoral levels with the development of new course materials, hands-on research opportunities, and one-on-one mentoring experiences. This effort will specifically target the intersection of computer science and biology, promoting interdisciplinary education and ensuring that the next generation of scientists is ready for the complexities of quantitative and digital biology. To engage the widest possible audience, Dr. Schatz will also develop novel online teaching materials made available through a yearly bioinformatics contest. The first round of the contest reached nearly 1000 students around the world and at all levels of education, engaging students far beyond our physical limits. The products of the research will be made available as open-source software and installed into the graphical iPlant Discovery Environment, making them easily accessible to the large community of plant researchers around the world.
PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Aganezov, Sergey and Goodwin, Sara and Sherman, Rachel M. and Sedlazeck, Fritz J. and Arun, Gayatri and Bhatia, Sonam and Lee, Isac and Kirsche, Melanie and Wappel, Robert and Kramer, Melissa and Kostroff, Karen and Spector, David L. and Timp, Winston and
"Comprehensive analysis of structural variants in breast cancer genomes using single-molecule sequencing"
Genome Research, v.30, 2020
https://doi.org/10.1101/gr.260497.119
Aganezov, Sergey and Yan, Stephanie M. and Soto, Daniela C. and Kirsche, Melanie and Zarate, Samantha and Avdeyev, Pavel and Taylor, Dylan J. and Shafin, Kishwar and Shumate, Alaina and Xiao, Chunlin and Wagner, Justin and McDaniel, Jennifer and Olson, Na
"A complete reference genome improves analysis of human genetic variation"
Science, v.376, 2022
https://doi.org/10.1126/science.abl3533
Alonge, Michael and Soyk, Sebastian and Ramakrishnan, Srividya and Wang, Xingang and Goodwin, Sara and Sedlazeck, Fritz J. and Lippman, Zachary B. and Schatz, Michael C.
"RaGOO: fast and accurate reference-guided scaffolding of draft genomes"
Genome Biology, v.20, 2019
https://doi.org/10.1186/s13059-019-1829-6
Alonge, Michael and Wang, Xingang and Benoit, Matthias and Soyk, Sebastian and Pereira, Lara and Zhang, Lei and Suresh, Hamsini and Ramakrishnan, Srividya and Maumus, Florian and Ciren, Danielle and Levy, Yuval and Harel, Tom Hai and Shalev-Schlosser, Gil
"Major Impacts of Widespread Structural Variation on Gene Expression and Crop Improvement in Tomato"
Cell, 2020
https://doi.org/10.1016/j.cell.2020.05.021
Altemose, Nicolas and Logsdon, Glennis A. and Bzikadze, Andrey V. and Sidhwani, Pragya and Langley, Sasha A. and Caldas, Gina V. and Hoyt, Savannah J. and Uralsky, Lev and Ryabov, Fedor D. and Shew, Colin J. and Sauria, Michael E. and Borchers, Matthew an
"Complete genomic and epigenetic maps of human centromeres"
Science, v.376, 2022
https://doi.org/10.1126/science.abl4178
Chen, Li-Yu Man and VanBuren, Robert Young and Paris, Margot C. and Zhou, Hongye C. and Zhang, Xingtan E. and Wai, Ching M. and Yan, Hansong D. and Chen, Shuai C. and Alonge, Michael L. and Ramakrishnan, Srividya and Liao, Zhenyang and Liu, Juan and Lin,
"The bracteatus pineapple genome and domestication of clonally propagated crops"
Nature Genetics, v.51, 2019
https://doi.org/10.1038/s41588-019-0506-8
Chen, Sai and Krusche, Peter and Dolzhenko, Egor and Sherman, Rachel M. and Petrovski, Roman and Schlesinger, Felix and Kirsche, Melanie and Bentley, David R. and Schatz, Michael C. and Sedlazeck, Fritz J. and Eberle, Michael A.
"Paragraph: a graph-based structural variant genotyper for short-read sequence data"
Genome Biology, v.20, 2019
https://doi.org/10.1186/s13059-019-1909-7
Chou, Hsiang-Chen and Bhalla, Kuhulika and Demerdesh, Osama EL and Klingbeil, Olaf and Hanington, Kaarina and Aganezov, Sergey and Andrews, Peter and Alsudani, Habeeb and Chang, Kenneth and Vakoc, Christopher R and Schatz, Michael C and McCombie, W Richar
"The human origin recognition complex is essential for pre-RC assembly, mitosis, and maintenance of nuclear structure"
eLife, v.10, 2021
https://doi.org/10.7554/eLife.61797
Darby, Charlotte A and Gaddipati, Ravi and Schatz, Michael C and Langmead, Ben and Birol, Inanc
"Vargas: heuristic-free alignment for assessing linear and graph read aligners"
Bioinformatics, 2020
https://doi.org/10.1093/bioinformatics/btaa265
Fang, H, Narzisi, G, O'Rawe, J, Wu, Y, Rosenbaum, J, Ronemus, M, Iossifov, I, Schatz MC, Lyon, GJ
"Reducing INDEL calling errors in whole-genome and exome sequencing data"
Genome Medicine, v.6, 2014
https://doi.org/10.1186/s13073-014-0089-z
Fouks, Bertrand and Brand, Philipp and Nguyen, Hung N. and Herman, Jacob and Camara, Francisco and Ence, Daniel and Hagen, Darren E. and Hoff, Katharina J. and Nachweide, Stefanie and Romoth, Lars and Walden, Kimberly K.O. and Guigo, Roderic and Stanke, M
"The genomic basis of evolutionary differentiation among honey bees"
Genome Research, v.31, 2021
https://doi.org/10.1101/gr.272310.120
Hoyt, Savannah J. and Storer, Jessica M. and Hartley, Gabrielle A. and Grady, Patrick G. and Gershman, Ariel and de Lima, Leonardo G. and Limouse, Charles and Halabian, Reza and Wojenski, Luke and Rodriguez, Matias and Altemose, Nicolas and Rhie, Arang an
"From telomere to telomere: The transcriptional and epigenetic state of human repeat elements"
Science, v.376, 2022
https://doi.org/10.1126/science.abk3112
Jarvis, Erich D. and Formenti, Giulio and Rhie, Arang and Guarracino, Andrea and Yang, Chentao and Wood, Jonathan and Tracey, Alan and Thibaud-Nissen, Francoise and Vollger, Mitchell R. and Porubsky, David and Cheng, Haoyu and Asri, Mobin and Logsdon, Gle
"Semi-automated assembly of high-quality diploid human reference genomes"
Nature, v.611, 2022
https://doi.org/10.1038/s41586-022-05325-5
Kirsche, Melanie and Das, Arun and Schatz, Michael C
"Sapling: accelerating suffix array queries with learned data models"
Bioinformatics, v.37, 2020
https://doi.org/10.1093/bioinformatics/btaa911
Kirsche, Melanie and Prabhu, Gautam and Sherman, Rachel and Ni, Bohan and Battle, Alexis and Aganezov, Sergey and Schatz, Michael C.
"Jasmine and Iris: population-scale structural variant comparison and analysis"
Nature Methods, 2023
https://doi.org/10.1038/s41592-022-01753-3
Kovaka, Sam and Fan, Yunfan and Ni, Bohan and Timp, Winston and Schatz, Michael C.
"Targeted nanopore sequencing by real-time mapping of raw electrical signal with UNCALLED"
Nature Biotechnology, v.39, 2021
https://doi.org/10.1038/s41587-020-0731-9
Marcus, S, Lee, H, Schatz MC
"SplitMEM: A graphical algorithm for pan-genome analysis with suffix skips"
Bioinformatics, v.30, 2014, p.3476
https://doi.org/10.1093/bioinformatics/btu756
Mc Cartney, Ann M. and Shafin, Kishwar and Alonge, Michael and Bzikadze, Andrey V. and Formenti, Giulio and Fungtammasan, Arkarachai and Howe, Kerstin and Jain, Chirag and Koren, Sergey and Logsdon, Glennis A. and Miga, Karen H. and Mikheenko, Alla and Pa
"Chasing perfection: validation and polishing strategies for telomere-to-telomere genome assemblies"
Nature Methods, v.19, 2022
https://doi.org/10.1038/s41592-022-01440-3
Naish, Matthew and Alonge, Michael and Wlodzimierz, Piotr and Tock, Andrew J. and Abramson, Bradley W. and Schmücker, Anna and Mandáková, Terezie and Jamge, Bhagyshree and Lambing, Christophe and Kuo, Pallas and Yelina, Natasha and Hartwick, Nolan and Col
"The genetic and epigenetic landscape of the Arabidopsis centromeres"
Science, v.374, 2021
https://doi.org/10.1126/science.abi7489
Narzisi, G, O'Rawe, JA, Iossifov, I, Fang, H, Lee, YH, Wang, Z, Wu, Y, Lyon, G, Wigler, M, Schatz MC
"Accurate de novo and transmitted indel detection in exome-capture data using microassembly."
Nature Methods, v.11, 2014
https://doi.org/10.1038/nmeth.3069
Narzisi, G, Schatz, MC
"The challenge of small-scale repeats for indel discovery"
Frontiers in Bioengineering and Biotechnology, 2014
https://doi.org/10.3389/fbioe.2015.00008
Nattestad, Maria and Aboukhalil, Robert and Chin, Chen-Shan and Schatz, Michael C
"Ribbon: intuitive visualization for complex genomic variation"
Bioinformatics, v.37, 2020
https://doi.org/10.1093/bioinformatics/btaa680
Nurk, Sergey and Koren, Sergey and Rhie, Arang and Rautiainen, Mikko and Bzikadze, Andrey V. and Mikheenko, Alla and Vollger, Mitchell R. and Altemose, Nicolas and Uralsky, Lev and Gershman, Ariel and Aganezov, Sergey and Hoyt, Savannah J. and Diekhans, M
"The complete sequence of a human genome"
Science, v.376, 2022
https://doi.org/10.1126/science.abj6987
Palatnick, Aspyn and Zhou, Bin and Ghedin, Elodie and Schatz, Michael C
"iGenomics: Comprehensive DNA sequence analysis on your Smartphone"
GigaScience, v.9, 2020
https://doi.org/10.1093/gigascience/giaa138
Ranallo-Benavidez, T. Rhyker and Jaron, Kamil S. and Schatz, Michael C.
"GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes"
Nature Communications, v.11, 2020
https://doi.org/10.1038/s41467-020-14998-3
Ranallo-Benavidez, T. Rhyker and Lemmon, Zachary and Soyk, Sebastian and Aganezov, Sergey and Salerno, William J. and McCoy, Rajiv C. and Lippman, Zachary B. and Schatz, Michael C. and Sedlazeck, Fritz J.
"Optimized sample selection for cost-efficient long-read population sequencing"
Genome Research, v.31, 2021
https://doi.org/10.1101/gr.264879.120
Schatz MC, Maron, LG, Stein, JC, Wences, AH, Gurtowski, J, Biggers, E, Lee, H, Kramer, M, Antoniou, E, Ghiban, E, Wright, MH, Chia, JM, Ware, D, McCouch, S, McCombie, WR
"Whole genome de novo assemblies of three divergent strains of rice (O. sativa) documents novel gene space of aus and indica"
Genome Biology, v.15, 2014
https://doi.org/10.1186/s13059-014-0506-z
Thielen, Peter M. and Wohl, Shirlee and Mehoke, Thomas and Ramakrishnan, Srividya and Kirsche, Melanie and Falade-Nwulia, Oluwaseun and Trovão, Nídia S. and Ernlund, Amanda and Howser, Craig and Sadowski, Norah and Morris, C. Paul and Hopkins, Mark and Sc
"Genomic diversity of SARS-CoV-2 during early introduction into the Baltimore–Washington metropolitan area"
JCI Insight, v.6, 2021
https://doi.org/10.1172/jci.insight.144350
Wenger, Aaron M. and Peluso, Paul and Rowell, William J. and Chang, Pi-Chuan and Hall, Richard J. and Concepcion, Gregory T. and Ebler, Jana and Fungtammasan, Arkarachai and Kolesnikov, Alexey and Olson, Nathan D. and Töpfer, Armin and Alonge, Michael and
"Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome"
Nature Biotechnology, v.37, 2019
https://doi.org/10.1038/s41587-019-0217-9