Award Abstract # 1464268
CRII: SHF: HPC Solutions to Big NGS Data Compression

NSF Org: CCF
Division of Computing and Communication Foundations
Recipient: WESTERN MICHIGAN UNIVERSITY
Initial Amendment Date: January 28, 2015
Latest Amendment Date: January 25, 2016
Award Number: 1464268
Award Instrument: Standard Grant
Program Manager: Almadena Chtchelkanova
achtchel@nsf.gov
(703) 292-7498
CCF Division of Computing and Communication Foundations
CSE Directorate for Computer and Information Science and Engineering
Start Date: February 1, 2015
End Date: November 30, 2018 (Estimated)
Total Intended Award Amount: $171,341.00
Total Awarded Amount to Date: $187,341.00
Funds Obligated to Date: FY 2015 = $171,341.00
FY 2016 = $8,292.00
History of Investigator:
  • Fahad Saeed (Principal Investigator)
    FSAEED@FIU.EDU
Recipient Sponsored Research Office: Western Michigan University
1903 W MICHIGAN AVE
KALAMAZOO
MI  US  49008-5200
(269)387-8298
Sponsor Congressional District: 04
Primary Place of Performance: Western Michigan University
4601 Campus Drive
Kalamazoo
MI  US  49008-5314
Primary Place of Performance Congressional District: 04
Unique Entity Identifier (UEI): J7WULLYGFRH1
Parent UEI:
NSF Program(s): Software & Hardware Foundation
Primary Program Source: 01001516DB NSF RESEARCH & RELATED ACTIVIT
01001617DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 7942, 8228, 9251
Program Element Code(s): 779800
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Sequencing the genomes of numerous species, including humans, has become increasingly affordable thanks to next-generation high-throughput genome sequencing (NGS) technologies. This opens up new prospects for the diagnosis and treatment of genetic diseases and makes systems biology studies increasingly effective. However, many computational challenges must be addressed before these technologies find their way into everyday health care. One such daunting challenge is the volume of sequencing data, which can reach the petabyte level for comprehensive systems biology studies.
Genomic data compression is needed to reduce storage size, speed up transmission, and lower the cost of the I/O bandwidth that transmitting such data requires. However, existing genomic compression solutions perform poorly on Big Genomic Data. Furthermore, existing state-of-the-art tools require the user to decompress the data before it can be analyzed further. This project focuses on compressing genomic information and developing a framework that allows analysis of the data in its compressed form. The project develops HPC solutions for fast compression of Big NGS Data sets using ubiquitous architectures such as GPUs and multicore processors. HPC techniques are used to compute essential functions, such as alignment and mapping, directly on the compressed form of the NGS data. More efficient encoding of NGS data for better network utilization is also being investigated.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Muaaz Gul Awan and Fahad Saeed "MS-REDUCE: An ultrafast technique for reduction of Big Mass Spectrometry Data for high-throughput processing" Oxford Bioinformatics , v.32 , 2016 , p.1518 https://doi.org/10.1093/bioinformatics/btw023
Saeed, Fahad and Haspel, Nurit and Al-Mubaid, Hisham "Introduction to the selected papers from the 7th International Conference on Bioinformatics and Computational Biology (BICoB 2015)" Journal of Bioinformatics and Computational Biology , v.14 , 2016 , p.1602002 10.1142/S0219720016020029
Sandoval, Pablo C and Claxton, J'Neka S and Lee, Jae Wook and Saeed, Fahad and Hoffert, Jason D and Knepper, Mark A "Systems-level analysis reveals selective regulation of Aqp2 gene expression by vasopressin" Scientific Reports , v.6 , 2016 , p.34863 10.1038/srep34863
Sookkasem Khositseth, Panapat Uawithya, Poorichaya Somparn, Komgrid Charngkaew, Nattakan Thippamom, Jason D. Hoffert, Fahad Saeed, D. Michael Payne, Shu Hui Chen, Robert A. Fenton and Trairak Pisitkun "Autophagic degradation of aquaporin-2 is an early event in hypokalemia-induced nephrogenic diabetes insipidus" Nature Scientific Reports , v.5 , 2015 10.1038/srep18311
Vargas-Perez, Sandino and Saeed, Fahad "A Hybrid MPI-OpenMP Strategy to Speedup the Compression of Big Next-Generation Sequencing Datasets" IEEE Transactions on Parallel and Distributed Systems , v.28 , 2017 10.1109/TPDS.2017.2692782

Please report errors in award information by writing to: awardsearch@nsf.gov.