
NSF Org: |
CCF Division of Computing and Communication Foundations |
Recipient: |
|
Initial Amendment Date: | February 23, 2021 |
Latest Amendment Date: | March 12, 2025 |
Award Number: | 2109988 |
Award Instrument: | Standard Grant |
Program Manager: |
Almadena Chtchelkanova
achtchel@nsf.gov (703)292-7498 CCF Division of Computing and Communication Foundations CSE Directorate for Computer and Information Science and Engineering |
Start Date: | March 1, 2021 |
End Date: | December 31, 2026 (Estimated) |
Total Intended Award Amount: | $187,447.00 |
Total Awarded Amount to Date: | $2,400,492.00 |
Funds Obligated to Date: |
FY 2022 = $933,297.00 FY 2023 = $932,401.00 FY 2025 = $277,350.00 |
History of Investigator: |
|
Recipient Sponsored Research Office: |
323 DR MARTIN LUTHER KING JR BLVD NEWARK NJ US 07102-1824 (973)596-5275 |
Sponsor Congressional District: |
|
Primary Place of Performance: |
University Heights Newark NJ US 07102-1982 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | Software & Hardware Foundation |
Primary Program Source: |
01002223RB NSF RESEARCH & RELATED ACTIVIT 01002324RB NSF RESEARCH & RELATED ACTIVIT 01002526RB NSF RESEARCH & RELATED ACTIVIT 01002122RB NSF RESEARCH & RELATED ACTIVIT |
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
A real-world challenge in data science is to develop interactive methods for quickly analyzing new data sets that are potentially of massive scale. This award will design and implement fundamental algorithms for high performance computing solutions that enable interactive analysis of massive data sets. Building on widely used data types and structures (strings, sets, matrices and graphs), this methodology will produce efficient and scalable software for three classes of fundamental algorithms that will drastically improve performance on a wide range of real-world queries or directly implement frequently issued queries. These innovations will allow the broad community to move massive-scale data exploration from time-consuming batch processing to interactive analysis that lets a data analyst comprehensively, deeply and efficiently explore the insights and science in real-world data sets. By enabling a growing number of developers to easily manipulate large data sets, the project will help enlarge the data science community and bring these tools to new communities. Materials from this project will be included in graduate and undergraduate course curricula. In particular, women, high school students and members of other groups underrepresented in STEM will be encouraged to participate in this research activity.
This project focuses on three important algorithmic building blocks for data analytics: 1) suffix array construction, 2) treap construction and 3) distributed-memory join algorithms, which are useful for analyzing large-scale strings, implementing randomized search in large string data sets, and generating new relations, respectively. These fundamental algorithms serve as the cornerstone for interactive data science at scale. Building on theoretical results and systematic algorithm design, the project will develop a novel symbiotic optimization methodology that combines theoretical analysis, data structure features and typical data distribution features as a whole to significantly improve the practical performance of the proposed algorithms. To evaluate and demonstrate their effectiveness, the algorithms will be implemented in, and contributed to, an open-source NumPy-like software framework that aims to provide productive data discovery tools for massive data sets of dozens of terabytes by combining the productivity of Python with world-class high performance computing.
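The three kernels named above can be illustrated with a minimal single-process Python sketch, assuming small in-memory inputs. It is not the project's implementation, which targets distributed memory and the NumPy-like framework; all names here (`suffix_array`, `treap_insert`, `treap_contains`, `hash_join`, `partitioned_join`) are hypothetical. The suffix array uses prefix doubling, the treap uses random priorities with rotations, and the distributed join is approximated by hash-partitioning both relations into `p` buckets that stand in for nodes.

```python
import random
from collections import defaultdict

# Hypothetical, illustrative helpers; not the project's code.


# --- 1) Suffix array by prefix doubling (O(n log^2 n) with comparison sort) ---
def suffix_array(s):
    n = len(s)
    if n == 0:
        return []
    rank = [ord(c) for c in s]  # initial rank: the first character
    sa = list(range(n))
    k = 1
    while True:
        # Order suffixes by (rank of first k chars, rank of the next k chars);
        # -1 marks "past the end", so a shorter suffix sorts first.
        def key(i):
            return (rank[i], rank[i + k] if i + k < n else -1)

        sa.sort(key=key)
        new_rank = [0] * n
        for j in range(1, n):
            new_rank[sa[j]] = new_rank[sa[j - 1]] + (key(sa[j]) != key(sa[j - 1]))
        rank = new_rank
        if rank[sa[-1]] == n - 1:  # all ranks distinct: order is final
            return sa
        k *= 2


# --- 2) Treap: BST order on keys, max-heap order on random priorities ---
class Node:
    def __init__(self, key):
        self.key, self.prio = key, random.random()
        self.left = self.right = None


def treap_insert(t, key):
    """Insert key under root t and return the (possibly new) root."""
    if t is None:
        return Node(key)
    if key < t.key:
        t.left = treap_insert(t.left, key)
        if t.left.prio > t.prio:  # heap order violated: rotate right
            l = t.left
            t.left, l.right = l.right, t
            t = l
    else:
        t.right = treap_insert(t.right, key)
        if t.right.prio > t.prio:  # heap order violated: rotate left
            r = t.right
            t.right, r.left = r.left, t
            t = r
    return t


def treap_contains(t, key):
    while t is not None and t.key != key:
        t = t.left if key < t.key else t.right
    return t is not None


# --- 3) Hash join, with hash partitioning standing in for distribution ---
def hash_join(left, right, key_l, key_r):
    """Local inner equi-join of two lists of dicts."""
    index = defaultdict(list)
    for row in right:
        index[row[key_r]].append(row)
    return [{**row, **m} for row in left for m in index.get(row[key_l], [])]


def partitioned_join(left, right, key_l, key_r, p=4):
    """Simulate p nodes: rows with equal keys hash to the same partition,
    so each partition is joined independently and the results concatenated."""
    lp = [[] for _ in range(p)]
    rp = [[] for _ in range(p)]
    for row in left:
        lp[hash(row[key_l]) % p].append(row)
    for row in right:
        rp[hash(row[key_r]) % p].append(row)
    out = []
    for a, b in zip(lp, rp):
        out.extend(hash_join(a, b, key_l, key_r))
    return out
```

For example, `suffix_array("banana")` returns `[5, 3, 1, 0, 4, 2]`. The comparison-sort version of prefix doubling is chosen here for brevity; large-scale constructions typically replace the sort with radix sorting or other faster schemes, and a real distributed join would exchange the partitions between nodes rather than keep them in one process.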
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.