Award Abstract # 1629888
II-NEW: GEARS - An Infrastructure for Energy-Efficient Big Data Research on Heterogeneous and Dynamic Data

NSF Org: CNS (Division of Computer and Network Systems)
Recipient: ARIZONA STATE UNIVERSITY
Initial Amendment Date: August 9, 2016
Latest Amendment Date: August 9, 2016
Award Number: 1629888
Award Instrument: Standard Grant
Program Manager: Wendy Nilsen
  wnilsen@nsf.gov
  (703) 292-2568
  CNS Division of Computer and Network Systems
  CSE Directorate for Computer and Information Science and Engineering
Start Date: September 1, 2016
End Date: August 31, 2021 (Estimated)
Total Intended Award Amount: $750,000.00
Total Awarded Amount to Date: $750,000.00
Funds Obligated to Date: FY 2016 = $750,000.00
History of Investigator:
  • Ming Zhao (Principal Investigator)
    mingzhao@asu.edu
  • Kasim Candan (Co-Principal Investigator)
  • Huan Liu (Co-Principal Investigator)
  • Hasan Davulcu (Co-Principal Investigator)
  • Fengbo Ren (Co-Principal Investigator)
Recipient Sponsored Research Office: Arizona State University
660 S MILL AVENUE STE 204
TEMPE
AZ  US  85281-3670
(480)965-5479
Sponsor Congressional District: 04
Primary Place of Performance: Arizona State University
P.O. Box 876011
Tempe
AZ  US  85287-6011
Primary Place of Performance Congressional District: 04
Unique Entity Identifier (UEI): NTLHJXM55KZ6
Parent UEI:
NSF Program(s): CCRI-CISE Cmnty Rsrch Infrstrc
Primary Program Source: 01001617DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 7359
Program Element Code(s): 735900
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Big data technologies have been successfully applied to many disciplines for knowledge discovery and decision making, but the further growth and adoption of the big data paradigm face several critical challenges. First, it is difficult to meet the performance needs of modern big data problems, which are inherently harder (e.g., learning from heterogeneous and imprecise data) and have more stringent performance requirements (e.g., real-time analysis of dynamic data). Second, power consumption is becoming a serious limiting factor to the further scaling of big data systems and the applications they can support. These challenges demand a new type of big data system that incorporates unconventional hardware capable of accelerating data processing and access while lowering the system's power consumption. Therefore, this project is developing the computational infrastructure needed to support GEARS (an enerGy-Efficient big-datA Research System) for studying heterogeneous and dynamic data using heterogeneous computing and storage resources. GEARS is a one-of-a-kind, energy-efficient big-data research infrastructure based on cohesively co-designed software and hardware components. It enables a variety of important studies on heterogeneous and dynamic data and advances scientific knowledge in computer science as well as other data-driven disciplines. It enhances the training of a large body of undergraduate and graduate students, including many from underrepresented groups, by supporting unique research and education activities. Finally, it also benefits society by contributing new open-source solutions with potential commercial applications in support of heterogeneous and dynamic data analysis.

The hardware of GEARS includes a cluster of data nodes equipped with heterogeneous processors and storage devices and fine-grained power management capability. The software is developed upon widely used big data frameworks to support unified programming across CPUs, GPUs, and FPGAs and transparent data access across a deep storage hierarchy integrating DRAM, NVM, SSD, and HDD. GEARS also enables novel systems and algorithms research on learning from heterogeneous and dynamic data, including (1) new algorithm partitioning and scheduling schemes for using heterogeneous accelerators and optimizing the performance and energy efficiency of big data tasks; (2) new I/O scheduling and data staging strategies for improving the performance and energy efficiency of the deep big-data storage hierarchy; (3) multi-phase, out-of-core decomposition techniques for large-scale tensors; (4) a real-time visual analytics system that links streaming media with simulations for anticipatory analytics; (5) multi-modal deep learning methods for heterogeneous social data; (6) new computational tools for real-time analysis of social unrest using social media; (7) a scalable, adaptive, and interactive team detection and assembly system for designing high-performing teams using big network data; (8) rare category analysis and heterogeneous learning algorithms for fast and accurate rare event discovery in large and heterogeneous social data; and (9) a new distributed machine learning framework for learning semantic knowledge from Web-scale images/videos with incomplete/noisy textual annotations. All project results will be shared with the broader community via the project website (http://gears.asu.edu). Publications will be listed on the website with links to their publishers. Data and software downloads will be listed on the website with instructions on how to use them. Source code will be hosted on GitHub, and a direct link to the repository will also be listed on the project website.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

(Showing: 1 - 10 of 48)
B. Mathis, Y. Ma, M. Mancenido, R. Maciejewski "Exploring the Design Space of Sankey Diagrams for the Food-Energy-Water Nexus" IEEE Computer Graphics and Applications , 2019
D. Otstott, S. Williams, L. Ionkov, M. Lang, and M. Zhao "A Foundation for Automated Placement of Data" 4th International Parallel Data Systems Workshop (PDSW) , 2019
E. Kuznetsov, Y. Chen, M. Zhao "SecureFL: Privacy Preserving Federated Learning with SGX and TrustZone" Sixth ACM/IEEE Symposium on Edge Computing (SEC) , 2021
H. Behrens, K.S. Candan, X. Chen, Y. Garg, M-L. Li, X. Li, and S. Liu "DataStorm: Coupled, Continuous Simulations for Complex Urban Environments" ACM Transactions on Data Science , 2019
H. Wang, Y. Lu, S.T. Shutters, M. Steptoe, F. Wang, S. Landis, R. Maciejewski "A Visual Analytics Framework for Spatiotemporal Trade Network Analysis" IEEE Transactions on Visualization and Computer Graphics , 2019
J. Fu, Y. Lu, J. Shu, G. Liu, and M. Zhao "CowCache: Effective Flash Caching for Copy-on-Write Virtual Disks" Cluster Computing , 2019
Jianbo Li, Jingrui He, and Yada Zhu "HiMuV: Hierarchical Framework for Modeling Multi-modality Multi-resolution Data" International Conference on Data Mining (ICDM) , 2017
Jundong Li, Liang Wu, Ruocheng Guo, Chenghao Liu, Huan Liu "Multi-Level Network Embedding with Boosted Low-Rank Matrix Approximation" Proceedings of the 2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining , 2019
Jun Wu, Jingrui He, Yongming Liu "ImVerde: Vertex-Diminished Random Walk for Learning Imbalanced Network Representation" International Conference on Big Data , 2018
J. Zou, M. Zhao, J. Shi, and C. Wang "WATSON: A Workflow-based Data Storage Optimizer for Analytics" 36th International Conference on Massive Storage Systems and Technology (MSST) , 2020

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

Big data technologies have been successfully applied to many disciplines for knowledge discovery and decision making, but the further growth and adoption of the big data paradigm face several critical challenges. First, it is difficult to meet the performance needs of modern big data problems, which are inherently harder (e.g., learning from heterogeneous and imprecise data) and have more stringent performance requirements (e.g., real-time analysis of dynamic data). Second, power consumption is becoming a serious limiting factor to the further scaling of big data systems and the applications they can support. These challenges demand a new type of big data system that incorporates unconventional hardware capable of accelerating data processing and access while lowering the system's power consumption. This project has addressed this need by developing GEARS (an enerGy-Efficient big-datA Research System), a unique infrastructure for studying heterogeneous and dynamic data using heterogeneous computing and storage resources. GEARS is a one-of-a-kind, energy-efficient big-data research infrastructure based on cohesively co-designed software and hardware components. It has enabled a variety of important studies on heterogeneous and dynamic data and has advanced scientific knowledge in computer science as well as other data-driven disciplines. It has enhanced the training of a large body of undergraduate and graduate students, including many from underrepresented groups, by offering unique research and education activities. Finally, it has also benefited society by contributing new open-source solutions with potential commercial applications in support of heterogeneous and dynamic data analysis.

Specifically, GEARS has contributed the design and implementation of hardware infrastructure that makes extensive use of heterogeneous processors and storage devices, and of software infrastructure for heterogeneous computing across CPUs, GPUs, and FPGAs and for data access across a deep storage hierarchy integrating DRAM, NVM, SSD, and HDD. GEARS has enabled novel computer systems and algorithms research on learning from heterogeneous and dynamic data, including (1) new algorithm partitioning and scheduling schemes for using heterogeneous accelerators and optimizing the performance and energy efficiency of big data tasks; (2) new I/O scheduling and data staging strategies for improving the performance and energy efficiency of the deep big-data storage hierarchy; (3) multi-phase, out-of-core decomposition techniques for large-scale tensors; (4) a real-time visual analytics system that links streaming media with simulations for anticipatory analytics; (5) multi-modal deep learning methods for heterogeneous social data; (6) new computational tools for real-time analysis of social unrest using social media; (7) a scalable, adaptive, and interactive team detection and assembly system for designing high-performing teams using big network data; (8) rare category analysis and heterogeneous learning algorithms for fast and accurate rare event discovery in large and heterogeneous social data; and (9) a new distributed machine learning framework for learning semantic knowledge from Web-scale images/videos with incomplete/noisy textual annotations.

GEARS has also contributed to important research in many other disciplines, including (1) biological science: analyzing the genomes of the wheat leaf rust pathogen for durable rust resistance and sustained rust control; (2) neuroscience: multimodal brain network fusion for learning latent representations of brain networks; (3) geographical science: automatic terrain feature detection and classification using a mixed set of optimal remote sensing and natural images; (4) medicine: detecting prostate cancer in sequential contrast-enhanced ultrasound (CEUS) images; (5) emergency management: scalable analysis of sparse and noisy disaster data; (6) security: development of techniques and automated tools for the identification and analysis of digital disinformation and propaganda; (7) finance: real-time financial fraud detection; (8) sustainability: spatiotemporal visual analytics for exploring the relationship between global trade networks and regional instability; and (9) transportation: real-time monocular 3D object localization for autonomous driving. All project results have been shared with the broader community via the project website, journal and conference publications, and open-source software.

 


Last Modified: 02/08/2022
Modified by: Ming Zhao

