NSF Award Search: Award # 1452327

Award Abstract # 1452327

EAGER: Locality-Aware Data Access Control for Future 1000-core Processors

NSF Org:	CCF Division of Computing and Communication Foundations
Recipient:	UNIVERSITY OF CONNECTICUT
Initial Amendment Date:	August 6, 2014
Latest Amendment Date:	August 24, 2016
Award Number:	1452327
Award Instrument:	Standard Grant
Program Manager:	Tao Li CCF Division of Computing and Communication Foundations CSE Directorate for Computer and Information Science and Engineering
Start Date:	August 1, 2014
End Date:	July 31, 2017 (Estimated)
Total Intended Award Amount:	$100,000.00
Total Awarded Amount to Date:	$127,999.00
Funds Obligated to Date:	FY 2014 = $100,000.00; FY 2015 = $8,000.00; FY 2016 = $19,999.00
History of Investigator:	Omer Khan (Principal Investigator) khan@uconn.edu
Recipient Sponsored Research Office:	University of Connecticut 438 WHITNEY RD EXTENSION UNIT 1133 STORRS CT US 06269-9018 (860)486-3622
Sponsor Congressional District:	02
Primary Place of Performance:	University of Connecticut 371 Fairfield Way Storrs CT US 06269-4157
Primary Place of Performance Congressional District:	02
Unique Entity Identifier (UEI):	WNTPS995QBM7
Parent UEI:
NSF Program(s):	Algorithmic Foundations, Software & Hardware Foundation
Primary Program Source:	01001415DB NSF RESEARCH & RELATED ACTIVIT 01001516DB NSF RESEARCH & RELATED ACTIVIT 01001617DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s):	7916, 7941, 9251
Program Element Code(s):	779600, 779800
Award Agency Code:	4900
Fund Agency Code:	4900
Assistance Listing Number(s):	47.070

ABSTRACT

Computer architectures will soon approach an era of single-chip multicore processors with hundreds or even thousands of heterogeneous cores connected via complex interconnection networks and cache hierarchies. These many-core processors will concurrently execute next-generation applications, such as big data analytics, exploiting parallelism and specialization for power-performance efficiency. Furthermore, new memory technologies will be integrated to minimize energy-inefficient off-chip accesses. However, technology trends indicate that wire scaling will slow dramatically relative to computation, so moving data efficiently through future many-core processors will become a major challenge. Increasing core counts with heterogeneous computation and communication capabilities, together with applications that process massive data with varying degrees of locality and reuse, will introduce data access variations at different layers of the processor.

This project proposes to dynamically exploit and co-optimize this variability in locality and reuse of data as it flows through the processor resources. The strategy is to adopt a hardware-software co-design approach, and develop fine-through-coarse-grain cross-layer mechanisms for locality-optimal data access in future many-core processors. This will be achieved using a novel locality-aware data access control utility (LDAC) that intelligently and cooperatively orchestrates data movement in the shared heterogeneous processor resources to deliver the efficiency promise. If successful, this project will be a major step forward towards a new computational model where runtime management of processor efficiency can be utilized to make tradeoffs with security, privacy, resilience, or accuracy of computation. The PI will build a holistic prototype simulation environment to demonstrate the efficacy of the proposed locality-optimal data access utility. The development of simulator infrastructure and a many-core prototype will allow the products of this research to be disseminated widely. This project will introduce practical multicore computing to graduate and undergraduate students with a focus on writing parallel software for performance and energy efficiency. The research outcomes will enable the design of future many-core processors that use low energy to execute parallel applications efficiently.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Farrukh Hijaz, Qingchuan Shi, George Kurian, Srini Devadas, Omer Khan "Locality-aware data replication in the last-level cache for large scale multicores" The Journal of Supercomputing , v.72 , 2016 https://doi.org/10.1007/s11227-015-1608-4

George Kurian, Qingchuan Shi, Srini Devadas, Omer Khan "OSPREY: Implementation of Memory Consistency Models for Cache Coherence Protocols involving Invalidation-Free Data Access" IEEE/ACM International Conference on Parallel Architectures and Compilation Techniques , 2015 https://doi.org/10.1109/PACT.2015.45

Masab Ahmad, Farrukh Hijaz, Qingchuan Shi, Omer Khan "CRONO: A Benchmark Suite for Multithreaded Graph Algorithms Executing on Futuristic Multicores" IEEE International Symposium on Workload Characterization , 2015 https://doi.org/10.1109/IISWC.2015.11

Qingchuan Shi, George Kurian, Farrukh Hijaz, Srini Devadas, Omer Khan "LDAC: Locality-aware Data Access Control for Large-scale Multicore Cache Hierarchies" ACM Transactions on Architecture and Code Optimization , v.13 , 2016 https://doi.org/10.1145/2983632

Qingchuan Shi, Kartik Lakshminarasimhan, Christopher Noll, Eelco Scholte, Omer Khan "A Lightweight Spatio-temporally Partitioned Multicore Architecture for Concurrent Execution of Safety Critical Workloads" SAE 2016 Aerospace Systems and Technology Conference , 2016 https://doi.org/10.4271/2016-01-2067

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

This research developed a lightweight locality-aware data access control (LDAC) utility to mitigate the latency and energy overheads of costly data movement in the cache hierarchy of large-scale multicores. LDAC builds on tiled multicore architectures that deploy distributed directory-based cache coherence protocols to exploit private caching for on-chip data access. However, to mitigate the inefficient handling of data with low spatio-temporal locality, LDAC enables an auxiliary remote-access protocol. By combining remote-access and private caching protocols, LDAC improves cache utilization at various levels of the cache hierarchy and avoids unnecessary cache line ping-pong and invalidation traffic. We have developed models within the open source Graphite multicore simulator to evaluate the performance and energy impact of LDAC on futuristic large-scale multicores. We have built hardware mechanisms that profile and capture spatio-temporal locality/reuse information at runtime to exploit LDAC’s fine-grain data placement and replication capabilities. To date, we have published selective bypassing of the private cache hierarchy at the ACM/IEEE International Symposium on Computer Architecture (ISCA 2013); selective replication of cache lines in the last-level cache at the IEEE International Symposium on High Performance Computer Architecture (HPCA 2014) and in the Journal of Supercomputing (SUPE 2016); a novel timestamp-based scheme to detect memory consistency violations in the LDAC architecture at the IEEE/ACM International Conference on Parallel Architectures and Compilation Techniques (PACT 2015); and a combined LDAC architecture that selectively bypasses the private cache and/or selectively replicates cache lines in the last-level cache in ACM Transactions on Architecture and Code Optimization (TACO 2017). We have successfully developed a holistic hardware-level LDAC implementation that improves data access latency and energy consumption in large-scale multicores.
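
As a rough illustration of the idea of choosing between remote access and private caching based on observed reuse, the following toy sketch tracks a per-core, per-line reuse counter and promotes a line to private caching once the counter reaches a threshold. It is not the LDAC hardware design or the Graphite model: the class name, the threshold value, the "start remote, then promote" policy, and the reset-on-invalidation rule are all assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical reuse threshold; the report does not state a value.
PRIVATE_CACHING_THRESHOLD = 4


class LocalityClassifier:
    """Toy per-core, per-line reuse tracker (illustrative, not LDAC itself)."""

    def __init__(self, threshold=PRIVATE_CACHING_THRESHOLD):
        self.threshold = threshold
        # (core, line) -> number of accesses since the last invalidation
        self.reuse = defaultdict(int)
        # How many accesses were served in each mode
        self.stats = {"remote": 0, "private": 0}

    def access(self, core, line):
        """Record one access and return the mode assumed to serve it."""
        key = (core, line)
        self.reuse[key] += 1
        # Low reuse: serve at the remote home location.
        # High reuse: allow a copy in the requesting core's private cache.
        mode = "private" if self.reuse[key] >= self.threshold else "remote"
        self.stats[mode] += 1
        return mode

    def invalidate(self, line, writer):
        """Model a write by `writer`: other cores lose their reuse history."""
        for core, tracked_line in list(self.reuse):
            if tracked_line == line and core != writer:
                self.reuse[(core, tracked_line)] = 0


if __name__ == "__main__":
    clf = LocalityClassifier(threshold=3)
    print([clf.access(core=0, line=0x40) for _ in range(4)])
    clf.invalidate(line=0x40, writer=1)
    print(clf.access(core=0, line=0x40))
```

In this sketch a line with little reuse never occupies private cache space, while a frequently reused line is promoted after a few accesses; a write from another core resets the history, which loosely mirrors the goal of avoiding cache line ping-pong for contended data.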

All tools, including the modified Graphite simulator and the CRONO graph benchmark suite, are released publicly to allow researchers to explore variant architectures and applications. The CRONO benchmark suite was published at the IEEE International Symposium on Workload Characterization (IISWC 2015). The simulator was also used to develop a laboratory kit and modernize the computer architecture and multicore computing curriculum at UConn. We have supported the NSF-sponsored REU programs at UConn and involved undergraduates, especially women and underrepresented minorities, in our research program. Siena Biales completed her REU rotation in our research group during Summer 2016. Finally, the outcomes of this project have been regularly shared with industry partners through the Semiconductor Research Corporation (SRC) liaisons from NXP, Intel, ARM, and IBM.

Last Modified: 12/06/2017
Modified by: Omer Khan

Please report errors in award information by writing to: awardsearch@nsf.gov.
