
NSF Org: |
CCF Division of Computing and Communication Foundations |
Recipient: |
|
Initial Amendment Date: | August 6, 2014 |
Latest Amendment Date: | August 24, 2016 |
Award Number: | 1452327 |
Award Instrument: | Standard Grant |
Program Manager: |
Tao Li
CCF Division of Computing and Communication Foundations CSE Directorate for Computer and Information Science and Engineering |
Start Date: | August 1, 2014 |
End Date: | July 31, 2017 (Estimated) |
Total Intended Award Amount: | $100,000.00 |
Total Awarded Amount to Date: | $127,999.00 |
Funds Obligated to Date: |
FY 2015 = $8,000.00 FY 2016 = $19,999.00 |
History of Investigator: |
|
Recipient Sponsored Research Office: |
438 WHITNEY RD EXTENSION UNIT 1133 STORRS CT US 06269-9018 (860)486-3622 |
Sponsor Congressional District: |
|
Primary Place of Performance: |
371 Fairfield Way Storrs CT US 06269-4157 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): |
Algorithmic Foundations, Software & Hardware Foundation |
Primary Program Source: |
01001516DB NSF RESEARCH & RELATED ACTIVIT 01001617DB NSF RESEARCH & RELATED ACTIVIT |
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
Computer architectures will soon enter an era of single-chip multicore processors with hundreds or even thousands of heterogeneous cores connected via complex interconnection networks and cache hierarchies. These many-core processors will concurrently execute next-generation applications, such as big data analytics, and exploit parallelism and specialization for power-performance efficiency. Furthermore, new memory technologies will be integrated to minimize energy-inefficient off-chip accesses. However, technology trends indicate that wire scaling will slow down dramatically compared to computation, so moving data efficiently through future many-core processors will become a major challenge. Increasing core counts with heterogeneous computation and communication capabilities, together with applications that process massive data with varying degrees of locality and reuse, will introduce data access variations at different layers of the processor.
This project proposes to dynamically exploit and co-optimize this variability in the locality and reuse of data as it flows through the processor resources. The strategy is to adopt a hardware-software co-design approach and develop fine-through-coarse-grain cross-layer mechanisms for locality-optimal data access in future many-core processors. This will be achieved using a novel locality-aware data access control utility (LDAC) that intelligently and cooperatively orchestrates data movement in the shared heterogeneous processor resources to deliver the promised efficiency. If successful, this project will be a major step toward a new computational model in which runtime management of processor efficiency can be traded off against security, privacy, resilience, or accuracy of computation. The PI will build a holistic prototype simulation environment to demonstrate the efficacy of the proposed locality-optimal data access utility. The simulator infrastructure and a many-core prototype will allow the products of this research to be disseminated widely. This project will introduce practical multicore computing to graduate and undergraduate students, with a focus on writing parallel software for performance and energy efficiency. The research outcomes will enable the design of future many-core processors that execute parallel applications efficiently at low energy.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
This research developed a lightweight locality-aware data access control (LDAC) utility to reduce the latency and energy overheads of costly data movement in the cache hierarchy of large-scale multicores. LDAC builds on tiled multicore architectures that deploy distributed directory-based cache coherence protocols to exploit private caching for on-chip data access. Because private caching handles data with low spatio-temporal locality inefficiently, LDAC adds an auxiliary remote-access protocol for such data. By combining the remote-access and private caching protocols, LDAC improves cache utilization at various levels of the cache hierarchy and avoids unnecessary cache line ping-pong and invalidation traffic. We have developed models within the open-source Graphite multicore simulator to evaluate the performance and energy impact of LDAC on future large-scale multicores. We have built hardware mechanisms that profile and capture spatio-temporal locality and reuse information at runtime to exploit LDAC's fine-grain data placement and replication capabilities. To date, we have published selective bypassing of the private cache hierarchy at the IEEE/ACM International Symposium on Computer Architecture (ISCA 2013); selective replication of cache lines in the last-level cache at the IEEE International Symposium on High Performance Computer Architecture (HPCA 2014) and in the Journal of Supercomputing (SUPE 2016); a novel timestamp-based scheme to detect memory consistency violations in the LDAC architecture at the IEEE/ACM International Conference on Parallel Architectures and Compilation Techniques (PACT 2015); and a combined LDAC architecture that selectively bypasses the private cache and/or selectively replicates cache lines in the last-level cache in ACM Transactions on Architecture and Code Optimization (TACO 2017). We have successfully developed a holistic hardware-level LDAC implementation that improves data access latency and energy consumption in large-scale multicores.
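The report does not spell out how LDAC decides, per core and per cache line, between private caching and remote access. The Python sketch below is a toy software model of one plausible policy, assuming a per-core reuse counter compared against a fixed threshold: a core's early accesses to a line are served remotely at the line's home cache slice, and the core is promoted to private caching once its reuse reaches the threshold. The class `LineTracker`, the parameter `pct`, and its default value of 4 are hypothetical illustrations, not the project's actual hardware design, which would keep this state in directory and cache-tag bits rather than in Python objects.

```python
from dataclasses import dataclass, field

# Hypothetical threshold (assumed value): accesses a core must make to a line
# before it is allowed to keep a private copy.
DEFAULT_PCT = 4


@dataclass
class LineTracker:
    """Toy directory-side state for ONE cache line (illustrative model only)."""
    pct: int = DEFAULT_PCT
    # Cores currently in private-caching mode for this line.
    private: set[int] = field(default_factory=set)
    # Per-core count of accesses served remotely at the home cache slice.
    remote_hits: dict[int, int] = field(default_factory=dict)

    def access(self, core: int) -> str:
        """Classify one access by `core` as 'private' or 'remote'."""
        if core in self.private:
            return "private"  # served from the core's private cache
        count = self.remote_hits.get(core, 0) + 1
        if count >= self.pct:
            # Enough reuse observed: promote the core to private caching.
            self.private.add(core)
            self.remote_hits.pop(core, None)
            return "private"
        # Low reuse so far: access the word remotely, no private allocation.
        self.remote_hits[core] = count
        return "remote"

    def on_invalidate(self, core: int, reuse: int) -> None:
        """On invalidation/eviction, demote a private copy that saw low reuse."""
        if core in self.private and reuse < self.pct:
            self.private.discard(core)
            self.remote_hits[core] = 0  # restart reuse profiling for this core


line = LineTracker()
print([line.access(core=0) for _ in range(5)])
# ['remote', 'remote', 'remote', 'private', 'private']
```

With the assumed threshold of 4, core 0's first three accesses go remote and the fourth promotes it to private caching. The design choice this illustrates is that low-reuse data never pollutes the private caches or triggers invalidation traffic, while high-reuse data still benefits from local caching.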
All tools, including the modified Graphite simulator and the CRONO graph benchmark suite, are released publicly to allow researchers to explore variant architectures and applications. The CRONO benchmark suite was published at the IEEE International Symposium on Workload Characterization (IISWC 2015). The simulator was also used to develop a laboratory kit and to modernize the computer architecture and multicore computing curriculum at UConn. We have supported the NSF-sponsored REU programs at UConn and involved undergraduates, especially women and underrepresented minorities, in our research program. Siena Biales completed her REU rotation in our research group during Summer 2016. Finally, the outcomes of this project have been regularly shared with industry partners through Semiconductor Research Corporation (SRC) liaisons from NXP, Intel, ARM, and IBM.
Last Modified: 12/06/2017
Modified by: Omer Khan