
NSF Org: | CCF Division of Computing and Communication Foundations |
Initial Amendment Date: | January 24, 2017 |
Latest Amendment Date: | May 11, 2021 |
Award Number: | 1652294 |
Award Instrument: | Continuing Grant |
Program Manager: | Almadena Chtchelkanova, achtchel@nsf.gov, (703) 292-7498, CCF Division of Computing and Communication Foundations, CSE Directorate for Computer and Information Science and Engineering |
Start Date: | February 1, 2017 |
End Date: | January 31, 2024 (Estimated) |
Total Intended Award Amount: | $573,554.00 |
Total Awarded Amount to Date: | $573,554.00 |
Funds Obligated to Date: | FY 2018 = $108,341.00; FY 2019 = $100,463.00; FY 2020 = $103,593.00; FY 2021 = $106,831.00 |
Recipient Sponsored Research Office: | 1109 GEDDES AVE STE 3300, ANN ARBOR, MI 48109-1015, US; (734) 763-6438 |
Primary Place of Performance: | 3003 S. State Street, Ann Arbor, MI 48109-1274, US |
NSF Program(s): | Software & Hardware Foundation |
01001819DB NSF RESEARCH & RELATED ACTIVITIES; 01001920DB NSF RESEARCH & RELATED ACTIVITIES; 01002021DB NSF RESEARCH & RELATED ACTIVITIES; 01002122DB NSF RESEARCH & RELATED ACTIVITIES |
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
As computing today is dominated by Big Data, there is a strong impetus to specialize for this important domain. The performance of data-centric applications depends critically on efficient access to and processing of data. These applications tend to be highly data-parallel and operate on large volumes of data: recent studies project that by the year 2020, data production from individuals and corporations will grow to 73.5 zettabytes, a 4.4× increase over the year 2015. Such applications also expend a disproportionately large fraction of their time and energy on moving data from storage to compute units and on instruction processing, compared to the actual computation. This research seeks to design specialized data-centric computing systems that dramatically reduce these overheads.
In a general-purpose computing system, the majority of the aggregate die area (over 90%) is devoted to storing and retrieving information at several levels of the memory hierarchy: on-chip caches, main memory (DRAM), and non-volatile memory (NVM). The central vision of this research is to create in-situ compute memories, which re-purpose the elements of these storage structures and transform them into active computational units. In contrast to prior processing-in-memory approaches, which add logic outside the memory arrays, the underpinning principle of in-situ compute memories is to enable computation in place within each memory array, without transferring data into or out of it. Such a transformation could unlock massive data-parallel compute capability (up to 100×) and reduce the energy spent moving data through the levels of the memory hierarchy (up to 20×), thereby directly addressing the needs of data-centric applications. This work develops in-situ compute memory technology, adapts the system software stack, and re-designs data-centric applications to take advantage of these memories.
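As a loose illustration of the in-situ principle (a minimal software model, not the actual circuit technique; the names below are hypothetical), activating two rows of an array at once lets the shared bit-lines sense element-wise logic over the stored words:

    # Hypothetical model of in-place bit-line logic in a memory array.
    # Activating two word-lines simultaneously lets each bit-line sense the
    # AND of the two cells, while the complementary bit-line senses their NOR.
    import numpy as np

    def in_situ_logic(array, row_a, row_b):
        a, b = array[row_a], array[row_b]
        bitline = a & b              # both cells must hold 1 to keep the line high
        bitline_bar = 1 - (a | b)    # complementary line senses NOR
        return bitline, bitline_bar

    memory = np.random.randint(0, 2, size=(8, 16))   # 8 rows x 16 bit-lines
    and_row, nor_row = in_situ_logic(memory, 2, 5)   # one array-wide operation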
PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
Computer designers have traditionally separated the roles of storage and compute units. Memories stored data; processors’ logic units computed on it. Is this separation necessary? A human brain does not separate the two so distinctly. Why should a processor? Over two-thirds of a processor die (caches) and all of main memory are devoted to temporary storage. Today, none of these memory elements can compute. But could they? Our research under this NSF CAREER Award addresses this fundamental question about the role of memory and proposes to give memory a dual responsibility: storing data and computing on it.
Data stored in memory arrays share wires (bit-lines) and signal-sensing apparatus (sense-amps). We observe that logic operations can be computed over these shared structures, which allows us to re-purpose thousands of cache memory arrays into over a million bit-serial arithmetic-logic units. Thus, we morph existing memory into massive vector compute units, providing parallelism several orders of magnitude higher than a contemporary GPU. This also saves the energy spent shuffling data between storage and compute units -- a significant concern in Big Data applications. In-memory computing is a significant departure from processing-in-memory (PIM) technologies, which do not reuse memory structures for computing but simply move conventional compute units near memory.
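As a minimal sketch of how arithmetic falls out of such row-wise logic (a software model under our own simplifying assumptions, not the hardware itself), suppose operands are stored transposed, with bit i of every word in row i; then each row-wide logic step advances the addition of every word at once:

    # Hypothetical bit-serial ripple-carry addition across all bit-lines at once.
    # a_bits/b_bits hold bit i of every operand word in row i (LSB first), so a
    # single row-wide logic step advances the addition of every lane in parallel.
    import numpy as np

    def bit_serial_add(a_bits, b_bits):
        num_bits, num_lanes = a_bits.shape
        carry = np.zeros(num_lanes, dtype=a_bits.dtype)
        result = np.zeros((num_bits + 1, num_lanes), dtype=a_bits.dtype)
        for i in range(num_bits):                    # one step per bit position
            a, b = a_bits[i], b_bits[i]
            result[i] = a ^ b ^ carry                # sum bit
            carry = (a & b) | (carry & (a ^ b))      # carry-out (majority)
        result[num_bits] = carry
        return result

With thousands of memory arrays operating in lockstep, the lane count, and with it the parallelism, scales into the millions.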
Caches that compute can be a game changer for Artificial Intelligence (AI). They can add accelerator capabilities to general-purpose processors without the significant die-area cost of a dedicated accelerator like Google’s TPU. For example, we showed that compute-enabled caches in an Intel Xeon processor can improve efficiency by 629× for convolutional neural networks (CNNs). This result received significant attention from industry.
Emerging non-volatile memories such as resistive memory (ReRAM) can also be re-purposed into very large vector-parallel units. We asked: could in-memory computing units be used as general-purpose data-parallel accelerators, much like GPUs? Our research showed that the answer is a resounding yes. We built novel compiler technology that transforms arbitrary data-parallel computation expressed in Google’s TensorFlow into binaries that run directly on ReRAM. We showed that in-memory computing can be over 700× more efficient than GPUs.
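To illustrate the primitive such a compiler targets, here is a highly idealized model of a crossbar’s analog matrix-vector multiply, ignoring device noise, limited precision, and tiling across arrays; the function and variable names are ours, not the toolchain’s:

    # Idealized model of a ReRAM crossbar's analog matrix-vector multiply.
    # Cell conductances encode the matrix; driving row voltages produces column
    # currents I_j = sum_i(G_ij * V_i), i.e., a whole MVM in one analog step.
    import numpy as np

    def crossbar_mvm(conductances, voltages):
        return voltages @ conductances    # Kirchhoff's current law sums products

    G = np.random.rand(128, 128)          # one 128x128 crossbar tile
    x = np.random.rand(128)               # input vector applied as row voltages
    y = crossbar_mvm(G, x)                # column currents = matrix-vector product

A compiler’s job is then to tile larger tensors across many such arrays and schedule the data movement between them.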
Finite state automata (FSA) are widely used as a computation model for approximate and exact regular-expression matching in application domains such as data analytics, network security, bioinformatics, and computational finance. We proposed a new way to re-purpose DRAM as an FSA accelerator, addressing a very hard problem: the parallelization of inherently sequential FSA. Our follow-up work extended this solution to turn processor caches into FSA accelerators; to achieve this, we developed novel SRAM-based interconnects for automata processing. Our solution provides a speedup of 25× over state-of-the-art custom accelerators and more than 3000× over off-the-shelf stock processors.
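To give a flavor of the parallelization problem, here is a minimal software sketch of the classic enumerative approach, in which each input chunk is processed independently from every possible start state and the resulting state-to-state mappings are composed in order (this illustrates the principle only; the hardware mapping in our designs differs, and all names are hypothetical):

    # Enumerative FSA parallelization, sketched in software. Each chunk is
    # reduced independently (the parallelizable part) to a mapping from every
    # possible start state to its end state; the mappings then compose cheaply.
    def chunk_mapping(delta, num_states, chunk):
        mapping = list(range(num_states))          # identity: state s maps to s
        for symbol in chunk:
            mapping = [delta[s][symbol] for s in mapping]
        return mapping

    def run_fsa_parallel(delta, num_states, start, text, chunk_size=4):
        chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
        mappings = [chunk_mapping(delta, num_states, c) for c in chunks]
        state = start
        for m in mappings:                         # sequential but trivially cheap
            state = m[state]
        return state

    # Two-state FSA tracking the parity of 'a' characters seen so far.
    delta = [{'a': 1, 'b': 0}, {'a': 0, 'b': 1}]
    assert run_fsa_parallel(delta, 2, 0, "abbaab") == 1   # three a's: odd parity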
The automata research led us to genomics, a killer application for FSA acceleration. Genomics can transform precision health over the next decade: we can detect cancer several years earlier through simple blood tests, without invasive biopsies, and we can identify infectious pathogens while avoiding the indiscriminate use of broad-spectrum antibiotics. In the last few years, partially supported by this award, we have made significant advances in accelerating whole-genome sequencing and ultra-rapid real-time sequencing for cancer diagnostics. We have demonstrated that hardware-software co-design can deliver orders-of-magnitude acceleration for some of the most time-consuming steps in genomics.
This project yielded several notable outcomes with broad impacts. First, it supported two students in earning their PhD degrees in Computer Science and Engineering from the University of Michigan, and it partially supported several other students, including undergraduate researchers. Second, the PI collaborated closely with Intel Corp. on technology transfer and co-founded a precision health startup. Third, the PI created a permanent EECS undergraduate parallel-programming course that is now taught at the University of Michigan. Finally, the research was widely disseminated through numerous conference and journal publications, seminars at universities and industrial research labs, and the release of benchmarks and artifacts to facilitate further research by the broader community.
Last Modified: 07/09/2024
Modified by: Reetuparna Das