Award Abstract # 1652294
CAREER: In-Situ Compute Memories for Accelerating Data Parallel Applications

NSF Org: CCF
Division of Computing and Communication Foundations
Recipient: REGENTS OF THE UNIVERSITY OF MICHIGAN
Initial Amendment Date: January 24, 2017
Latest Amendment Date: May 11, 2021
Award Number: 1652294
Award Instrument: Continuing Grant
Program Manager: Almadena Chtchelkanova
achtchel@nsf.gov
 (703)292-7498
CCF
 Division of Computing and Communication Foundations
CSE
 Directorate for Computer and Information Science and Engineering
Start Date: February 1, 2017
End Date: January 31, 2024 (Estimated)
Total Intended Award Amount: $573,554.00
Total Awarded Amount to Date: $573,554.00
Funds Obligated to Date: FY 2017 = $154,326.00
FY 2018 = $108,341.00
FY 2019 = $100,463.00
FY 2020 = $103,593.00
FY 2021 = $106,831.00
History of Investigator:
  • Reetuparna Das (Principal Investigator)
    reetudas@umich.edu
Recipient Sponsored Research Office: Regents of the University of Michigan - Ann Arbor
1109 GEDDES AVE STE 3300
ANN ARBOR
MI  US  48109-1015
(734)763-6438
Sponsor Congressional District: 06
Primary Place of Performance: University of Michigan Ann Arbor
3003 S. State Street
Ann Arbor
MI  US  48109-1274
Primary Place of Performance Congressional District: 06
Unique Entity Identifier (UEI): GNJ7BBP73WE9
Parent UEI:
NSF Program(s): Software & Hardware Foundation
Primary Program Source: 01001718DB NSF RESEARCH & RELATED ACTIVITIES
01001819DB NSF RESEARCH & RELATED ACTIVITIES
01001920DB NSF RESEARCH & RELATED ACTIVITIES
01002021DB NSF RESEARCH & RELATED ACTIVITIES
01002122DB NSF RESEARCH & RELATED ACTIVITIES
Program Reference Code(s): 1045, 7942
Program Element Code(s): 779800
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

As computing today is dominated by Big Data, there is a strong impetus to specialize for this important domain. The performance of these data-centric applications depends critically on efficient access and processing of data. These applications tend to be highly data-parallel and deal with large amounts of data. Recent studies show that by the year 2020, data production from individuals and corporations is expected to grow to 73.5 zettabytes, a 4.4× increase from the year 2015. In addition, these applications tend to expend a disproportionately large fraction of time and energy moving data from storage to compute units, and in instruction processing, compared to the actual computation. This research seeks to design specialized data-centric computing systems that dramatically reduce these overheads.

In a general-purpose computing system, the majority of the aggregate die area (over 90%) is devoted to storing and retrieving information at several levels of the memory hierarchy: on-chip caches, main memory (DRAM), and non-volatile memory (NVM). The central vision of this research is to create in-situ compute memories, which re-purpose the elements used in these storage structures and transform them into active computational units. In contrast to prior processing-in-memory approaches, which augment logic outside the memory arrays, the underpinning principle behind in-situ compute memories is to enable computation in place within each memory array, without transferring data in or out of it. Such a transformation could unlock massive data-parallel compute capabilities (up to 100×) and reduce the energy spent moving data through the levels of the memory hierarchy (up to 20×), thereby directly addressing the needs of data-centric applications. This work develops in-situ compute memory technology, adapts the system software stack, and re-designs data-centric applications to take advantage of these memories.
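
To make the in-place principle concrete, here is a minimal functional sketch in Python (illustrative only; the class ComputeArray and its method names are hypothetical, not the project's circuit design). Activating two word-lines of an SRAM array at once lets the bit-line and its complement sense the element-wise AND and NOR of the two stored rows; other logic operations follow without data leaving the array.

    import numpy as np

    # Functional model of the in-situ compute primitive (hypothetical names):
    # activating two word-lines simultaneously lets each bit-line sense the
    # AND of the two stored bits, and each complement bit-line their NOR.
    class ComputeArray:
        def __init__(self, rows=256, cols=256, seed=0):
            rng = np.random.default_rng(seed)
            self.cells = rng.integers(0, 2, size=(rows, cols), dtype=np.uint8)

        def dual_row_activate(self, row_a, row_b):
            a, b = self.cells[row_a], self.cells[row_b]
            bl = a & b            # bit-line: stays high only if both cells hold 1
            blb = (~(a | b)) & 1  # complement bit-line: NOR of the two bits
            return bl, blb

    arr = ComputeArray()
    and_bits, nor_bits = arr.dual_row_activate(3, 7)
    or_bits = (~nor_bits) & 1               # OR, derived in place
    xor_bits = or_bits & ((~and_bits) & 1)  # XOR = OR AND NOT(AND)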

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH


Angstadt, Kevin and Subramaniyan, Arun and Sadredini, Elaheh and Rahimi, Reza and Skadron, Kevin and Weimer, Westley and Das, Reetuparna "ASPEN: A Scalable In-SRAM Architecture for Pushdown Automata" 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2018. https://doi.org/10.1109/MICRO.2018.00079
Eckert, Charles and Wang, Xiaowei and Wang, Jingcheng and Subramaniyan, Arun and Iyer, Ravi and Sylvester, Dennis and Blaauw, David and Das, Reetuparna "Neural Cache: Bit-Serial In-Cache Acceleration of Deep Neural Networks" The 45th Annual International Symposium on Computer Architecture, 2018. https://doi.org/10.1109/ISCA.2018.00040
Fujiki, Daichi and Mahlke, Scott and Das, Reetuparna "In-Memory Data Parallel Processor" ACM SIGPLAN Notices, v.53, 2018. https://doi.org/10.1145/3296957.3173171
Fujiki, Daichi and Subramaniyan, Arun and Zhang, Tianjun and Zeng, Yu and Das, Reetuparna and Blaauw, David and Narayanasamy, Satish "GenAx: A Genome Sequencing Accelerator" 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA), 2018. https://doi.org/10.1109/ISCA.2018.00017
Subramaniyan, Arun and Das, Reetuparna "Parallel Automata Processor" ACM SIGARCH Computer Architecture News, v.45, 2017. https://doi.org/10.1145/3140659.3080207
Subramaniyan, Arun and Gu, Yufeng and Dunn, Timothy and Paul, Somnath and Vasimuddin, Md and Misra, Sanchit and Blaauw, David and Narayanasamy, Satish and Das, Reetuparna "GenomicsBench: A Benchmark Suite for Genomics" 2021 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2021. https://doi.org/10.1109/ISPASS51385.2021.00012
Subramaniyan, Arun and Wang, Jingcheng and Balasubramanian, Ezhil R. and Blaauw, David and Sylvester, Dennis and Das, Reetuparna "Cache Automaton" MICRO-50 '17: Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017. https://doi.org/10.1145/3123939.3123986
Wang, Xiaowei and Goyal, Vidushi and Yu, Jiecao and Bertacco, Valeria and Boutros, Andrew and Nurvitadhi, Eriko and Augustine, Charles and Iyer, Ravi and Das, Reetuparna "Compute-Capable Block RAMs for Efficient Deep Learning Acceleration on FPGAs" 2021 IEEE 29th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), 2021. https://doi.org/10.1109/FCCM51124.2021.00018

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

Computer designers have traditionally separated the roles of storage and compute units. Memories stored data. Processors' logic units computed on it. Is this separation necessary? A human brain does not separate the two so distinctly. Why should a processor? Over two-thirds of a processor die (caches) and all of main memory are devoted to temporary storage. Today, none of these memory elements can compute. But could they? Our research under this NSF CAREER Award addresses this fundamental question regarding the role of memory and proposes to impose a dual responsibility on it: storing and computing on data.

Data stored in memory arrays share wires (bit-lines) and signal-sensing apparatus (sense-amps). We observe that logic operations can be computed over these shared structures, which allows us to re-purpose thousands of cache memory arrays into over a million bit-serial arithmetic-logic units. Thus, we morph existing memory into massive vector compute units, providing parallelism several orders of magnitude higher than a contemporary GPU. This approach also saves the energy spent shuffling data between storage and compute units -- a significant concern in Big Data applications. In-memory computing is a significant departure from processing-in-memory (PIM) technologies, which do not reuse memory structures for computing but simply move conventional compute units near memory.
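
The bit-serial idea can be made concrete in a few lines. Below is a minimal Python sketch (a functional model with hypothetical names such as in_array_add, not our hardware implementation): operands are stored transposed, one bit per row, so each step over a bit position operates on every column of the array simultaneously, and each column acts as an independent one-bit ALU lane.

    import numpy as np

    def in_array_add(a_bits, b_bits):
        """a_bits, b_bits: (n_bits, n_cols) arrays, least-significant bit
        in row 0. Each bit position takes one serial step, but the step
        runs across ALL columns at once (ripple carry over bit rows)."""
        n_bits, n_cols = a_bits.shape
        out = np.zeros((n_bits, n_cols), dtype=np.uint8)
        carry = np.zeros(n_cols, dtype=np.uint8)
        for i in range(n_bits):            # serial in bits...
            a, b = a_bits[i], b_bits[i]    # ...parallel across columns
            out[i] = a ^ b ^ carry
            carry = (a & b) | (carry & (a ^ b))
        return out, carry

    # 4-bit demo: 5 + 3 = 8 computed in every column at once (LSB first).
    cols = 8
    a = np.tile(np.array([[1], [0], [1], [0]], dtype=np.uint8), cols)  # 5
    b = np.tile(np.array([[1], [1], [0], [0]], dtype=np.uint8), cols)  # 3
    s, c = in_array_add(a, b)
    assert all(int("".join(map(str, s[::-1, j])), 2) == 8 for j in range(cols))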

Caches that compute can be a game changer for Artificial Intelligence (AI). They can add accelerator capabilities to general-purpose processors without the significant die-area cost of a dedicated accelerator like Google's TPU. For example, we showed that compute-enabled caches in an Intel Xeon processor can improve efficiency by 629x for convolutional neural networks (CNNs). This result received significant attention from industry.

Emerging non-volatile memories such as resistive RAM (ReRAM) can also be repurposed into very large vector-parallel units. We asked: could in-memory compute units be used as general-purpose data-parallel accelerators, much like GPUs? Our research showed that the answer is a resounding yes. We built novel compiler technology that transforms arbitrary data-parallel computation expressed in Google's TensorFlow into binaries that run directly on ReRAM. We showed that in-memory computing can be over 700x more efficient than GPUs.
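
For intuition, here is a minimal functional model of the ReRAM primitive (the function name crossbar_mvm and the quantization scheme are illustrative assumptions, not the project's design): a crossbar evaluates a matrix-vector product in one analog step, with cell conductances encoding the matrix and per-column summed currents yielding the output dot products.

    import numpy as np

    def crossbar_mvm(weights, x, levels=16):
        """Model one analog step of a ReRAM crossbar: conductances encode
        the matrix (quantized to a few levels, as real cells are), input
        voltages encode x, and each column's current sum is one output."""
        g_max = np.abs(weights).max() or 1.0
        g = np.round(weights / g_max * (levels - 1)) / (levels - 1) * g_max
        return g.T @ x  # per-column current summation (Kirchhoff's law)

    rng = np.random.default_rng(1)
    W = rng.normal(size=(64, 32))  # 64 word-lines (inputs) x 32 bit-lines
    x = rng.normal(size=64)
    y = crossbar_mvm(W, x)
    err = np.abs(y - W.T @ x).max()
    print(f"max quantization error: {err:.3f}")  # small relative to |y|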

Finite state automata (FSA) are widely used as a computation model for approximate and exact regular-expression matching in several application domains, such as data analytics, network security, bioinformatics, and computational finance. We proposed a new way to repurpose DRAM as an FSA accelerator. It addressed a very hard problem: parallelization of inherently sequential FSA. Our follow-up work extended this solution to turn processor caches into FSA accelerators; to achieve this, we developed novel SRAM-based interconnects for automata processing. Our solution provides a speedup of 25x over state-of-the-art custom accelerators and more than 3000x over stock processors.
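
The parallelization idea can be sketched in software. The following is a minimal Python model of the enumerative technique (function names such as run_chunk and parallel_match are ours for illustration, not the hardware design): split the input into chunks, run each chunk from every possible start state to obtain a state-to-state map, then compose the per-chunk maps, so the chunks can be processed independently.

    from functools import reduce

    def run_chunk(delta, n_states, chunk):
        """State map of `chunk`: mapping[s] = state reached when the
        chunk is consumed starting from state s."""
        mapping = list(range(n_states))
        for sym in chunk:
            mapping = [delta[s][sym] for s in mapping]
        return mapping

    def parallel_match(delta, n_states, start, inputs, n_chunks=4):
        size = -(-len(inputs) // n_chunks)  # ceiling division
        chunks = [inputs[i:i + size] for i in range(0, len(inputs), size)]
        maps = [run_chunk(delta, n_states, c) for c in chunks]  # parallel step
        compose = lambda m1, m2: [m2[m1[s]] for s in range(n_states)]
        return reduce(compose, maps)[start]

    # Toy parity DFA over {0,1}: final state is the parity of the ones seen.
    delta = {0: {0: 0, 1: 1}, 1: {0: 1, 1: 0}}
    bits = [1, 0, 1, 1, 0, 1, 1, 1]       # six ones -> even parity
    assert parallel_match(delta, 2, 0, bits) == 0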

The automata research led us to genomics, a killer application for FSA acceleration. Genomics can transform precision health over the next decade: we can detect cancer years earlier through simple blood tests, without invasive biopsies, and we can identify infectious pathogens and avoid the indiscriminate use of broad-spectrum antibiotics. In the last few years, partially supported by this award, we have made significant advances in accelerating whole-genome sequencing and ultra-rapid real-time sequencing for cancer diagnostics. We have demonstrated that hardware-software co-design can provide orders-of-magnitude acceleration in some of the most time-consuming steps in genomics.

This project yielded several notable outcomes with broad impacts. First, it supported two students in earning their PhD degrees in Computer Science and Engineering from the University of Michigan, and it partially supported several other students, including undergraduate researchers. Second, the PI collaborated closely with Intel Corp. on technology transfer and co-founded a precision-health startup. Third, the PI created a permanent EECS undergraduate parallel-programming class that is now taught at the University of Michigan. Finally, the research was widely disseminated through numerous conference and journal publications, seminars at universities and industrial research labs, and the creation of benchmarks and artifacts to facilitate further research by the broader community.

Last Modified: 07/09/2024
Modified by: Reetuparna Das
