
NSF Org: CNS Division of Computer and Network Systems
Recipient: Purdue University
Initial Amendment Date: July 31, 2014
Latest Amendment Date: July 31, 2014
Award Number: 1405939
Award Instrument: Standard Grant
Program Manager: Tao Li, CNS Division of Computer and Network Systems, CSE Directorate for Computer and Information Science and Engineering
Start Date: August 1, 2014
End Date: July 31, 2017 (Estimated)
Total Intended Award Amount: $286,300.00
Total Awarded Amount to Date: $286,300.00
Recipient Sponsored Research Office: Purdue University, 2550 Northwestern Ave # 1100, West Lafayette, IN 47906-1332, US, (765) 494-1055
Primary Place of Performance: West Lafayette, IN 47907-2017, US
NSF Program(s): CCRI-CISE Cmnty Rsrch Infrstrc
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070
ABSTRACT
Research and education in many-core computing systems are important to the NSF CRI program and to the broader research community. This project targets the performance, energy consumption, and scalability of many-core systems, all of which matter to the computer industry. The team is committed to releasing the project's research artifacts as open-source software for use by the research community. The project will benefit graduate student research and support educational activities in the undergraduate and graduate curricula. It will also support outreach activities sponsored by various centers at Purdue University, for example through the team's involvement in the Purdue Computing Research Institute's High Performance Computing workshops.
This infrastructure will support research and education efforts in multiple areas: computer architecture, compilers, high-performance cloud computing, and run-times for managed languages. Computer architects will explore optimizations for the performance, programmability, and power of many-core architectures, on-chip networks, and disks. Compiler researchers will explore shared-memory optimizations and their scalability when targeting shared-memory applications to distributed-memory machines, as well as techniques to transform seemingly irregular memory access patterns into regular, parallel computations and memory accesses. Run-time researchers will pursue parallel garbage collection of large garbage-collected heaps and the associated scalability issues. High-performance computing researchers will explore the performance overhead of virtualization and cloud computing for cluster workloads, along with mechanisms for reducing that overhead.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
This project includes research and education efforts in computer architecture, compilers, run-times for managed languages, and distributed systems. The architecture research explores optimizations for the performance, programmability, and power of multicores, many-core architectures, and datacenters. The compiler writers explore techniques to transform seemingly irregular memory access patterns into regular and parallel computations and memory accesses. The run-time researchers pursue parallel garbage collection of large garbage-collected heaps and associated scalability issues. Specific efforts and their significant results are:
- unscaling of clock frequency to tackle the slowing of Dennard scaling; 15% throughput improvement in many-core systems where voltage scaling has stopped (exceeding the previously published "dark silicon performance limit"),
- exploiting value locality for soft-error tolerance; 75% soft-error coverage at 10% performance and 25% power overheads, whereas redundancy-based schemes incur 80% power overhead,
- a novel 3-D cache architecture to reduce on-chip tag overhead while converting the 3-D bandwidth advantage into performance; for under 1 MB of on-chip overhead, our 256-MB 3-D DRAM cache performs 15% better than the best previous design with a similar on-chip tag,
- a novel cost-effective distributed system architecture for causal consistency; reduces the cost of a causally-consistent geo-replicated data store by 28-37% via partial replication while achieving the same performance as full replication (a toy causal-visibility check appears after this list),
- power and performance optimization of MapReduce via stratified sampling; improves average MapReduce performance by 40% while keeping per-key error within 1% (see the sampling sketch after this list),
- optimizing datacenter power by exploiting the latency tail in online data-intensive applications; reduces datacenter energy by 15% at 90% datacenter loading and by 40% at 30% loading,
- a novel processing-near-memory (PNM) architecture for Big Data machine learning; improves performance and energy over a GPGPU by 145% and 20%, and over a "sea of simple MIMD cores" by 37% and 34%, when all three architectures have the same number of cores, on-die memory, and die-stacked bandwidth,
- addressing message buffer management and flow control for RDMA in datacenters; our RDMA architecture either reduces buffer memory by three orders of magnitude with little programmer effort or achieves the same buffer memory with much less programmer burden,
- implementing nested transactions for Java; our XJ prototype achieves good performance,
- multicore scaling for garbage collection; our implementation achieves scalable performance,
- development of a machine-checked proof of a real-time concurrent collector, allowing parallelized execution of the proof script,
- scalable global routing for HPC; achieves scalable, globally optimal bandwidth via application-specific routing,
- optimizing the off-chip traffic of convolutional neural networks (CNNs) via a novel tiling strategy; provably-optimal tiling for CNNs under a given on-chip cache capacity (2-10x fewer off-chip misses; see the tiling sketch after this list), and
- new compiler optimizations of irregular applications, which led to performance improvements of up to 10x on data mining applications and 70% on tree-traversal applications such as compiler passes (a traversal-blocking sketch follows this list).
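For the causal-consistency result above, the report gives only the headline numbers. As background, the sketch below (plain Python; all names are hypothetical and this is not the project's actual protocol) illustrates the causal-visibility check that geo-replicated stores of this kind enforce: a replicated write is applied only after every write it causally depends on is already visible locally. The partial-replication optimization that the bullet credits for the 28-37% cost reduction is omitted here.

    class Replica:
        """Toy causally consistent key-value replica (illustrative only)."""

        def __init__(self):
            self.store = {}    # key -> (value, version)
            self.pending = []  # remote writes whose dependencies are not yet visible

        def visible(self, key, version):
            # A dependency is satisfied once we hold that key at an
            # equal-or-newer version.
            entry = self.store.get(key)
            return entry is not None and entry[1] >= version

        def apply_remote(self, key, value, version, deps):
            # deps: list of (key, version) pairs the writer had observed.
            if all(self.visible(k, v) for k, v in deps):
                self.store[key] = (value, version)
                self._drain_pending()
            else:
                self.pending.append((key, value, version, deps))

        def _drain_pending(self):
            # Applying one write may unblock others; iterate to a fixed point.
            progress = True
            while progress:
                progress = False
                for w in list(self.pending):
                    key, value, version, deps = w
                    if all(self.visible(k, v) for k, v in deps):
                        self.pending.remove(w)
                        self.store[key] = (value, version)
                        progress = True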
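The stratified-sampling bullet is similarly terse. A minimal sketch of the underlying idea, assuming a simple sum-per-key aggregation (the function and variable names are mine, not the project's): sampling within each key's stratum independently, then scaling each stratum's sample sum by its sampling fraction, is what keeps per-key error bounded even for rare keys.

    import random
    from collections import defaultdict

    def stratified_sample(records, key_fn, rate):
        # Group records into per-key strata and sample each stratum
        # independently, so rare keys are still represented.
        strata = defaultdict(list)
        for rec in records:
            strata[key_fn(rec)].append(rec)
        sample = {}
        for key, recs in strata.items():
            k = max(1, int(len(recs) * rate))
            sample[key] = (random.sample(recs, k), len(recs))
        return sample

    def estimate_per_key_sums(sample, value_fn):
        # Scale each stratum's sample sum by its sampling fraction to
        # estimate the exact per-key sum over the full input.
        estimates = {}
        for key, (recs, population) in sample.items():
            scale = population / len(recs)
            estimates[key] = scale * sum(value_fn(r) for r in recs)
        return estimates

Processing only a fraction of the records per key is where a MapReduce-level speedup like the reported 40% would come from; the per-key error bound follows from standard stratified-sampling confidence intervals.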
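The CNN-tiling bullet names the result but not the schedule. The sketch below shows the mechanism in its simplest form, assuming a direct convolution over channel-major NumPy arrays (the tile sizes and loop order are illustrative, not the project's provably optimal choice): tiling the channel loops so that one block of weights plus the matching input and output slices fits in on-chip cache lets that block be reused across the whole image before the next block is fetched, cutting off-chip traffic.

    import numpy as np

    def conv2d_tiled(inp, weights, tile_out, tile_in):
        # inp: (C_in, H, W); weights: (C_out, C_in, K, K); 'valid' convolution.
        C_in, H, W = inp.shape
        C_out, _, K, _ = weights.shape
        out = np.zeros((C_out, H - K + 1, W - K + 1))
        for co0 in range(0, C_out, tile_out):      # tile over output channels
            for ci0 in range(0, C_in, tile_in):    # tile over input channels
                # Within a tile, everything touched below is meant to fit in
                # on-chip cache and be reused across the full image.
                for co in range(co0, min(co0 + tile_out, C_out)):
                    for ci in range(ci0, min(ci0 + tile_in, C_in)):
                        for kh in range(K):
                            for kw in range(K):
                                out[co] += (weights[co, ci, kh, kw] *
                                            inp[ci, kh:kh + H - K + 1,
                                                    kw:kw + W - K + 1])
        return out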
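Finally, for the irregular-application optimizations, one representative transformation for tree traversals, sketched here under my own simplified binary-tree model rather than as the project's compiler output, is point blocking: instead of walking the tree once per query point, carry a whole block of points down together so each node is loaded once per block, turning irregular pointer-chasing into batched, better-localized accesses.

    class Node:
        def __init__(self, split, left=None, right=None):
            self.split, self.left, self.right = split, left, right

    def traverse_one(node, point, visit):
        # Baseline: one full root-to-leaf walk per query point.
        while node is not None:
            visit(node, point)
            node = node.left if point < node.split else node.right

    def traverse_blocked(node, points, visit):
        # Point blocking: visit each node once per block of live points,
        # splitting the block as the points diverge down the tree.
        if node is None or not points:
            return
        for p in points:
            visit(node, p)
        traverse_blocked(node.left, [p for p in points if p < node.split], visit)
        traverse_blocked(node.right, [p for p in points if p >= node.split], visit)

The two functions perform the same (node, point) visits; the blocked version only reorders them so that each node's data is reused across the whole block while it is still cached.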
This infrastructure has supported the research of more than eight graduate students, who are being trained in one or more of computer architecture, compilers, distributed systems, and runtime systems via the above-mentioned efforts. As part of their senior design project, a team of four undergraduates developed a DNN (deep neural network) based software infrastructure to automatically track student attendance in classrooms; the analysis of the large training and evaluation dataset was facilitated by the CRI infrastructure. We expect continued participation of undergraduate students in this activity over the next few semesters.
Last Modified: 11/06/2017
Modified by: T. N. Vijaykumar