Award Abstract # 1341711
Wrangler: A Transformational Data Intensive Resource for the Open Science Community

NSF Org: OAC
Office of Advanced Cyberinfrastructure (OAC)
Recipient: UNIVERSITY OF TEXAS AT AUSTIN
Initial Amendment Date: September 26, 2013
Latest Amendment Date: August 9, 2018
Award Number: 1341711
Award Instrument: Cooperative Agreement
Program Manager: Robert Chadduck
rchadduc@nsf.gov
 (703)292-2247
OAC
 Office of Advanced Cyberinfrastructure (OAC)
CSE
 Directorate for Computer and Information Science and Engineering
Start Date: November 1, 2013
End Date: October 31, 2020 (Estimated)
Total Intended Award Amount: $6,000,001.00
Total Awarded Amount to Date: $12,392,996.00
Funds Obligated to Date: FY 2013 = $4,981,740.00
FY 2014 = $6,212,553.00
FY 2018 = $1,198,703.00
History of Investigator:
  • Daniel Stanzione (Principal Investigator)
    dan@tacc.utexas.edu
  • Tommy Minyard (Co-Principal Investigator)
  • Christopher Jordan (Co-Principal Investigator)
Recipient Sponsored Research Office: University of Texas at Austin
110 INNER CAMPUS DR
AUSTIN
TX  US  78712-1139
(512)471-6424
Sponsor Congressional District: 25
Primary Place of Performance: University of Texas at Austin
101 E. 27th Street, Suite 5.300
Austin
TX  US  78712-1523
Primary Place of Performance Congressional District: 25
Unique Entity Identifier (UEI): V6AFQPN18437
Parent UEI:
NSF Program(s): Information Technology Researc,
CYBERINFRASTRUCTURE,
Innovative HPC,
Data Cyberinfrastructure,
Cybersecurity Innovation
Primary Program Source: 01001314DB NSF RESEARCH & RELATED ACTIVIT
01001415DB NSF RESEARCH & RELATED ACTIVIT
01001819DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 7433, 7434, 7619, 7726
Program Element Code(s): 164000, 723100, 761900, 772600, 802700
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

This award funds Wrangler, a transformational data-intensive resource for the open science community. Big data is creating tremendous new scientific opportunities, but also many new challenges for data-driven science. The computational needs of large-scale data-driven science vary across domains and applications, but some requirements are widely applicable: capacious, high-performance, reliable data storage; support for diverse data types and access methods; and support for embedded analytics that eliminate costly data movement. Wrangler is a high-performance system with an innovative embedded data analytics capability that far exceeds the capabilities available to the open science community today. It offers massive data storage, which can be expanded if required, high performance, and support for both structured and unstructured data. The storage is configured for ultra-high reliability using replication at two locations, and the system pairs unprecedented analytics capabilities with innovative NAND flash storage. The resource contains 3000 next-generation Intel Haswell cores, offering the most powerful embedded analytics capabilities in the world for a wide range of data-intensive science. Wrangler will be connected at 100 Gbps to Internet2, the fastest available connection to the biggest research network. The project will also offer a data docking service for receiving and ingesting data shipped on physical media. Wrangler's capabilities will be augmented through tight integration with TACC's Stampede supercomputer and, through TACC, with other XSEDE resources.

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

The Wrangler supercomputer was introduced in 2014 as a new class of HPC (high performance computing) system focused on data-intensive science. Exponential growth in the generation of digital data is transforming society in myriad ways. Ever more plentiful and powerful devices are creating vast quantities of digital data, enabling analysis and understanding of the behavior of complex systems: not just markets and elections, but also physical, biological, and social systems relevant to science and engineering research and education.

Wrangler was designed to support these new scientific opportunities and challenges. Wrangler pioneered several advances in storage that are now commonplace among both scientific and enterprise computing systems. Wrangler's filesystems, composed of innovative arrays of NAND flash chips coupled directly to compute servers via PCI switches, were the forerunner of today's NVMe (Non-Volatile Memory Express) devices and NVMe-oF (NVMe over Fabrics) filesystems, which are used widely wherever I/O challenges are prevalent. The small company that designed the devices used for Wrangler's high-speed storage systems was acquired by storage giant EMC and ultimately merged into DellEMC.

The system itself was a productive science resource, which showed that re-balancing the ratio of compute capacity to I/O capability in supercomputers can enhance and enable new kinds of science. The Wrangler system supported more than 275 different data-intensive science projects during its production lifespan, with use by nearly 5,000 users from dozens of institutions. Wrangler also made use of data analytics tools not normally available on supercomputers of the time (R, Spark, Scala, etc.); these were a precursor to the Jupyter notebooks now so prevalent in scientific computing.

Among the applications that used Wrangler to support research: 

  • The MEMEX web analytics project tackled various topics on Wrangler, including exploring the dark web, at a scale not previously possible with either local Hadoop or AWS-hosted infrastructure, to uncover illicit activity around the world.

  • The Hobby-Eberly Telescope Dark Energy Experiment (HETDEX) used Wrangler to store and analyze data from this >$40M project to explore the evolution of the early universe.

  • A Quantum Chromodynamics model analysis that saw an order-of-magnitude improvement in per-core performance over traditional supercomputers.

Wrangler also served as a hub for training data scientists and engineers, supporting a three-month professional training workshop in machine learning for oil and gas engineers, and numerous hackathons around the Zika virus outbreak, smart traffic management, NSF Polar Programs research, and many more.

Many of the concepts of Wrangler are now embedded in Frontera and other large-scale supercomputers, enabling data-intensive science as a peer to simulation science in high-end computing.

 


Last Modified: 05/26/2021
Modified by: Daniel Stanzione

Please report errors in award information by writing to: awardsearch@nsf.gov.
