Award Abstract # 1450377
Collaborative Research: SI2-SSI: Data-Intensive Analysis for High Energy Physics (DIANA/HEP)

NSF Org: OAC
Office of Advanced Cyberinfrastructure (OAC)
Recipient: THE TRUSTEES OF PRINCETON UNIVERSITY
Initial Amendment Date: May 7, 2015
Latest Amendment Date: May 8, 2018
Award Number: 1450377
Award Instrument: Continuing Grant
Program Manager: Bogdan Mihaila
bmihaila@nsf.gov
(703)292-8235
OAC: Office of Advanced Cyberinfrastructure (OAC)
CSE: Directorate for Computer and Information Science and Engineering
Start Date: May 1, 2015
End Date: April 30, 2020 (Estimated)
Total Intended Award Amount: $1,145,564.00
Total Awarded Amount to Date: $1,145,564.00
Funds Obligated to Date: FY 2015 = $950,564.00
FY 2016 = $65,000.00
FY 2017 = $65,000.00
FY 2018 = $65,000.00
History of Investigator:
  • G J Peter Elmer (Principal Investigator)
Recipient Sponsored Research Office: Princeton University
1 NASSAU HALL
PRINCETON
NJ  US  08544-2001
(609)258-3090
Sponsor Congressional District: 12
Primary Place of Performance: Princeton University
Jadwin Hall
Princeton
NJ  US  08544-2020
Primary Place of Performance Congressional District: 12
Unique Entity Identifier (UEI): NJ1YPQXQG7U5
Parent UEI:
NSF Program(s): OFFICE OF MULTIDISCIPLINARY AC,
COMPUTATIONAL PHYSICS,
Software Institutes
Primary Program Source: 01001516DB NSF RESEARCH & RELATED ACTIVIT
01001617DB NSF RESEARCH & RELATED ACTIVIT
01001718DB NSF RESEARCH & RELATED ACTIVIT
01001819DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 7433, 8005, 8009, 8084
Program Element Code(s): 125300, 724400, 800400
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Advanced software plays a fundamental role in large scientific projects. The primary goal of DIANA/HEP (Data Intensive ANAlysis for High Energy Physics) is to develop state-of-the-art tools for experiments that acquire, reduce, and analyze petabytes of data. Improving performance, interoperability, and collaborative tools through modifications and additions to packages broadly used by the community will allow users to more fully exploit the data being acquired at CERN's Large Hadron Collider (LHC) and other facilities. These experiments are addressing questions at the heart of physics: What are the underlying constituents of matter, and how do they interact? With the discovery of the Higgs boson in 2012, the Standard Model of particle physics is complete. It provides an excellent description of known particles and forces. However, the most interesting questions remain open: What is the dark matter that pervades the universe? Does space-time have additional symmetries or extend beyond the three spatial dimensions we know? What mechanism stabilizes the Higgs boson mass against enormous quantum corrections? The next generation of experiments will collect exabyte-scale data samples to provide answers, and analyzing these data will require new and better tools.

First, the project will provide the CPU and I/O performance needed to reduce the iteration time that is so crucial for exploring new ideas. It will develop software to effectively exploit emerging many- and multi-core hardware. It will establish infrastructure for a higher level of collaborative analysis, building on the successful patterns used for the Higgs boson discovery and enabling deeper communication between the theoretical and experimental communities. DIANA's products will sit in the ROOT framework, already used by a HEP community of more than 10,000 particle and nuclear physicists. By improving interoperability with the larger scientific software ecosystem, DIANA will incorporate best practices and algorithms from other disciplines into HEP. Similarly, the project will make available to the larger scientific community its computing insights, tools, and novel ideas related to collaborative analysis, standards for data preservation, and best practices for treating software as a research product. Finally, to improve the quality of the next generation of software engineers in HEP, DIANA will host an annual workshop on analysis tools and establish a fellowship program.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Brian Bockelman, Zhe Zhang, Jim Pivarski "Optimizing ROOT IO For Analysis" Proceedings of the 18th International Workshop on Advanced Computing and Analysis Techniques in Physics Research (ACAT), 2017, doi:10.1088/1742-6596/1085/3/032012
David Lange "Building a scalable python distribution for HEP data analysis" Proceedings of the 18th International Workshop on Advanced Computing and Analysis Techniques in Physics Research (ACAT), J. Phys.: Conf. Ser., v.1085, 2017, p.042041, doi:10.1088/1742-6596/1085/4/042041
Jim Pivarski, David Lange, Thanat Jatuphattharachat "Toward real-time data query systems in HEP" Proceedings of the 18th International Workshop on Advanced Computing and Analysis Techniques in Physics Research (ACAT), 2017, doi:10.1088/1742-6596/1085/3/032044
Jim Pivarski, Jaydeep Nandi, David Lange, Peter Elmer "Columnar data processing for HEP analysis" Proceedings of the 23rd International Conference on Computing in High Energy and Nuclear Physics (CHEP2018), 2019, doi:10.1051/epjconf/201921406026
Jim Pivarski, Peter Elmer, Brian Bockelman, Zhe Zhang "Fast Access to Columnar, Hierarchically Nested Data via Code Transformation" Accepted to IEEE Big Data 2017, 2017, doi:10.1109/BigData.2017.8257933
Matteo Cremonesi, Claudio Bellini, Bianny Bian, Luca Canali, Vasileios Dimakopoulos, Peter Elmer, Ian Fisk, Maria Girone, Oliver Gutsche, Siew-Yan Hoh, Bo Jayatilaka, Viktor Khristenko, Andrea Luiselli, Andrew Melo, Evangelos Motesnitsalis, Dominick Olivito, "Using Big Data Technologies for HEP Analysis" Proceedings of the 23rd International Conference on Computing in High Energy and Nuclear Physics (CHEP2018), 2019, doi:10.1051/epjconf/201921406030
Oliver Gutsche, Luca Canali, Illia Cremer, Matteo Cremonesi, Peter Elmer, Ian Fisk, Maria Girone, Bo Jayatilaka, Jim Kowalkowski, Viktor Khristenko, Evangelos Motesnitsalis, Jim Pivarski, Saba Sehrish, Kacper Surdy, Alexey Svyatkovskiy "CMS Analysis and Data Reduction with Apache Spark" Proceedings of the 18th International Workshop on Advanced Computing and Analysis Techniques in Physics Research (ACAT 2017), 2017, doi:10.1088/1742-6596/1085/4/042030
Oliver Gutsche, Matteo Cremonesi, Peter Elmer, Bo Jayatilaka, Jim Kowalkowski, Jim Pivarski, Saba Sehrish, Cristina Mantilla Suárez, Alexey Svyatkovskiy, Nhan Tran "Big Data in HEP: A comprehensive use case study" Proceedings of the 22nd International Conference on Computing in High Energy and Nuclear Physics (CHEP2016), 10-14 October 2016, San Francisco, USA, J. Phys.: Conf. Ser., v.898, 2017, p.072012, doi:10.1088/1742-6596/898/7/072012

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

Advanced software plays a fundamental role in large scientific projects, from the acquisition of data through to the final results of subsequent processing and analysis. It is the glue that enables large-scale collaboration, particularly for teams of researchers working together to exploit accelerators, telescopes and other large scientific instruments. Building the requisite software is technically challenging because computing technologies (processors, storage, networks) are evolving and data volumes are increasing rapidly, requiring ever more sophisticated data analysis methods. The Data-Intensive Analysis for High Energy Physics (DIANA/HEP) project brought together a team of particle physicists and computer scientists, in collaboration with an international team of researchers, to advance the state of the art of key data analysis software tools used by the particle and nuclear physics communities. The project focused on building sustainable software for these communities, in particular on improving computing performance, on interoperability of particle physics domain software with the larger data science ecosystem, and on tools for collaborative analysis.
The DIANA/HEP project has been catalytic in opening up the scientific Python ecosystem to the particle physics community. Tools such as Uproot provided key interoperability between the current ROOT-based ecosystem and the scientific Python ecosystem. The development of Awkward Array added key performance improvements in the Python ecosystem for the non-rectilinear (variable-length, nested) data typical in particle physics. In addition, we have continued to make core performance contributions to the unique C++ environment provided by the ROOT software, and developed tools for the Python ecosystem (e.g., histogram and Lorentz vector libraries) that are needed for particle physics but are of general applicability. A general umbrella package (scikit-hep) for these and other Pythonic tools for particle physics was created and has been widely adopted within the community.
An additional key outcome of the DIANA/HEP project has been the demonstration that a particle physics analysis tools ecosystem can extend smoothly into the Python ecosystem. It also demonstrated the viability of "columnar data analysis" concepts, which will be used in both the Python and ROOT ecosystems. This demonstration was essential to the S2I2-HEP planning process and to the concepts that emerged from it for "analysis systems", which provide a very short "time to insight". These concepts, built on DIANA/HEP research outcomes, are being carried forward in the "analysis systems" focus area of the IRIS-HEP software institute funded by NSF (OAC-1836650).
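To make "columnar data analysis" of non-rectilinear data a little more concrete, here is a minimal Python sketch. It assumes a simplified layout in which each collision event's variable-length list (for example, the muons in that event) is stored as one flat content buffer plus an offsets array. The class and method names (JaggedColumn, from_rows, counts, sums, select) are hypothetical illustrations and are not the Awkward Array or Uproot API. The momentum values are made up.

```python
from dataclasses import dataclass
from typing import Iterator, List, Sequence, Tuple


@dataclass
class JaggedColumn:
    """Hypothetical columnar layout for variable-length per-event lists.

    All entries from all events live in one flat `content` list, and
    event i owns the slice content[offsets[i]:offsets[i + 1]].
    """

    offsets: List[int]
    content: List[float]

    @classmethod
    def from_rows(cls, rows: Sequence[Sequence[float]]) -> "JaggedColumn":
        # Concatenate the rows into one buffer, recording where each row ends.
        offsets = [0]
        content: List[float] = []
        for row in rows:
            content.extend(row)
            offsets.append(len(content))
        return cls(offsets, content)

    def _bounds(self) -> Iterator[Tuple[int, int]]:
        # (start, stop) pair for each event, read from consecutive offsets.
        return zip(self.offsets[:-1], self.offsets[1:])

    def counts(self) -> List[int]:
        # Entries per event, computed from the offsets alone.
        return [stop - start for start, stop in self._bounds()]

    def sums(self) -> List[float]:
        # Per-event reduction over slices of the flat buffer; empty events give 0.0.
        return [sum(self.content[start:stop], 0.0) for start, stop in self._bounds()]

    def select(self, mask: Sequence[bool]) -> "JaggedColumn":
        # Keep only the events whose mask entry is True.
        kept = [
            self.content[start:stop]
            for (start, stop), keep in zip(self._bounds(), mask)
            if keep
        ]
        return JaggedColumn.from_rows(kept)


# Made-up transverse momenta (GeV) for three events: two muons, none, one.
muon_pt = JaggedColumn.from_rows([[50.0, 30.0], [], [20.0]])
print(muon_pt.offsets)   # [0, 2, 2, 3]
print(muon_pt.counts())  # [2, 0, 1]
print(muon_pt.sums())    # [80.0, 0.0, 20.0]

# Event selection expressed as a mask over the whole column.
dimuon = muon_pt.select([n >= 2 for n in muon_pt.counts()])
print(dimuon.content)    # [50.0, 30.0]
```

The per-event loops above are plain Python for readability. The point of the layout is that quantities such as per-event counts depend only on the offsets, so columnar libraries (Awkward Array among them) can run such operations as vectorized passes over flat buffers rather than as loops over individual objects.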


Last Modified: 01/26/2021
Modified by: G.J. Peter Elmer
