Award Abstract # 2019073
CC* Integration-Large: SciStream: Architecture and Toolkit for Data Streaming between Federated Science Instruments

NSF Org: OAC
Office of Advanced Cyberinfrastructure (OAC)
Recipient: UNIVERSITY OF CHICAGO
Initial Amendment Date: June 30, 2020
Latest Amendment Date: June 30, 2020
Award Number: 2019073
Award Instrument: Standard Grant
Program Manager: Deepankar Medhi
dmedhi@nsf.gov
(703) 292-2935
OAC
Office of Advanced Cyberinfrastructure (OAC)
CSE
Directorate for Computer and Information Science and Engineering
Start Date: October 1, 2020
End Date: September 30, 2024 (Estimated)
Total Intended Award Amount: $850,000.00
Total Awarded Amount to Date: $850,000.00
Funds Obligated to Date: FY 2020 = $850,000.00
History of Investigator:
  • Rajkumar Kettimuthu (Principal Investigator)
    kettimut@mcs.anl.gov
Recipient Sponsored Research Office: University of Chicago
5801 S ELLIS AVE
CHICAGO
IL  US  60637-5418
(773)702-8669
Sponsor Congressional District: 01
Primary Place of Performance: University of Chicago
Chicago
IL  US  60637-5418
Primary Place of Performance Congressional District: 01
Unique Entity Identifier (UEI): ZUE9HKT2CLC9
Parent UEI: ZUE9HKT2CLC9
NSF Program(s): CISE Research Resources
Primary Program Source: 01002021DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s):
Program Element Code(s): 289000
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Scientific instruments are capable of generating data at very high speeds. However, with traditional file-based data movement and analysis methods, data are often processed at a much lower speed, forcing either operating the instruments at a lower speed or discarding a (significant) portion of the data without processing it. To address this issue, the SciStream project will develop software tools to stream data at very high speeds from scientific instruments to supercomputers at distant locations. SciStream hides the complexities of network connections from the end user and provides a high level of security for all network connections.

The data producers (e.g., data acquisition applications on scientific instruments, simulations on supercomputers) and consumers (e.g., data analysis applications on high-performance computing systems) may be in different security domains (and thus require bridging of those domains) and may also lack external network connectivity (and thus require traffic-forwarding proxies). SciStream establishes the necessary bridging and end-to-end authentication between source and destination while providing efficient memory-to-memory data streaming. By exploring architectural and design choices and addressing issues of control protocols and security, SciStream will advance the understanding of the challenges in supporting high-speed memory-to-memory data streaming between remote instruments in federated science environments.
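The traffic-forwarding role described above can be illustrated with a generic TCP relay: a producer that cannot reach the consumer directly connects to a gateway, which opens its own connection onward and copies bytes in both directions. This is a minimal sketch using Python's standard `asyncio` streams, not SciStream's implementation; the function names (`pipe`, `start_forwarder`, `demo`), the loopback addresses, and the upper-casing stand-in consumer are hypothetical, and the sketch deliberately omits the authentication, security-domain bridging, and control protocol the project describes.

```python
import asyncio


async def pipe(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Copy bytes from reader to writer until EOF, then half-close the writer."""
    try:
        while chunk := await reader.read(64 * 1024):
            writer.write(chunk)
            await writer.drain()
    finally:
        if writer.can_write_eof():
            writer.write_eof()


async def start_forwarder(listen_host: str, listen_port: int,
                          target_host: str, target_port: int) -> asyncio.AbstractServer:
    """Hypothetical gateway relay: forward every inbound connection to one fixed target."""
    async def handle(client_reader, client_writer):
        upstream_reader, upstream_writer = await asyncio.open_connection(target_host, target_port)
        try:
            # Relay both directions concurrently until each side reaches EOF.
            await asyncio.gather(
                pipe(client_reader, upstream_writer),
                pipe(upstream_reader, client_writer),
            )
        finally:
            upstream_writer.close()
            client_writer.close()

    return await asyncio.start_server(handle, listen_host, listen_port)


async def demo() -> bytes:
    # Stand-in "consumer": reads the whole stream and replies with it upper-cased.
    async def consumer(reader, writer):
        data = await reader.read()
        writer.write(data.upper())
        await writer.drain()
        writer.close()
        await writer.wait_closed()

    consumer_srv = await asyncio.start_server(consumer, "127.0.0.1", 0)
    consumer_port = consumer_srv.sockets[0].getsockname()[1]
    gateway = await start_forwarder("127.0.0.1", 0, "127.0.0.1", consumer_port)
    gateway_port = gateway.sockets[0].getsockname()[1]

    # Stand-in "producer": dials only the gateway, never the consumer directly.
    reader, writer = await asyncio.open_connection("127.0.0.1", gateway_port)
    writer.write(b"frame-1 frame-2")
    await writer.drain()
    writer.write_eof()
    reply = await reader.read()
    writer.close()
    await writer.wait_closed()

    for srv in (gateway, consumer_srv):
        srv.close()
        await srv.wait_closed()
    return reply


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Half-closing with `write_eof()` lets each direction finish independently, so the consumer can still send its reply after the producer has finished sending. In this sketch the consumer is on loopback only to keep the example self-contained; in the scenario above it would sit in another security domain reachable only through the gateway.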

SciStream will benefit all scientific applications that require memory-to-memory data streaming between distributed instruments. Recent trends suggest that this is an important and growing requirement for many scientific applications. SciStream will help significantly reduce the time to solution for these applications, resulting in improved scientific productivity and thus far-reaching benefits for society. Key design choices such as application-agnostic streaming and support for best-effort streaming will make SciStream appealing to a broader science community. SciStream will engage with domain scientists, campus computing centers, and a scientific user facility to reach a wider audience. Through on-campus programs at the University of Chicago, SciStream will train under-represented students in networking. Additional details on SciStream can be found here: https://scistream.github.io/

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Chung, Joaquin and Zacherek, Wojciech and Wisniewski, AJ and Liu, Zhengchun and Bicer, Tekin and Kettimuthu, Rajkumar and Foster, Ian. "SciStream: Architecture and Toolkit for Data Streaming between Federated Science Instruments." 31st International Symposium on High-Performance Parallel and Distributed Computing (HPDC '22), 2022. https://doi.org/10.1145/3502181.3531475
Qu, C. and Chung, J. "Evaluating SciStream (Federated Scientific Data Streaming Architecture) on FABRIC." IEEE/ACM SC22 Workshop on Innovating the Network for Data Intensive Science (INDIS), 2022.
Jamil, Hasibul and Chung, Joaquin and Bicer, Tekin and Kosar, Tevfik and Kettimuthu, Rajkumar. "Throughput Optimization with a NUMA-Aware Runtime System for Efficient Scientific Data Streaming." 2023. https://doi.org/10.1145/3624062.3624593
Sankaran, Ganesh C and Chung, Joaquin and Kettimuthu, Raj. "Leveraging In-Network Computing and Programmable Switches for Streaming Analysis of Scientific Data." 2021. https://doi.org/10.1109/NetSoft51509.2021.9492726
Sankaran, Ganesh C and Chung, Joaquin and Kettimuthu, Rajkumar. "App2Net: Moving Application Functions to Network & a Case Study on Low-latency Feedback." 2022. https://doi.org/10.1109/INDIS56561.2022.00006

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

Overview: The SciStream project has significantly enhanced secure, high-performance data streaming between scientific instruments across independent administrative domains. Its open-source toolkit enables efficient memory-to-memory data transfer and integrates with widely used authentication services such as Globus Auth, providing secure, scalable, high-speed connectivity.

Intellectual Merit: SciStream introduced an innovative architecture for real-time data streaming, designed to overcome challenges in network security, performance optimization, and inter-facility data transfer. Key technical advancements include:

Gateway Node (GN) Architecture: Optimized for streaming data between instruments, WANs, and computing resources.

Advanced Networking: Leveraging TCP, QUIC, and high-speed eBPF-based proxies to stream data at high speed and low latency between federated scientific instruments.

Integration with Science DMZs: Ensuring secure, policy-compliant streaming in high-performance computing (HPC) environments.

SciStream’s capabilities were rigorously tested on FABRIC and ESnet 100G testbeds, showing minimal performance overhead compared to direct network connections.

Broader Impacts: SciStream has had a significant impact on the scientific community, facilitating real-time data processing for critical applications.

Deployment at Multiple Facilities: SciStream was successfully integrated with Argonne National Laboratory's Polaris cluster, Clemson University's ultrasound image processing workflow, and the upstart cluster at the Advanced Photon Source (APS).

Workforce Development: The project mentored five students, four of whom published research on high-performance networking.

Knowledge Dissemination: Findings were shared through five peer-reviewed publications, two featured articles, and over a dozen invited talks and demonstrations at major scientific conferences.

Key Achievements:

- Developed a scalable architecture for federated scientific data streaming.

- Created an open-source toolkit supporting standard and custom proxy solutions.

- Integrated authentication methods like Globus Auth to enhance security.

- Validated SciStream on large-scale testbeds, ensuring reliability and scalability.

- Demonstrated real-time streaming applications in physics, imaging, and medical research.

By bridging gaps in scientific data streaming, SciStream accelerates discovery, fosters collaboration, and lays the foundation for next-generation federated streaming solutions.


Last Modified: 02/17/2025
Modified by: Rajkumar Kettimuthu