
NSF Org: |
OAC Office of Advanced Cyberinfrastructure (OAC) |
Recipient: |
|
Initial Amendment Date: | June 30, 2020 |
Latest Amendment Date: | June 6, 2022 |
Award Number: | 2018754 |
Award Instrument: | Standard Grant |
Program Manager: |
Deepankar Medhi
dmedhi@nsf.gov (703)292-2935 OAC Office of Advanced Cyberinfrastructure (OAC) CSE Directorate for Computer and Information Science and Engineering |
Start Date: | October 1, 2020 |
End Date: | March 31, 2024 (Estimated) |
Total Intended Award Amount: | $749,998.00 |
Total Awarded Amount to Date: | $760,000.00 |
Funds Obligated to Date: |
|
History of Investigator: |
|
Recipient Sponsored Research Office: |
11200 SW 8TH ST MIAMI FL US 33199-2516 (305)348-2494 |
Sponsor Congressional District: |
|
Primary Place of Performance: |
11200 SW 8th Street MIAMI FL US 33199-0001 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | CISE Research Resources |
Primary Program Source: |
|
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
Communication networks are critical components of today's scientific workflows. Researchers require long-distance, ultra-high-speed networks to transfer huge volumes of data from acquisition sites (such as the Vera C. Rubin Observatory in Chile, formerly known as the Large Synoptic Survey Telescope) to processing sites, and to share measurements with scientists worldwide. However, while network bandwidth continues to increase, most data transfers cannot efficiently use the added capacity. The causes are the inherent limitations of transport protocol parameter settings and the lack of network state information at the end hosts. To address these challenges, Q-Factor plans to use temporal network state data to dynamically configure transport protocol parameters, reaching higher network utilization and, as a result, improving scientific workflows.
Q-Factor leverages programmable network devices running the In-band Network Telemetry (INT) application and delivers a software solution that processes in-band measurements at the end hosts. With Q-Factor on Data Transfer Nodes (DTNs), TCP/IP parameters will be configured according to temporal network characteristics such as round-trip time, network utilization, and network congestion. This tuning is expected to increase network utilization, shorten flow completion times, and significantly reduce packet drops caused by network buffer overflows. Additionally, Q-Factor aims to save host memory by tailoring kernel parameters and buffers to optimal sizes.
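Sizing kernel buffers to "optimal sizes" usually starts from the bandwidth-delay product (BDP): the number of bytes that must be in flight to keep a path full, computed as bandwidth × RTT / 8. The sketch below shows only that arithmetic. It is not Q-Factor's algorithm. The function names, the 2x headroom factor, and the 2 GiB ceiling are assumptions made for illustration. On Linux, a value like this would typically be written to the maximum field of `net.ipv4.tcp_rmem` and `net.ipv4.tcp_wmem`.

```python
"""Bandwidth-delay product (BDP) buffer sizing: a hypothetical sketch.

Assumes path bandwidth and round-trip time were already measured
(for example, from in-band telemetry). Names are illustrative only.
"""


def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> int:
    """Return the bandwidth-delay product in bytes."""
    return int(bandwidth_bps * rtt_s / 8)


def suggest_buffer_max(bandwidth_bps: float, rtt_s: float,
                       headroom: float = 2.0,
                       ceiling: int = 2 * 1024**3) -> int:
    """Suggest a maximum socket buffer size in bytes.

    headroom: assumed safety multiplier over the BDP.
    ceiling: assumed cap so one flow cannot claim unbounded host memory.
    """
    return min(int(bdp_bytes(bandwidth_bps, rtt_s) * headroom), ceiling)


if __name__ == "__main__":
    # A 100 Gb/s long-haul path with a 50 ms round-trip time.
    print(bdp_bytes(100e9, 0.050))           # 625000000
    # A 10 Gb/s path with a 10 ms round-trip time needs far less memory.
    print(suggest_buffer_max(10e9, 0.010))   # 25000000
```

Sizing buffers per path, instead of applying one large ceiling to every flow, is what makes memory savings possible. The 10 Gb/s, 10 ms path gets a 25 MB ceiling (2 × its 12.5 MB BDP), while the 100 Gb/s, 50 ms path has a 625 MB BDP.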
Q-Factor targets a timely issue in communication networks: the underutilization of ultra-high-speed networks by science workflows. To keep scientific progress unconstrained, future science workflows must support emerging data-intensive experiments (e.g., the Vera Rubin Observatory and the High-Luminosity Large Hadron Collider), whose data generation is growing significantly and reaching exabytes of traffic each year. Results of this project will also provide a better understanding of optimal network device buffer sizes for very large flows and of the interaction among various congestion control algorithms.
Experimental measurement data, network state information, network topology, software code, TCP tuning guidelines, and results will be available on the Q-Factor website https://q-factor.io, which will be maintained and indexed for at least three years after the completion of the project.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
As a result of the success of the NSF CC* program, many U.S. campuses have deployed Science DMZs and Data Transfer Nodes (DTNs) to improve data transfers over high-speed Research and Education (R&E) networks. DTNs are normally purpose-built computers dedicated to wide-area data transfers. However, DTNs commonly fail to use network bandwidth effectively because they have not been configured properly for their wide-area network (WAN) environment.
Effective use of the available network bandwidth is critical for the end-to-end performance of scientific applications. DTNs must therefore be programmed or tuned to find the best combination of optimization parameters for data transfer over a WAN. Such tuning is costly because the process is manual and time-consuming, and it requires a broad set of skills and expertise not normally found at campuses, or even among network operators.
In response to the cost that science facilities and campuses face in effectively tuning DTNs for wide-area data transfers, Florida International University (FIU) and the Energy Sciences Network (ESnet) developed Q-Factor: a framework that enables high-speed data transfer optimization based on real-time network state information provided by programmable data planes. Q-Factor addresses data transfer tuning and optimization by changing how network endpoints, including DTNs, consume network state information. When the source DTN knows the network state of the end-to-end path, such as one-way delay and the instantaneous utilization of interfaces and network device queues, it can dynamically adjust the transport protocol's transfer window, Maximum Segment Size (MSS), and buffer length to achieve optimal performance. As a result, the source and destination DTNs can avoid TCP congestion, slow start, and tail drops along the path, and may eventually enable TCP pacing approaches. Major facilities, such as the Vera Rubin Observatory, and other science workflow applications would benefit from sub-second network state updates that adjust tuning parameters in real time.
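The paragraph above describes adjusting the transfer window from path telemetry such as one-way delay and interface and queue utilization. The sketch below illustrates one simple way to turn such readings into a window. It is a hypothetical example, not Q-Factor's method. It assumes that each hop reports capacity, utilization, and queue occupancy, and that the RTT is roughly twice the one-way delay (a symmetric path). It also assumes an MSS of 8948 bytes (a 9000-byte MTU with TCP timestamps) and an arbitrary rule that halves the window when any queue is more than half full.

```python
"""Deriving a transfer window from per-hop telemetry: a hypothetical sketch.

Assumptions (not taken from Q-Factor): each hop reports capacity,
utilization, and queue occupancy; RTT ~= 2 * one-way delay.
"""
import math
from dataclasses import dataclass
from typing import List


@dataclass
class HopReport:
    capacity_bps: float      # egress link capacity, in bits per second
    utilization: float       # fraction of capacity in use, 0.0 to 1.0
    queue_occupancy: float   # fraction of the egress queue in use, 0.0 to 1.0


def target_window(hops: List[HopReport], one_way_delay_s: float,
                  mss: int = 8948, queue_limit: float = 0.5) -> int:
    """Return a suggested congestion window, in MSS-sized segments."""
    # The bottleneck is the hop with the least spare capacity.
    spare_bps = min(h.capacity_bps * (1.0 - h.utilization) for h in hops)
    rtt_s = 2.0 * one_way_delay_s
    window_bytes = spare_bps * rtt_s / 8
    # Back off when any queue is filling up, to reduce the risk of tail drops.
    if max(h.queue_occupancy for h in hops) > queue_limit:
        window_bytes /= 2
    # Never suggest less than one segment.
    return max(1, math.ceil(window_bytes / mss))


if __name__ == "__main__":
    path = [HopReport(100e9, 0.50, 0.1), HopReport(100e9, 0.75, 0.2)]
    print(target_window(path, one_way_delay_s=0.025))
```

For two 100 Gb/s hops at 50% and 75% utilization with a 25 ms one-way delay, the bottleneck has 25 Gb/s spare and the function returns 17463 segments (about 156 MB). If either queue were above the limit, it would return 8732.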
Intellectual Merit and Broader Impacts
A significant contribution of Q-Factor to the academic community is an artifact called the Telemetry Agent. The Telemetry Agent addresses two needs: first, host tuning based on the end host's hardware and software configuration, and second, assessment of the state of the network to which the end host is connected. The Telemetry Agent relieves DTN operators of the responsibility of manually tuning each hardware and software component. As conceived within the scope of the Q-Factor project, the Telemetry Agent has three main modules: the Collector Module, the Tuning Module, and the Remote Collector Module (see figure).
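To make the three-module structure concrete, here is a hypothetical skeleton of how such an agent could be wired together. Only the module names come from the report. Each class, method, and setting name, as well as the simple "2x bandwidth-delay product" tuning rule, is an assumption for illustration and does not describe the actual Telemetry Agent code.

```python
"""Hypothetical skeleton of a three-module telemetry agent.

Module names follow the report (Collector, Remote Collector, Tuning);
every class, method, and setting name here is illustrative only.
"""
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class NetworkState:
    rtt_s: float        # round-trip time of the path, in seconds
    spare_bps: float    # spare capacity at the bottleneck, in bits/s


class CollectorModule:
    """Reports local host settings (a static dict stands in for sysctl)."""

    def __init__(self, host_settings: Dict[str, int]):
        self._settings = host_settings

    def collect(self) -> Dict[str, int]:
        return dict(self._settings)


class RemoteCollectorModule:
    """Stores the most recent network state pushed from the network side."""

    def __init__(self) -> None:
        self.latest: Optional[NetworkState] = None

    def receive(self, state: NetworkState) -> None:
        self.latest = state


class TuningModule:
    """Proposes a socket buffer ceiling of 2x the bandwidth-delay product."""

    def propose(self, host: Dict[str, int],
                net: NetworkState) -> Dict[str, int]:
        proposal = dict(host)
        bdp_bytes = int(net.spare_bps * net.rtt_s / 8)
        proposal["max_socket_buffer_bytes"] = 2 * bdp_bytes
        return proposal


class TelemetryAgent:
    """Wires the three modules together."""

    def __init__(self, host_settings: Dict[str, int]):
        self.collector = CollectorModule(host_settings)
        self.remote = RemoteCollectorModule()
        self.tuner = TuningModule()

    def step(self) -> Optional[Dict[str, int]]:
        """Return proposed settings, or None until network state arrives."""
        if self.remote.latest is None:
            return None
        return self.tuner.propose(self.collector.collect(), self.remote.latest)
```

For example, `TelemetryAgent({"max_socket_buffer_bytes": 6291456}).step()` returns `None` until network state arrives. After `agent.remote.receive(NetworkState(rtt_s=0.0625, spare_bps=8e9))`, the proposed ceiling becomes 125000000 bytes. Keeping host-side collection, network-side collection, and tuning in separate modules lets each be replaced independently.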
The tools and artifacts from Q-Factor are being used by several academic communities. The Vera Rubin Observatory has deployed the Q-Factor Telemetry Agent on its DTN and perfSONAR nodes. There it tracks changes on the DTN, whether manual or automated, that could lead to poor performance, and logs network state changes that cause performance issues. The perfSONAR and iPerf communities collaborated with the Q-Factor project as a result of evaluations performed on the AmLight production R&E network. The HPN-SSH project collaborated with Q-Factor to enhance its gathering of TCP telemetry, to improve data transfers over long-distance networks, and to create a new lightweight approach for applications to consume network telemetry. This new approach will help Q-Factor increase adoption, and help HPN-SSH determine whether performance issues result from network or host events.
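One way to picture the change tracking mentioned above is a diff between two snapshots of tuning-relevant settings. The sketch assumes the agent periodically records such snapshots as name-to-value mappings. The function name is hypothetical, and the keys in the example are ordinary Linux sysctl names chosen only for illustration.

```python
"""Detecting host configuration drift: a hypothetical sketch.

Assumes periodic snapshots of tuning-relevant settings, stored as
name -> value mappings. The function name is illustrative only.
"""
from typing import Dict, Optional, Tuple


def diff_snapshots(before: Dict[str, str],
                   after: Dict[str, str]
                   ) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {setting: (old, new)} for every added, removed, or changed
    setting. A setting absent from one snapshot is reported as None."""
    changes = {}
    for key in sorted(before.keys() | after.keys()):
        old, new = before.get(key), after.get(key)
        if old != new:
            changes[key] = (old, new)
    return changes


if __name__ == "__main__":
    monday = {"net.ipv4.tcp_congestion_control": "bbr",
              "net.core.rmem_max": "2147483647"}
    tuesday = {"net.ipv4.tcp_congestion_control": "cubic",
               "net.core.rmem_max": "2147483647",
               "net.ipv4.tcp_mtu_probing": "1"}
    for name, (old, new) in diff_snapshots(monday, tuesday).items():
        print(f"{name}: {old} -> {new}")
```

Logging each diff with a timestamp would let an operator correlate a throughput drop with, for example, a congestion control algorithm that was switched from `bbr` to `cubic`.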
Summary/Conclusion
Q-Factor accomplished its major goal of dynamically adjusting the data transfer tuning and optimization variables of network endpoints by extending the network management plane to Data Transfer Nodes (DTNs). One of Q-Factor's main requirements was to create a solution that would not force network operators to change their networks or telemetry systems, which could require changes to their operations. The tools developed, the knowledge acquired and shared, and the methodologies devised to connect code at multiple layers have enabled Q-Factor to accomplish the goals of the project.
Last Modified: 07/31/2024
Modified by: Julio Ibarra