
NSF Org: OAC Office of Advanced Cyberinfrastructure (OAC)
Recipient:
Initial Amendment Date: July 1, 2020
Latest Amendment Date: October 25, 2024
Award Number: 2019163
Award Instrument: Standard Grant
Program Manager: Deepankar Medhi, dmedhi@nsf.gov, (703) 292-2935, OAC Office of Advanced Cyberinfrastructure (OAC), CSE Directorate for Computer and Information Science and Engineering
Start Date: July 1, 2020
End Date: March 31, 2025 (Estimated)
Total Intended Award Amount: $470,384.00
Total Awarded Amount to Date: $479,484.00
Funds Obligated to Date: FY 2024 = $9,100.00
History of Investigator:
Recipient Sponsored Research Office: 601 S HOWES ST, FORT COLLINS, CO, US 80521-2807, (970) 491-6355
Sponsor Congressional District:
Primary Place of Performance: 200 W Lake St, Fort Collins, CO, US 80521-4593
Primary Place of Performance Congressional District:
Unique Entity Identifier (UEI):
Parent UEI:
NSF Program(s): Special Projects - CNS, CISE Research Resources
Primary Program Source: 01002425DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s):
Program Element Code(s):
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070
ABSTRACT
Scientific data transfers have gotten so large that previously rare transmission errors in the Internet are causing some scientific data transfers to be corrupted. The Internet's error checking mechanisms were designed at a time when a megabyte was a large file. Now files can contain terabytes. The old error checking mechanisms are in danger of being overwhelmed. This project seeks to find new error checking mechanisms for the Internet to safely move tomorrow's scientific data efficiently and without errors.
This project addresses two fundamental issues. First, the Internet's checksums and message digests are too small (32-bits) and probably are poorly tuned to today's error patterns. It is a little-known fact that checksums can (and typically should) be designed to reliably catch specific errors. A good checksum is designed to protect against errors that it will actually encounter. So the first step in this project is to collect information about the kinds of transmission errors currently happening in the Internet for a comprehensive study. Second, today's file transfer protocols, if they find a file has been corrupted in transit, simply discard the file and transfer it again. In a world in which the file is huge (tens of terabytes or even petabytes long), that's a tremendous waste. Rather, the file transfer protocol should seek to repair the corrupted parts of the file. As the project collects data about errors, it will also design a new file transfer protocol that can incrementally verify and repair files.
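For illustration, the following minimal Python sketch shows the incremental verify-and-repair idea described above: the received file is hashed block by block, mismatched blocks are identified, and only those blocks would be re-transferred. The block size, function names, and use of SHA-256 are assumptions made for this example, not the protocol developed by the project.

# Illustrative sketch only (not the project's actual protocol): verify a
# received file block by block and report which blocks need to be repaired.
# Block size, names, and the choice of SHA-256 are assumptions for this example.
import hashlib

BLOCK_SIZE = 4 * 1024 * 1024  # 4 MiB per block, an arbitrary choice

def block_digests(path):
    """Return per-block SHA-256 digests for the file at `path`."""
    digests = []
    with open(path, "rb") as f:
        while True:
            block = f.read(BLOCK_SIZE)
            if not block:
                break
            digests.append(hashlib.sha256(block).hexdigest())
    return digests

def blocks_to_repair(received_path, sender_digests):
    """Compare the receiver's per-block digests with those supplied by the
    sender and return the indices of blocks that differ."""
    received = block_digests(received_path)
    return [i for i, (ours, theirs) in enumerate(zip(received, sender_digests))
            if ours != theirs]

# A receiver could then request only the listed blocks from the sender,
# rather than discarding and re-transferring a multi-terabyte file.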
This project will improve the Internet's ability to support big data transfers, both for science and commerce, for decades to come. Users will be able to transfer big files with confidence that the data will be accurately and efficiently copied over the network. This work will further NSF's Blueprint for a National Cyberinfrastructure Ecosystem by ensuring a world in which networks work efficiently to deliver trustworthy copies of big data to anyone who needs it. Additional information on the project is available at: www.hipft.net
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
The central theme of this effort was to better understand file transfer errors in the Internet. There were two hypotheses about the source of the errors: one attributed them to file system errors that took place during the file transfer; the other attributed them to network errors that were not detected by the TCP checksum.
Our effort focused on identifying network errors. A secondary goal, if the network turned out to be the source of the errors, was to develop file transfer protocols able to detect and mitigate those errors.
During this project, we made multiple findings:
- We were able to detect file transfer errors and show that both file system errors and network errors are occurring.
- File transfer errors are much less common than prior studies suggested: we only detected a handful.
- The types of errors occurring on today's Internet are different from the errors that were present (and planned for) when the Internet was being developed in the 1970s.
- We can learn a considerable amount simply by capturing checksums that do not match their data and examining the Hamming distance between the expected and actual checksums (see the sketch after this list).
- We can improve our file transfer protocols to be faster by better utilizing cases where there are multiple repositories storing the same data.
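As one example of the Hamming-distance observation above, the short sketch below (with made-up checksum values and hypothetical names) counts how many bits differ between the checksum carried with the data and the checksum recomputed by the receiver.

# Hypothetical example: measure the Hamming distance between an expected and
# an observed 32-bit checksum to characterize how many bits an error flipped.
def hamming_distance(expected, observed):
    """Number of bit positions in which two checksum values differ."""
    return bin(expected ^ observed).count("1")

expected_crc = 0x1A2B3C4D  # checksum carried with the data (made-up value)
observed_crc = 0x1A2B3C4F  # checksum recomputed over the received data (made-up)
print(hamming_distance(expected_crc, observed_crc))  # prints 1: a single-bit flip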
These results are encouraging. We know far more about the frequency (lower than expected) and source(s) of errors (both disks and networks). We have demonstrated that it may be possible to use multiple repositories as a way to mitigate errors, without a performance penalty.
Because errors are much less common than predicted, we are caught in the predicament of (a) knowing that a problem exists (undetected errors are happening), but (b) not being able to capture enough errors to analyze them properly and determine how best to mitigate their effects.
Finally, we observed that traditional data collection pipelines and testbeds do not lend themselves well to large-scale measurements such as ours. We concluded that a fundamentally different framework is needed to perform network measurement and observation at scale in the exascale era.
Last Modified: 05/23/2025
Modified by: Craig Partridge