Award Abstract # 1814105
NeTS: Small: Collaborative Research: Protocol Validation using Minimally Supervised Semantic Interpretation of Text

NSF Org: CNS
Division Of Computer and Network Systems
Recipient: PURDUE UNIVERSITY
Initial Amendment Date: August 15, 2018
Latest Amendment Date: August 15, 2018
Award Number: 1814105
Award Instrument: Standard Grant
Program Manager: Ann Von Lehmen
CNS Division Of Computer and Network Systems
CSE Directorate for Computer and Information Science and Engineering
Start Date: October 1, 2018
End Date: September 30, 2021 (Estimated)
Total Intended Award Amount: $250,000.00
Total Awarded Amount to Date: $250,000.00
Funds Obligated to Date: FY 2018 = $250,000.00
History of Investigator:
  • Dan Goldwasser (Principal Investigator)
Recipient Sponsored Research Office: Purdue University
2550 NORTHWESTERN AVE # 1100
WEST LAFAYETTE
IN  US  47906-1332
(765)494-1055
Sponsor Congressional District: 04
Primary Place of Performance: Purdue University
305 University
IN  US  47907-2114
Primary Place of Performance Congressional District: 04
Unique Entity Identifier (UEI): YRXVL4JYCEF5
Parent UEI: YRXVL4JYCEF5
NSF Program(s): Special Projects - CNS
Primary Program Source: 01001819DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 075Z, 7923
Program Element Code(s): 171400
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

The networks that comprise the Internet are fundamental to our society, facilitating access to medical and financial services, supporting critical infrastructure such as the power grid, and enabling emergent services such as those provided by autonomous cars and IoT (Internet of Things) devices. Network behavior is dictated by sets of instructions, or protocols, developed and tested over time. Such protocols must operate correctly and comply with requirements that are usually described in one or more documents, i.e., in a textual representation. If they do not operate properly, the performance and security of a network could be compromised. The goal of this project is to increase assurance in network protocols, specifically in their compliance with specified rules, in their interoperability, and in their functionality. The project will accomplish this through a novel scheme that performs protocol testing by automatically extracting protocol requirements from their textual specifications. This would mark a significant advance in the field, towards automated mechanisms that assure that network protocols behave as we expect them to, making networks more reliable and secure.

This multidisciplinary project combines expertise from natural language processing and computer networks to create methodologies, frameworks, a knowledge base, and tools for protocol validation for (1) compliance checking, (2) bug finding, and (3) interoperability testing. The general approach is to apply machine learning, semantic parsing, and information extraction techniques to structured text (RFCs, internet-drafts) and unstructured text (blogs, forums, and bug reports) in order to build a knowledge base about the protocols. The knowledge base contains formal information, such as message formats, protocol state machines, and constraints, and semi-formal information, such as temporal properties, tuning conditions and parameters, changes from one version to another, and known bugs. This information is then used to validate protocol implementations through protocol fuzzing, program analysis, software model checking, and measurement methods: to check whether implementations comply with their specifications, to detect semantic bugs that depend on intrinsic protocol properties, and to check for interoperability issues between different versions or protocol stacks. This work is guided by protocols from three representative domains -- TCP variants, the SDN ecosystem, and IoT smart home environment.
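As a rough illustration of how a message format stored in such a knowledge base could drive fuzzing, the sketch below derives boundary-value test messages from a field layout. It is not the project's tool: the `HEADER_FIELDS` layout, the field names, and the helper functions are all hypothetical, chosen only to show the idea.

```python
from typing import Dict, List

# Hypothetical knowledge-base entry (an assumption, not taken from any RFC):
# field name -> width in bits, in wire order.
HEADER_FIELDS: Dict[str, int] = {"type": 4, "flags": 4, "length": 16}


def boundary_values(bits: int) -> List[int]:
    """Return the smallest and largest values that fit in `bits` bits."""
    top = (1 << bits) - 1
    return sorted({0, 1, top - 1, top})


def fuzz_cases(fields: Dict[str, int], defaults: Dict[str, int]) -> List[Dict[str, int]]:
    """Vary one field at a time over its boundary values, keeping the rest at defaults."""
    cases = []
    for name, bits in fields.items():
        for value in boundary_values(bits):
            case = dict(defaults)
            case[name] = value
            cases.append(case)
    return cases


def pack(fields: Dict[str, int], values: Dict[str, int]) -> bytes:
    """Pack field values into big-endian bytes following the declared bit widths."""
    word = 0
    for name, bits in fields.items():
        word = (word << bits) | (values[name] & ((1 << bits) - 1))
    total_bits = sum(fields.values())
    return word.to_bytes((total_bits + 7) // 8, "big")


if __name__ == "__main__":
    defaults = {"type": 1, "flags": 0, "length": 8}
    for case in fuzz_cases(HEADER_FIELDS, defaults):
        print(pack(HEADER_FIELDS, case).hex(), case)
```

Each generated message differs from a well-formed default in exactly one field, which makes it easier to attribute any misbehavior of the implementation under test to that field.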

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Note:  When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Jero, Samuel and Pacheco, Maria Leonor and Goldwasser, Dan and Nita-Rotaru, Cristina "Leveraging Textual Specifications for Grammar-Based Fuzzing of Network Protocols" Innovative applications of artificial intelligence, v.33, 2019 https://doi.org/10.1609/aaai.v33i01.33019478

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

The major goal of this project is to increase assurance in network protocols, specifically in their compliance with specified rules, in their interoperability, and in their functionality. This multidisciplinary project combines expertise from natural language processing and computer networks to create methodologies, frameworks, a knowledge base, and tools for protocol validation.

The major outcome of this work is an approach that allows for automated extraction of protocol finite state machines (FSMs) from RFC specifications. RFCs are a common way of specifying Internet protocols. Our hybrid approach consists of three key steps: (1) large-scale word-representation learning for technical language, (2) focused zero-shot learning for mapping protocol text to a protocol-independent information language, and (3) rule-based mapping from protocol-independent information to a specific protocol FSM. The first step does not require direct annotation and does not add to the human effort involved in building the model. Our zero-shot information extraction approach builds on that representation. Since each protocol has its own set of predicates and variables, we use a zero-shot setup in which the protocols observed during training are kept separate from those used in testing. The model learns to identify and connect concepts relevant to the training protocols, and at test time it is evaluated on extracting a set of symbols that were not observed during training. We show the generalizability of our FSM extraction by using the RFCs for six different protocols: BGPv4, DCCP, LTP, PPTP, SCTP and TCP. The extracted FSM can further be used for protocol validation. We demonstrated how automated extraction of an FSM from an RFC can be applied to the synthesis of attacks, with TCP and DCCP as case studies.
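To make step (3) more concrete, here is a minimal sketch of a rule-based mapping from protocol-independent records to an FSM, followed by a check of an event trace against that FSM. The `ExtractedRule` record, the function names, and the event labels are assumptions made for illustration; they are not the project's actual intermediate language. The three sample transitions are a simplified fragment of the TCP connection state machine.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ExtractedRule:
    """Hypothetical protocol-independent record produced by the extraction step."""
    source_state: str
    event: str
    target_state: str


Transitions = Dict[Tuple[str, str], str]


def build_fsm(rules: List[ExtractedRule]) -> Transitions:
    """Map extracted records to a transition table, rejecting contradictory rules."""
    fsm: Transitions = {}
    for rule in rules:
        key = (rule.source_state, rule.event)
        if key in fsm and fsm[key] != rule.target_state:
            raise ValueError(f"conflicting transitions for {key}")
        fsm[key] = rule.target_state
    return fsm


def run_trace(fsm: Transitions, start: str, events: List[str]) -> Optional[str]:
    """Return the final state, or None if some event is not allowed by the FSM."""
    state = start
    for event in events:
        next_state = fsm.get((state, event))
        if next_state is None:
            return None
        state = next_state
    return state


# Simplified, hand-written fragment of the TCP state machine (illustrative only).
RULES = [
    ExtractedRule("CLOSED", "active_open", "SYN_SENT"),
    ExtractedRule("SYN_SENT", "recv_syn_ack", "ESTABLISHED"),
    ExtractedRule("ESTABLISHED", "close", "FIN_WAIT_1"),
]
FSM = build_fsm(RULES)

if __name__ == "__main__":
    print(run_trace(FSM, "CLOSED", ["active_open", "recv_syn_ack"]))
    print(run_trace(FSM, "CLOSED", ["close"]))
```

A trace that returns `None` marks an event sequence the specification does not allow, which is the kind of signal a compliance check or an attack-synthesis search over the FSM could start from.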

Work developed under this grant will contribute to increased assurance in protocol design and implementation. As the Internet consists of a myriad of protocols, this grant contributes to making the Internet infrastructure more resilient to failures and attacks.

This work contributed to the education and training of several PhD and undergraduate students through the research they conducted with support from this grant. We disseminated our results in top NLP and network security journals and conferences.


 


Last Modified: 01/03/2022
Modified by: Dan Goldwasser

Please report errors in award information by writing to: awardsearch@nsf.gov.
