Award Abstract # 1750024
CAREER: Scalable Information Flow Monitoring and Enforcement through Data Provenance Unification

NSF Org: CNS
Division Of Computer and Network Systems
Recipient: UNIVERSITY OF ILLINOIS
Initial Amendment Date: March 26, 2018
Latest Amendment Date: April 13, 2022
Award Number: 1750024
Award Instrument: Continuing Grant
Program Manager: Phillip Regalia
pregalia@nsf.gov
(703)292-2981
CNS Division Of Computer and Network Systems
CSE Directorate for Computer and Information Science and Engineering
Start Date: April 1, 2018
End Date: March 31, 2024 (Estimated)
Total Intended Award Amount: $528,077.00
Total Awarded Amount to Date: $528,077.00
Funds Obligated to Date: FY 2018 = $98,468.00
FY 2019 = $101,897.00
FY 2020 = $105,468.00
FY 2021 = $109,188.00
FY 2022 = $113,056.00
History of Investigator:
  • Adam Bates (Principal Investigator)
    batesa@illinois.edu
Recipient Sponsored Research Office: University of Illinois at Urbana-Champaign
506 S WRIGHT ST
URBANA
IL  US  61801-3620
(217)333-2187
Sponsor Congressional District: 13
Primary Place of Performance: University of Illinois at Urbana-Champaign
IL  US  61820-7473
Primary Place of Performance Congressional District: 13
Unique Entity Identifier (UEI): Y8CWNJRCNN91
Parent UEI: V2PHZ2CSCH63
NSF Program(s): Secure & Trustworthy Cyberspace
Primary Program Source: 01001819DB NSF RESEARCH & RELATED ACTIVIT
01001920DB NSF RESEARCH & RELATED ACTIVIT
01002021DB NSF RESEARCH & RELATED ACTIVIT
01002122DB NSF RESEARCH & RELATED ACTIVIT
01002223DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 025Z, 1045, 7434
Program Element Code(s): 806000
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

System intrusions have become more subtle and complex. Attackers now covertly observe and probe systems for prolonged periods before launching devastating attacks. In such an environment, it has grown prohibitively difficult for system administrators to identify suspicious events, correlate these events into an attack pattern, and determine an appropriate response. Data provenance is a method of modeling a system's execution in the form of a causal relationship graph, allowing investigators to trace the ancestry of data objects and identify relationships between seemingly independent events. The goal of the proposed work is to develop techniques that enable the use of data provenance as an expressive and efficient monitoring tool in large distributed systems. These mechanisms will enable unprecedented capability to reason about system events, centrally monitor activities within data centers, and express fine-grained enforcement of security properties based on the historical flow of data. Research and software artifacts will be made available to the broader community through the Linux provenance web site.
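To make the idea of tracing the ancestry of a data object concrete, the following is a minimal Python sketch of a causal relationship graph with a backward trace. It is an illustration only, not the project's software: the class name ProvenanceGraph, its methods, and the example object labels are all hypothetical.

```python
from collections import defaultdict


class ProvenanceGraph:
    """Toy causal graph. An edge src -> dst means dst was influenced by src.

    Hypothetical illustration; names and structure are assumptions.
    """

    def __init__(self):
        # Maps each object to the set of objects that directly influenced it.
        self._parents = defaultdict(set)

    def add_flow(self, src, dst):
        """Record that information flowed from src to dst."""
        self._parents[dst].add(src)

    def ancestry(self, node):
        """Return every object that could have causally influenced node."""
        seen = set()
        stack = [node]
        while stack:
            current = stack.pop()
            for parent in self._parents.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


# Hypothetical event stream: a download, a script execution, a file write,
# plus one unrelated background event.
graph = ProvenanceGraph()
graph.add_flow("socket:203.0.113.7:443", "process:curl")
graph.add_flow("process:curl", "file:/tmp/payload.sh")
graph.add_flow("file:/tmp/payload.sh", "process:bash")
graph.add_flow("process:bash", "file:/etc/passwd")
graph.add_flow("process:cron", "file:/var/log/cron.log")

suspicious_ancestors = graph.ancestry("file:/etc/passwd")
```

Here the backward trace from the modified file reaches the remote socket through three intermediate objects, while the unrelated cron activity is excluded. This is the sense in which seemingly independent events become linked.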

The proposed work will examine central challenges related to expressivity and scalability that currently prevent the further proliferation of provenance-based auditing techniques. To address the semantic gap that has traditionally prevented system-layer auditing from explaining higher-level application behaviors, this project pursues the design of universal provenance mechanisms that leverage binary analysis to transparently identify siloed application-layer logging activities, extract their semantics, and graft the information onto a causal relationship graph that encodes the entire system's execution. Grammar induction techniques will be leveraged to overcome the tremendous storage burden of provenance and provide a scalable central monitoring framework for data centers. After enriching system-layer auditing and enabling the efficient communication of suspicious activities via provenance traces, data provenance will be integrated into enforcement mechanisms to address critical security challenges including regulatory compliance, information flow control, and fault attribution. Advancing the state of the art in provenance-based tracing and enforcement should establish a new baseline for reasoning about the flow of data in today's complex computing systems.
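The storage argument can be illustrated with a much simpler stand-in for grammar induction: collapsing provenance graphs that are exactly identical across hosts. The Python sketch below assumes each host reports its graph as a list of (source, destination) edges. The function names, host names, and edges are hypothetical, and exact-match deduplication is only a rough proxy for the generalization over similar graphs that grammar induction aims to provide.

```python
import hashlib


def graph_fingerprint(edges):
    """Hash an order-independent canonical form of a graph's edge set."""
    canonical = "\n".join(f"{src}->{dst}" for src, dst in sorted(set(edges)))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def deduplicate(host_graphs):
    """Group hosts by identical graph, storing each distinct graph once."""
    summary = {}
    for host, edges in host_graphs.items():
        key = graph_fingerprint(edges)
        entry = summary.setdefault(
            key, {"edges": sorted(set(edges)), "hosts": []}
        )
        entry["hosts"].append(host)
    return summary


# Hypothetical data center: two web servers behave identically,
# and a third additionally spawns a shell.
host_graphs = {
    "web-01": [("nginx", "access.log"), ("client", "nginx")],
    "web-02": [("client", "nginx"), ("nginx", "access.log")],
    "web-03": [("client", "nginx"), ("nginx", "access.log"), ("nginx", "/bin/sh")],
}
summary = deduplicate(host_graphs)
```

In this toy case three per-host graphs reduce to two stored graphs, and the host whose behavior deviates remains visible as its own entry rather than being merged away.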

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Note:  When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

(Showing: 1 - 10 of 34)
Bansal, A. and Kandikuppa, A. and Chen, CY. and Hasan, M. and Bates, A. and Mohan, S. "Towards Efficient Auditing for Real-Time Systems." 27th European Symposium on Research in Computer Security, 2022 https://doi.org/10.1007/978-3-031-17143-7_30
Bates, Adam and Hassan, Wajih Ul "Can Data Provenance Put an End to the Data Breach?" IEEE Security & Privacy, v.17, 2019 https://doi.org/10.1109/MSEC.2019.2913693
Datta, Pubali and Kumar, Prabuddha and Morris, Tristan and Grace, Michael and Rahmati, Amir and Bates, Adam "Valve: Securing Function Workflows on Serverless Computing Platforms" The Web Conference, 2020 3366423.3380173
Goyal, Akul and Han, Xueyuan and Wang, Gang and Bates, Adam "Sometimes, You Aren't What You Do: Mimicry Attacks against Provenance Graph Host Intrusion Detection Systems" 30th Network and Distributed System Security Symposium, 2023
Han, Xueyuan and Pasquier, Thomas and Bates, Adam and Mickens, James and Seltzer, Margo "UNICORN: Runtime Provenance-Based Detector for Advanced Persistent Threats" Network and Distributed System Security Symposium, 2020
Hassan, Wajih Ul and Lemay, Mark and Aguse, Nuraini and Bates, Adam and Moyer, Thomas "Towards Scalable Cluster Auditing through Grammatical Inference over Provenance Graphs" Network and Distributed System Security Symposium, 2018
Hassan, Wajih Ul and Bates, Adam and Marino, Daniel "Tactical Provenance Analysis for Endpoint Detection and Response Systems" Proceedings of the IEEE Symposium on Security and Privacy, 2020
Hassan, Wajih Ul and Guo, Shengjian and Li, Ding and Chen, Zhengzhang and Jee, Kangkook and Li, Zhichun and Bates, Adam "NoDoze: Combatting Threat Alert Fatigue with Automated Provenance Triage" Network and Distributed System Security Symposium, 2019
Hassan, Wajih Ul and Hussain, Saad and Bates, Adam "Analysis of Privacy Protections in Fitness Tracking Social Networks -or- You can run, but can you hide?" 27th USENIX Security Symposium, 2018
Hassan, Wajih Ul and Li, Ding and Jee, Kangkook and Yu, Xiao and Zou, Kexuan and Wang, Dawei and Chen, Zhengzhang and Li, Zhichun and Rhee, Junghwan and Gui, Jiaping and Bates, Adam "This is Why We Can't Cache Nice Things: Lightning-Fast Threat Hunting using Suspicion-Based Hierarchical Storage" Annual Computer Security Applications Conference, 2020 https://doi.org/10.1145/3427228.3427255
Hassan, Wajih Ul and Noureddine, Mohammad Ali and Datta, Pubali and Bates, Adam "OmegaLog: High-Fidelity Attack Investigation via Transparent Multi-layer Log Analysis" Network and Distributed System Security Symposium, 2020

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

Data provenance is a promising cybersecurity technique that represents a series of computer events as a causal relationship graph describing the history of interactions between computer objects like programs, files, and network connections. This award supported research that improved the precision with which provenance analysis techniques can describe suspicious events in computers, while simultaneously improving their speed and efficiency. The first major outcome of this program was the ability to transparently capture and represent events from different layers of a computing system in a single unified provenance graph ("Universal Provenance Framework" figure). The second major outcome was a set of techniques to efficiently monitor the provenance of many thousands of computers performing highly redundant tasks in data centers. By representing provenance graphs as formal grammars, we combined similar graphs and removed redundancies to create a single global representation of data center activity. This global representation still identified suspicious attack behaviors ("Winnower Graph" figure). The final major outcome was the application of data provenance analysis for access control, regulatory compliance, and attribution in modern computers. One example of this outcome was a set of methods for demonstrating compliance with privacy regulations, such as the EU’s General Data Protection Regulation (GDPR), in provenance form ("GDPR Provenance" figure). Results from these outcomes were published in academic venues, and software artifacts were made available to the broader security community. Over the course of the program, this award supported the studies of 8 PhD, 2 Master's, and 4 undergraduate students at the University of Illinois at Urbana-Champaign.


Last Modified: 08/07/2024
Modified by: Adam Bates

Please report errors in award information by writing to: awardsearch@nsf.gov.
