Award Abstract # 2154874
Collaborative Research: SaTC: CORE: Small: Machine Learning for Cybersecurity: Robustness Against Concept Drift

NSF Org: CNS
Division Of Computer and Network Systems
Recipient: THE TRUSTEES OF COLUMBIA UNIVERSITY IN THE CITY OF NEW YORK
Initial Amendment Date: July 28, 2022
Latest Amendment Date: August 28, 2023
Award Number: 2154874
Award Instrument: Continuing Grant
Program Manager: Dan Cosley
dcosley@nsf.gov
(703)292-8832
CNS Division Of Computer and Network Systems
CSE Directorate for Computer and Information Science and Engineering
Start Date: October 1, 2022
End Date: September 30, 2024 (Estimated)
Total Intended Award Amount: $300,000.00
Total Awarded Amount to Date: $300,000.00
Funds Obligated to Date: FY 2022 = $150,000.00
FY 2023 = $150,000.00
History of Investigator:
  • Suman Jana (Principal Investigator)
    suman@cs.columbia.edu
Recipient Sponsored Research Office: Columbia University
615 W 131ST ST
NEW YORK
NY  US  10027-7922
(212)854-6851
Sponsor Congressional District: 13
Primary Place of Performance: Columbia University
2960 Broadway
NEW YORK
NY  US  10027-6902
Primary Place of Performance Congressional District: 13
Unique Entity Identifier (UEI): F4N1QNPB95M4
Parent UEI:
NSF Program(s): Secure & Trustworthy Cyberspace
Primary Program Source: 01002324DB NSF RESEARCH & RELATED ACTIVIT
01002223DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 7923, 025Z
Program Element Code(s): 806000
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

A promising direction for cybersecurity is to use machine learning to detect threats and attacks. For instance, machine learning is currently used to detect computer viruses, malware, malicious mobile applications, spam email, and network intrusions. However, one fundamental challenge in using machine learning this way is concept drift: both threats and normal benign behavior change over time, so machine learning algorithms degrade rapidly and become less effective as time passes. Empirically, concept drift is one of the main obstacles to applying machine learning more broadly in cybersecurity. This project will develop new methods, tailored to the cybersecurity domain, for addressing concept drift, and it will advance the state of knowledge on robustness against concept drift in cybersecurity. The project has the potential to improve cybersecurity protections for everyday people, including better antivirus software, phishing detectors, fraud/scam detection, and more, thereby making the Internet safer for everyone.

The team's approach is based on an understanding of the fundamental drivers of concept drift, including both gradual drift and the emergence of entirely new types of threats. Threats can often be grouped into categories; for instance, malware falls into many different "malware families". Each category may experience concept drift at a different rate, which provides an opportunity for new methods that take advantage of such differences across categories. For categories that are experiencing rapid concept drift, the team plans to develop techniques to detect which categories are drifting the most and then select samples from those categories for human analysts to evaluate. For new types of threats, the team plans to develop techniques to identify samples from new categories so they can be submitted for human analysis. For categories that are experiencing gradual but sustained concept drift, the team plans to explore the use of semi-supervised learning and pseudo labels to help the machine learning algorithm adapt to these changes in the data.
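
As a rough illustration of two of the ideas above (prioritizing drifting categories for human review, and pseudo labeling), the following minimal Python sketch may help. It is not the project's method. It assumes, purely for illustration, that each sample carries a predicted category and a model confidence, and that a drop in a category's mean confidence between a reference window and a recent window is a usable drift signal. The function names, the confidence-drop score, and the 0.9 threshold are all hypothetical. Detection of entirely new categories is not covered here.

```python
from collections import defaultdict
from statistics import mean


def drift_scores(reference, recent):
    """Per-category drift score: drop in mean model confidence.

    Both arguments are lists of (category, confidence) pairs. Using the
    confidence drop as a drift signal is an assumption of this sketch.
    """
    def mean_conf(pairs):
        by_cat = defaultdict(list)
        for cat, conf in pairs:
            by_cat[cat].append(conf)
        return {cat: mean(vals) for cat, vals in by_cat.items()}

    ref, rec = mean_conf(reference), mean_conf(recent)
    # Only categories seen in both windows can be compared.
    return {cat: ref[cat] - rec[cat] for cat in ref if cat in rec}


def select_for_analysts(recent, scores, budget):
    """Return indices of up to `budget` recent samples for human review.

    Samples from the most-drifted categories come first; within a
    category, the lowest-confidence samples come first.
    """
    known = [(i, cat, conf) for i, (cat, conf) in enumerate(recent)
             if cat in scores]
    known.sort(key=lambda t: (-scores[t[1]], t[2]))
    return [i for i, _, _ in known[:budget]]


def pseudo_label(recent, threshold=0.9):
    """Keep predictions at or above `threshold` as (index, label) pairs."""
    return [(i, cat) for i, (cat, conf) in enumerate(recent)
            if conf >= threshold]


if __name__ == "__main__":
    # Made-up data: family_b loses confidence in the recent window.
    reference = [("family_a", 0.95), ("family_a", 0.93),
                 ("family_b", 0.90), ("family_b", 0.92)]
    recent = [("family_a", 0.94), ("family_a", 0.92),
              ("family_b", 0.55), ("family_b", 0.70), ("family_b", 0.95)]

    scores = drift_scores(reference, recent)
    print(select_for_analysts(recent, scores, budget=2))
    print(pseudo_label(recent))
```

In this toy setup the analyst budget is spent on the category whose confidence dropped the most, while confidently classified samples are kept as pseudo labels that could be fed back into retraining. A real system would likely need a more robust drift measure than raw confidence.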

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Please report errors in award information by writing to: awardsearch@nsf.gov.
