
NSF Org: | DGE Division Of Graduate Education |
Initial Amendment Date: | April 16, 2021 |
Latest Amendment Date: | May 2, 2024 |
Award Number: | 2114892 |
Award Instrument: | Standard Grant |
Program Manager: | Li Yang, liyang@nsf.gov, (703) 292-2677, DGE Division Of Graduate Education, EDU Directorate for STEM Education |
Start Date: | May 1, 2021 |
End Date: | April 30, 2025 (Estimated) |
Total Intended Award Amount: | $219,993.00 |
Total Awarded Amount to Date: | $219,993.00 |
Recipient Sponsored Research Office: | 1000 Hilltop Circle, Baltimore, MD, US 21250-0001, (410) 455-3140 |
Primary Place of Performance: | 1000 Hilltop Circle, Baltimore, MD, US 21250-0001 |
NSF Program(s): | Secure & Trustworthy Cyberspace |
Primary Program Source: | 04002122DB NSF Education & Human Resources |
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.076 |
ABSTRACT
One of the most critical security challenges of the 21st century is protecting the cyber-physical systems that manage and control our infrastructure, vehicles, homes, and personal devices as well as the information that they store, use and exchange. Artificial intelligence (AI) and machine learning-based tools can help human analysts sort through large volumes of data to determine if an attack on these systems has happened. Yet, AI components are also vulnerable to attacks, and require development of techniques to make them more robust. This collaborative project between the University of Maryland Baltimore County (UMBC) and the University of Illinois addresses the research and educational aspects of combining AI and cybersecurity. Educational and training materials will be developed for use by college and university instructors and students and by cybersecurity and AI professionals. These materials will address how AI can improve security systems and how cybersecurity analytics can protect AI systems. In addition, the project will recruit students from groups that have been traditionally underrepresented in computing.
This project has three interrelated topics. The first focuses on education and extends the project team's existing cybersecurity concept inventory to include relevant AI-related concepts. Students' knowledge of cybersecurity and AI, and of how the two relate, will be assessed before and after taking AI or cybersecurity courses. Educational materials and projects will also be created to demonstrate how AI can be applied to cybersecurity problems and how cybersecurity tools can protect AI systems from attack. The second topic explores how the latest AI tools can support cybersecurity tasks. The creation and maintenance of semantic knowledge graphs of cyberthreat information will be studied and used to support reinforcement learning systems that are better at detecting the presence of malware in a host. The third topic focuses on finding new ways that cybersecurity tools can protect AI systems from becoming compromised by attacks such as data poisoning. Cyberthreat knowledge graphs and neural networks will be used to detect and eliminate likely disinformation from data used to train AI-based cybersecurity systems. This aspect of the project has applications beyond cybersecurity, such as countering disinformation.
This project is supported by the Secure and Trustworthy Cyberspace (SaTC) program, which funds proposals that address cybersecurity and privacy, and in this case specifically cybersecurity education. The SaTC program aligns with the Federal Cybersecurity Research and Development Strategic Plan and the National Privacy Research Strategy to protect and preserve the growing social and economic benefits of cyber systems while ensuring security and privacy.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
This collaborative project between the University of Maryland, Baltimore County and the University of Illinois Urbana-Champaign addressed both the research and educational aspects of applying AI technology to cybersecurity. Our goals were to (1) carry out novel research on applying the latest AI techniques to cybersecurity problems and explore how attacks on AI systems can be mitigated, (2) extend our work on evaluating students' understanding of the underlying security concepts to include AI-related topics, and (3) create and evaluate materials for undergraduate, graduate, and professional courses on both cybersecurity and AI, covering concepts, examples, and tools that illustrate how the two fields can support one another. At UMBC the project provided partial support for three PhD students and multiple undergraduate students.
We worked with a group of undergraduate students who learned how to build language understanding pipelines using the spaCy NLP tools. They built a corpus of text about cybersecurity by scraping relevant text from web pages and documents, constructed a set of domain-relevant entity types for cybersecurity, configured and applied annotation framework tools, and used the Prodigy annotation system to create training and evaluation datasets. Additional modules were also created using regular expressions to recognize and extract cybersecurity-relevant entities from text, such as URLs, email addresses, IP addresses, hash values, and process identifiers. The results of this work were presented and published in several venues.
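A minimal sketch of this style of pipeline is shown below. It combines a spaCy entity ruler with regex-based token patterns to tag indicators such as IP addresses, hashes, emails, and URLs; the entity labels, patterns, and example text are illustrative assumptions, not the project's actual annotation scheme or corpus.

```python
# Illustrative sketch: tagging cybersecurity indicators with spaCy.
# Labels and regex patterns are assumptions, not the project's scheme.
import spacy

nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")

# Token-level patterns for a few common indicator-of-compromise types.
ruler.add_patterns([
    {"label": "IP_ADDRESS", "pattern": [{"TEXT": {"REGEX": r"^\d{1,3}(\.\d{1,3}){3}$"}}]},
    {"label": "SHA256", "pattern": [{"TEXT": {"REGEX": r"^[0-9a-fA-F]{64}$"}}]},
    {"label": "EMAIL", "pattern": [{"LIKE_EMAIL": True}]},
    {"label": "URL", "pattern": [{"LIKE_URL": True}]},
])

text = ("The dropper at http://malware.example/payload.exe beacons to "
        "203.0.113.7 and mails credentials to ops@attacker.example; sample "
        "hash e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855.")

for ent in nlp(text).ents:
    print(ent.label_, ent.text)
```

In the actual work, ruler-style patterns like these would complement a statistical NER model trained on the Prodigy-annotated corpus rather than replace it.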
We worked with a second group of students, who explored how reinforcement learning (RL) can be used to build better tools for detecting malware infecting a computer. The group learned how to collect and use data from VirusTotal, using the tasks in the Machine Learning Security Evasion Competition (MLSEC) as a problem framework. The group experimented with RL techniques for the defender challenge, in which contestants develop malware detection models to be tested in the later attacker challenge.
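The report does not detail the RL formulation, but the flavor of the approach can be sketched as a simple bandit-style learner that tunes a detection threshold from reward feedback. Everything below (the suspicion scores, actions, and rewards) is invented for illustration and is not the team's method or the MLSEC setup.

```python
# Toy bandit-style RL sketch: learn a malware-detection threshold from
# reward feedback. All data and parameters here are illustrative.
import random

random.seed(0)
thresholds = [round(0.1 * i, 1) for i in range(1, 10)]  # candidate cutoffs
q = {t: 0.0 for t in thresholds}   # estimated value of each threshold
alpha, epsilon = 0.1, 0.2          # learning rate, exploration rate

def sample():
    """Fake labeled sample: (suspicion score, is_malware)."""
    if random.random() < 0.5:
        return random.gauss(0.7, 0.15), True    # malware scores high
    return random.gauss(0.3, 0.15), False       # benign scores low

for _ in range(5000):
    if random.random() < epsilon:
        t = random.choice(thresholds)           # explore
    else:
        t = max(q, key=q.get)                   # exploit best so far
    score, is_malware = sample()
    reward = 1.0 if (score >= t) == is_malware else -1.0
    q[t] += alpha * (reward - q[t])             # incremental value update

print("learned threshold:", max(q, key=q.get))
```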
We also studied the problem of privacy-preserving data generation, which involves creating new data that maintains privacy while preserving key characteristics and properties of the original data, so that it remains useful for building downstream models of attacks. We explored a technique we call Knowledge Infused Privacy Preserving Data Generation, which uses a generative adversarial network trained on system data to generate synthetic datasets that can replace the original data in downstream tasks while protecting sensitive information. We demonstrated this model by synthesizing network data captured with the Wireshark network capture tool and showed that the synthetic dataset satisfies the constraints of network-specific data and can replace the original dataset in downstream tasks. We also applied the technique to sharing agricultural data.
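For readers unfamiliar with the underlying mechanism, a generic GAN training loop for tabular, flow-like records looks roughly like the sketch below. This is a stand-in, not the Knowledge Infused model: the feature count, network sizes, and random "real" data are assumptions.

```python
# Generic GAN sketch for synthesizing tabular network-flow features.
# Stand-in only: feature count, sizes, and data are assumptions.
import torch
import torch.nn as nn

NOISE, FEATS = 16, 8  # latent size; e.g., duration, bytes, ports, flags...

G = nn.Sequential(nn.Linear(NOISE, 64), nn.ReLU(), nn.Linear(64, FEATS))
D = nn.Sequential(nn.Linear(FEATS, 64), nn.ReLU(), nn.Linear(64, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

real_data = torch.randn(256, FEATS)  # stand-in for normalized flow records

for step in range(200):
    # Discriminator step: distinguish real from generated records.
    fake = G(torch.randn(256, NOISE)).detach()
    loss_d = (bce(D(real_data), torch.ones(256, 1)) +
              bce(D(fake), torch.zeros(256, 1)))
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # Generator step: produce records the discriminator accepts as real.
    fake = G(torch.randn(256, NOISE))
    loss_g = bce(D(fake), torch.ones(256, 1))
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()

synthetic = G(torch.randn(1000, NOISE))  # shareable synthetic records
print(synthetic.shape)
```

The "knowledge infused" aspect of the actual technique adds domain constraints the plain loop above lacks, which is what lets the synthetic records satisfy network-specific constraints.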
We conducted research on building and using large language models (LLMs) for cybersecurity applications. This required creating a corpus of cybersecurity-relevant text for enhancing existing LLMs and exploring several use cases for the resulting model. To evaluate how well LLMs understand cybersecurity problems, we used several public LLM systems to answer questions developed for assessing students' understanding of cybersecurity concepts, and found that the systems did surprisingly well. We also studied using the emergent reasoning capabilities of LLMs to detect inconsistencies between extracted facts and their provenance, investigating how model architecture (encoder-decoder versus decoder-only), model size, and the entities involved affect the identification capabilities of LLMs.
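As a simple illustration of this kind of probe, one can pose a concept-inventory-style multiple-choice question to a public model and check its answer. The question, the choice of model, and the scoring below are invented for the sketch; they are not the project's concept inventory items or its evaluation setup.

```python
# Sketch of probing an LLM with a multiple-choice security question.
# Question, model, and scoring are illustrative assumptions.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # stand-in model

question = (
    "A web form passes user input directly into a SQL query string.\n"
    "Which attack does this most directly enable?\n"
    "(A) Cross-site scripting (B) SQL injection "
    "(C) Buffer overflow (D) ARP spoofing\n"
    "Answer:"
)
out = generator(question, max_new_tokens=5, do_sample=False)
answer = out[0]["generated_text"][len(question):].strip()
print("model answered:", answer)
print("correct" if answer.upper().startswith("B") else "incorrect")
```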
Last Modified: 07/06/2025
Modified by: Timothy W Finin