Award Abstract # 2027277
ATD: Algorithms for Threat Detection in Knowledge Graphs

NSF Org: DMS
Division Of Mathematical Sciences
Recipient: UNIVERSITY OF CALIFORNIA, LOS ANGELES
Initial Amendment Date: July 20, 2020
Latest Amendment Date: July 20, 2020
Award Number: 2027277
Award Instrument: Standard Grant
Program Manager: Tomek Bartoszynski
tbartosz@nsf.gov
 (703)292-4885
DMS
 Division Of Mathematical Sciences
MPS
 Directorate for Mathematical and Physical Sciences
Start Date: September 1, 2020
End Date: August 31, 2024 (Estimated)
Total Intended Award Amount: $607,050.00
Total Awarded Amount to Date: $607,050.00
Funds Obligated to Date: FY 2020 = $607,050.00
History of Investigator:
  • Andrea Bertozzi (Principal Investigator)
    bertozzi@math.ucla.edu
  • P Jeffrey Brantingham (Co-Principal Investigator)
Recipient Sponsored Research Office: University of California-Los Angeles
10889 WILSHIRE BLVD STE 700
LOS ANGELES
CA  US  90024-4200
(310)794-0102
Sponsor Congressional District: 36
Primary Place of Performance: University of California-Los Angeles
520 Portola Plaza
Los Angeles
CA  US  90095-1555
Primary Place of Performance
Congressional District:
36
Unique Entity Identifier (UEI): RN64EPNH8JC6
Parent UEI:
NSF Program(s): ATD-Algorithms for Threat Dete
Primary Program Source: 01002021DB NSF RESEARCH & RELATED ACTIVIT
01002021RB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 6877
Program Element Code(s): 046Y00
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.049

ABSTRACT

This project develops new mathematical algorithms and models involving knowledge graphs. A knowledge graph represents what is known about a subject in the form of labeled nodes and edges. More than simply labeled data, knowledge graphs organize data according to high-level meanings and assign globally unique identification to each node in the graph to match real-world entities. Much work on knowledge graphs treats databases and queries. In contrast, in the context of threat detection, this project focuses on algorithms that identify latent information in the graph and predictive models associated with data on the graph. The project will involve a combination of mathematical methods for subgraph isomorphism detection, time series analysis, agent-based and multiscale modeling, and pattern recognition. The project will train a postdoctoral scholar, PhD student, and six undergraduate researchers through involvement in the research.

This project brings together several different focused problems with large, multimodal, complex datasets. The data is organized into a knowledge graph in which additional information is added and absorbed as it becomes available. This project considers three types of knowledge graphs each for different applications: (1) knowledge graphs constructed from complex multi-part narratives; (2) knowledge graphs constructed from heterogeneous online content; and (3) knowledge graphs associated with large-scale human interaction dynamics such as a global pandemic. For (1), algorithms will be designed to identify important causal subgraphs. For (2), the project aims to identify threats in space and time based on templated patterns. For (3), desired goals are both a predictive ability for actions from a micro to macro scale along with tools to assess potential impact versus cost of preventative measures, from local to regional to country-wide scale.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Note:  When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

(Showing: 1 - 10 of 20)
Adams, Clay and Bozhidarova, Malvina and Chen, James and Gao, Andrew and Liu, Zhengtong and Hunter Priniski, J. and Lin, Junyuan and Sonthalia, Rishi and Bertozzi, Andrea L. and Jeffrey Brantingham, P. "Knowledge Graphs of the QAnon Twitter Network" 2022 IEEE International Conference on Big Data (Big Data) , 2022 https://doi.org/10.1109/BigData55660.2022.10021128 Citation Details
Alaverdian, Mariam and Gilroy, William and Kirgios, Veronica and Li, Xia and Matuk, Carolina and McKenzie, Daniel and Ruangkriengsin, Tachin and Bertozzi, Andrea L. and Jeffrey Brantingham, P. "Who killed Lilly Kane? A case study in applying knowledge graphs to crime fiction." 2020 IEEE International Conference on Big Data (Big Data) , 2020 https://doi.org/10.1109/BigData50022.2020.9378079 Citation Details
Bozhidarova, Malvina and Chang, Jonathn and Ale-Rasool, Aaishah and Liu, Yuxiang and Ma, Chongyao and Bertozzi, Andrea L. and Brantingham, P. Jeffrey and Lin, Junyuan and Krishnagopal, Sanjukta "Hate speech and hate crimes: a data-driven study of evolving discourse around marginalized groups" , 2023 https://doi.org/10.1109/BigData59044.2023.10386312 Citation Details
Brantingham, P. Jeffrey and Carter, Jeremy and MacDonald, John and Melde, Chris and Mohler, George "Is the recent surge in violence in American cities due to contagion?" Journal of Criminal Justice , v.76 , 2021 https://doi.org/10.1016/j.jcrimjus.2021.101848 Citation Details
Brantingham, P. Jeffrey and Mohler, George and MacDonald, John and Pescosolido, ed., Bernice "Changes in publicpolice cooperation following the murder of George Floyd" PNAS Nexus , v.1 , 2022 https://doi.org/10.1093/pnasnexus/pgac189 Citation Details
Brantingham, P. Jeffrey and Uchida, Craig D. "Public cooperation and the police: Do calls-for-service increase after homicides?" Journal of Criminal Justice , v.73 , 2021 https://doi.org/10.1016/j.jcrimjus.2021.101785 Citation Details
Brantingham, P. Jeffrey and Yuan, Baichuan and Herz, Denise "Is Gang Violent Crime More Contagious than Non-Gang Violent Crime?" Journal of Quantitative Criminology , 2020 https://doi.org/10.1007/s10940-020-09479-1 Citation Details
Chen, Bohan and Bertozzi, Andrea L. "AutoKG: Efficient Automated Knowledge Graph Generation for Language Models" , 2023 https://doi.org/10.1109/BigData59044.2023.10386454 Citation Details
Flocco, Dominic and Palmer-Toy, Bryce and Wang, Ruixiao and Zhu, Hongyu and Sonthalia, Rishi and Lin, Junyuan and Bertozzi, Andrea L. and Jeffrey Brantingham, P. "An Analysis of COVID-19 Knowledge Graph Construction and Applications" 2021 IEEE Conference on Big Data , 2021 https://doi.org/10.1109/BigData52589.2021.9671479 Citation Details
Ge, Yurun and Bertozzi, Andrea L. "Active Learning for the Subgraph Matching Problem" 2021 IEEE International Conference on Big Data (Big Data) , 2021 https://doi.org/10.1109/BigData52589.2021.9671760 Citation Details
Ge, Yurun and Yang, Dominic and Bertozzi, Andrea L "Iterative active learning strategies for subgraph matching" Pattern Recognition , v.158 , 2024 https://doi.org/10.1016/j.patcog.2024.110797 Citation Details
(Showing: 1 - 10 of 20)

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

A major goal of this project is the development of new mathematical algorithms and models involving knowledge graphs. A knowledge graph represents what is known about a subject in the form of labeled nodes and edges. More than simply labeled data, knowledge graphs organize data according to high-level meanings and assign globally unique identification to each node in the graph to match real-world entities. Much work on knowledge graphs treats databases and queries. In contrast, in the context of threat detection, this project focuses on algorithms that identify latent information in the graph and predictive models associated with data on the graph. The project involves a combination of mathematical methods for subgraph isomorphism detection, time series analysis, agent-based and multiscale modeling, and pattern recognition.  To build our methodology we develop new models for graph-structured data that could become part of a knowledge graph.

One important outcome of this project is the development of methods that leverage the role of symmetry in subgraph matching, both in the description of the graphs in question and in how it confounds the search process. More specifically, this work shows how to quantify these effects and how to use symmetries to increase the efficiency of subgraph isomorphism algorithms. The work introduces rigorous definitions of structural equivalence and establish conditions for when it can be used to generate more solutions. This work illustrates how to adapt standard search routines to utilize these symmetries to accelerate search and compactly describe the solution space.  As am example, we adapt a state-of-the-art solver and perform a comprehensive series of tests to demonstrate these methods' efficacy on a standard benchmark set. We extend these methods to multiplex graphs and present results on large multiplex networks drawn from transportation systems, social media, adversarial attacks, and knowledge graphs.

The ability to leverage symmetries and structural equivalences results in the ability to identify extremely large solution spaces for the subgraph matching problem on the order of 10100 or more solutions.  In a real use-case scenario, analysts may need to query additional information about template nodes or world nodes to reduce the problem size and the solution space. Currently, this query process is done by hand, based on the personal experience of analysts. By analogy to the well-known active learning problem in machine learning classification problems, we present a machine-based active learning problem for the subgraph match problem in which the machine suggests optimal template target nodes that would be most likely to reduce the solution space when it is otherwise overly large and complex. The humans in the loop can then include additional information about those target nodes. We present some case studies for both synthetic and real-world datasets for multichannel subgraph matching.

In addition to algorithm development, this project showed examples of large knowledge graphs created from modern databases of information, thereby allowing the user to quickly understand comprehensive information embedded in the data.  Examples include a knowledge graph from twitter content during and about the COVID19 pandemic, a knowledge graph about the QAnon conspiracy network, a knowledge graph about hate speech and online content, and one about a crime narrative.  These graphs point to a multitude of ways in which information can be understood once it is put into such a format. 

At the end of this project, we studied the use of large language models in the context of knowledge graphs and knowledge graph queries. 

 


Last Modified: 12/03/2024
Modified by: Andrea L Bertozzi

Please report errors in award information by writing to: awardsearch@nsf.gov.

Print this page

Back to Top of page