
NSF Org: |
DMS Division Of Mathematical Sciences |
Recipient: |
|
Initial Amendment Date: | July 20, 2020 |
Latest Amendment Date: | July 20, 2020 |
Award Number: | 2027277 |
Award Instrument: | Standard Grant |
Program Manager: |
Tomek Bartoszynski
tbartosz@nsf.gov (703)292-4885 DMS Division Of Mathematical Sciences MPS Directorate for Mathematical and Physical Sciences |
Start Date: | September 1, 2020 |
End Date: | August 31, 2024 (Estimated) |
Total Intended Award Amount: | $607,050.00 |
Total Awarded Amount to Date: | $607,050.00 |
Funds Obligated to Date: |
|
History of Investigator: |
|
Recipient Sponsored Research Office: |
10889 WILSHIRE BLVD STE 700 LOS ANGELES CA US 90024-4200 (310)794-0102 |
Sponsor Congressional District: |
|
Primary Place of Performance: |
520 Portola Plaza Los Angeles CA US 90095-1555 |
Primary Place of
Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | ATD-Algorithms for Threat Dete |
Primary Program Source: |
01002021RB NSF RESEARCH & RELATED ACTIVIT |
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.049 |
ABSTRACT
This project develops new mathematical algorithms and models involving knowledge graphs. A knowledge graph represents what is known about a subject in the form of labeled nodes and edges. More than simply labeled data, knowledge graphs organize data according to high-level meanings and assign globally unique identification to each node in the graph to match real-world entities. Much work on knowledge graphs treats databases and queries. In contrast, in the context of threat detection, this project focuses on algorithms that identify latent information in the graph and predictive models associated with data on the graph. The project will involve a combination of mathematical methods for subgraph isomorphism detection, time series analysis, agent-based and multiscale modeling, and pattern recognition. The project will train a postdoctoral scholar, PhD student, and six undergraduate researchers through involvement in the research.
This project brings together several different focused problems with large, multimodal, complex datasets. The data is organized into a knowledge graph in which additional information is added and absorbed as it becomes available. This project considers three types of knowledge graphs each for different applications: (1) knowledge graphs constructed from complex multi-part narratives; (2) knowledge graphs constructed from heterogeneous online content; and (3) knowledge graphs associated with large-scale human interaction dynamics such as a global pandemic. For (1), algorithms will be designed to identify important causal subgraphs. For (2), the project aims to identify threats in space and time based on templated patterns. For (3), desired goals are both a predictive ability for actions from a micro to macro scale along with tools to assess potential impact versus cost of preventative measures, from local to regional to country-wide scale.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH
Note:
When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external
site maintained by the publisher. Some full text articles may not yet be available without a
charge during the embargo (administrative interval).
Some links on this page may take you to non-federal websites. Their policies may differ from
this site.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
A major goal of this project is the development of new mathematical algorithms and models involving knowledge graphs. A knowledge graph represents what is known about a subject in the form of labeled nodes and edges. More than simply labeled data, knowledge graphs organize data according to high-level meanings and assign globally unique identification to each node in the graph to match real-world entities. Much work on knowledge graphs treats databases and queries. In contrast, in the context of threat detection, this project focuses on algorithms that identify latent information in the graph and predictive models associated with data on the graph. The project involves a combination of mathematical methods for subgraph isomorphism detection, time series analysis, agent-based and multiscale modeling, and pattern recognition. To build our methodology we develop new models for graph-structured data that could become part of a knowledge graph.
One important outcome of this project is the development of methods that leverage the role of symmetry in subgraph matching, both in the description of the graphs in question and in how it confounds the search process. More specifically, this work shows how to quantify these effects and how to use symmetries to increase the efficiency of subgraph isomorphism algorithms. The work introduces rigorous definitions of structural equivalence and establish conditions for when it can be used to generate more solutions. This work illustrates how to adapt standard search routines to utilize these symmetries to accelerate search and compactly describe the solution space. As am example, we adapt a state-of-the-art solver and perform a comprehensive series of tests to demonstrate these methods' efficacy on a standard benchmark set. We extend these methods to multiplex graphs and present results on large multiplex networks drawn from transportation systems, social media, adversarial attacks, and knowledge graphs.
The ability to leverage symmetries and structural equivalences results in the ability to identify extremely large solution spaces for the subgraph matching problem on the order of 10100 or more solutions. In a real use-case scenario, analysts may need to query additional information about template nodes or world nodes to reduce the problem size and the solution space. Currently, this query process is done by hand, based on the personal experience of analysts. By analogy to the well-known active learning problem in machine learning classification problems, we present a machine-based active learning problem for the subgraph match problem in which the machine suggests optimal template target nodes that would be most likely to reduce the solution space when it is otherwise overly large and complex. The humans in the loop can then include additional information about those target nodes. We present some case studies for both synthetic and real-world datasets for multichannel subgraph matching.
In addition to algorithm development, this project showed examples of large knowledge graphs created from modern databases of information, thereby allowing the user to quickly understand comprehensive information embedded in the data. Examples include a knowledge graph from twitter content during and about the COVID19 pandemic, a knowledge graph about the QAnon conspiracy network, a knowledge graph about hate speech and online content, and one about a crime narrative. These graphs point to a multitude of ways in which information can be understood once it is put into such a format.
At the end of this project, we studied the use of large language models in the context of knowledge graphs and knowledge graph queries.
Last Modified: 12/03/2024
Modified by: Andrea L Bertozzi
Please report errors in award information by writing to: awardsearch@nsf.gov.