
NSF Org: | IIS Division of Information & Intelligent Systems |
Recipient: |
|
Initial Amendment Date: | June 13, 2023 |
Latest Amendment Date: | July 16, 2024 |
Award Number: | 2238693 |
Award Instrument: | Continuing Grant |
Program Manager: | Sorin Draghici, sdraghic@nsf.gov, (703)292-2232, IIS Division of Information & Intelligent Systems, CSE Directorate for Computer and Information Science and Engineering |
Start Date: | October 1, 2023 |
End Date: | September 30, 2028 (Estimated) |
Total Intended Award Amount: | $600,322.00 |
Total Awarded Amount to Date: | $319,654.00 |
Funds Obligated to Date: | FY 2024 = $114,330.00 |
History of Investigator: |
|
Recipient Sponsored Research Office: | 155 S PLEASANT ST AMHERST MA US 01002-2234 (413)542-2804 |
Sponsor Congressional District: |
|
Primary Place of Performance: | 155 S PLEASANT ST AMHERST MA US 01002-2234 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | Info Integration & Informatics |
Primary Program Source: | 01002425DB NSF RESEARCH & RELATED ACTIVIT; 01002526DB NSF RESEARCH & RELATED ACTIVIT; 01002627DB NSF RESEARCH & RELATED ACTIVIT; 01002728DB NSF RESEARCH & RELATED ACTIVIT |
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
Methods for knowledge discovery from data (e.g., for extracting patterns or finding anomalies) have found their way into research labs in the life and biological sciences, and into industries such as cybersecurity. In these fields, the statistical validity of the results produced by these methods is paramount: false discoveries cannot be tolerated. Current methods do not offer such stringent statistical guarantees. This project develops algorithms for statistically sound knowledge discovery from data. It transforms the field by shifting the goal of the knowledge discovery process from extracting information about the available data to gaining new understanding of the noisy, random process that generates the data. The proposed methods contribute to a faster, higher-throughput scientific pipeline by allowing scientists and practitioners to analyze large, rich datasets efficiently and to trust the results of the analysis. Researchers can then focus on their discipline-specific research tasks without worrying about computational or statistical considerations. The project includes collaborations with a local museum and a local public library to analyze data about their collections of historic materials, and with a cybersecurity company to develop methods for fast detection of network attacks with few false positives. A diverse cohort of undergraduate students will be involved in the research and educational components of the project.
Research in knowledge discovery has mostly focused on understanding the available data, rather than the process that generated it. In the few cases where hypothesis testing was used to assess the results (mostly for simple patterns), only simplistic null models were considered, and the testing employed approaches with low statistical power (e.g., the Bonferroni correction) that control only one measure of false discovery, the Family-Wise Error Rate. This project is transformative because it will develop efficient methods for evaluating a wide variety of results (e.g., patterns, anomalies, graph/vertex/edge properties, and more) obtained from large, rich datasets (e.g., transactional datasets, graphs, and time series), using realistic null models that are more appropriate for these tasks and better encode available knowledge of the data-generating process. We will create novel, efficient procedures to sample from such models, both approximate (e.g., Markov chain Monte Carlo) and exact, and combine them with modern resampling-based multiple testing methods, in a multiple-hypothesis-first approach that also controls the (marginal) False Discovery Rate.
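The contrast drawn above between Family-Wise Error Rate control and False Discovery Rate control can be illustrated with a minimal sketch. It is not the project's method: it uses the classic Benjamini-Hochberg step-up procedure as a stand-in for FDR control, a toy coin-flip null model in place of a realistic one, and hypothetical function names (`empirical_p_value`, `bonferroni`, `benjamini_hochberg`) and observed values chosen only for illustration.

```python
import random


def empirical_p_value(observed, null_stats):
    """Monte Carlo p-value: fraction of null statistics >= observed.

    The +1 in numerator and denominator keeps the estimate strictly positive.
    """
    at_least = sum(1 for s in null_stats if s >= observed)
    return (1 + at_least) / (1 + len(null_stats))


def bonferroni(p_values, alpha=0.05):
    """Reject hypothesis i if p_i <= alpha / m (controls the FWER)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Classic BH step-up procedure (controls the FDR under independence)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0  # largest rank k such that p_(k) <= k * alpha / m
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * alpha / m:
            cutoff = rank
    rejected = [False] * m
    for i in order[:cutoff]:
        rejected[i] = True
    return rejected


if __name__ == "__main__":
    rng = random.Random(0)

    def sample_null():
        # Assumed toy null model: number of heads in 20 fair coin flips.
        return sum(rng.random() < 0.5 for _ in range(20))

    # Hypothetical observed statistics, one per tested hypothesis.
    observed = [18, 17, 15, 14, 12, 11, 10, 10, 9, 8]
    p_values = [
        empirical_p_value(x, [sample_null() for _ in range(999)])
        for x in observed
    ]
    print("Bonferroni rejections:", sum(bonferroni(p_values)))
    print("Benjamini-Hochberg rejections:", sum(benjamini_hochberg(p_values)))
```

Because the Bonferroni threshold alpha / m is the smallest of the Benjamini-Hochberg thresholds k * alpha / m, Benjamini-Hochberg rejects every hypothesis that Bonferroni rejects and possibly more, which is the gain in statistical power that motivates moving beyond Family-Wise Error Rate control.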
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.