
NSF Org: |
CCF Division of Computing and Communication Foundations |
Recipient: |
|
Initial Amendment Date: | July 21, 2015 |
Latest Amendment Date: | July 21, 2015 |
Award Number: | 1525978 |
Award Instrument: | Standard Grant |
Program Manager: |
Joseph Maurice Rojas
CCF Division of Computing and Communication Foundations CSE Directorate for Computer and Information Science and Engineering |
Start Date: | August 1, 2015 |
End Date: | July 31, 2019 (Estimated) |
Total Intended Award Amount: | $340,954.00 |
Total Awarded Amount to Date: | $340,954.00 |
Funds Obligated to Date: |
|
History of Investigator: |
|
Recipient Sponsored Research Office: |
438 WHITNEY RD EXTENSION UNIT 1133 STORRS CT US 06269-9018 (860)486-3622 |
Sponsor Congressional District: |
|
Primary Place of Performance: |
371 Fairfield Way, Unit 4155 Storrs CT US 06269-4155 |
Primary Place of
Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | Algorithmic Foundations |
Primary Program Source: |
|
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
How do we know if big data is big enough? As algorithms make more and more decisions from data, we also need these algorithms to assure us that the decisions were well-informed, i.e. that enough data went into them. The theory of homological sensor networks, a branch of topological data analysis, was originally created to test if a collection of sensors covers a domain of interest, but the same theory can test if a data set "covers" an underlying decision space. Homological methods can complement, extend, and even replace statistical methods to give confidence in the completeness of a data set. Because they are topological, they can give robust signatures or summaries of data that are invariant to a wide range of implicit or explicit transformations. This project aims to extend the theoretical and algorithmic foundations of the homological sensor networks to be applicable in data analysis. Broader impacts include strengthening connections between theoretical computer science (TCS) and applied algebraic topology, and widening the range of data analyses to which topological methods and tools apply.
The PI will train both undergraduate and graduate researchers by incorporating advanced concepts in combinatorial topology in undergraduate and graduate curricula. The PI will also educate the larger TCS and data analysis communities through expository videos and open source software.
The specific aim of the proposal is to extend guarantees on homological sensor networks to apply to non-smooth sets, k-coverage, and dynamic coverage. A second specific aim is to push these algorithmic results back into the theoretical foundations of the sampling theories that underlie data analysis problems, by extending the so-called Persistent Nerve Theorem and defining new classes of near-homeomorphisms to capture the realities of unknown transformations in data while still providing theoretical guarantees. The third specific aim is to develop algorithms that extract information from what was traditionally called "topological noise" as simple experiments reveal that although it doesn't carry topological information, it does carry useful geometric information that may be used for classification and inference.
PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH
Note:
When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external
site maintained by the publisher. Some full text articles may not yet be available without a
charge during the embargo (administrative interval).
Some links on this page may take you to non-federal websites. Their policies may differ from
this site.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
This project explored the use of homological methods in data analysis. The key idea in this field of topological data analysis (TDA) is to compute algebraic summaries from data sets. The questions addressed in this research started from the desire to answer qualitative questions about global properties of the data set, or more generally, the underlying space from which the data is sampled. The papers published as a result of this research span several areas within the algorithmic foundations of TDA including data structures, sampling, visualization, data summarizing, and metric approximations. Below, three of the more impactful results are highlighted.
One powerful tool in TDA is the Topological Coverage Criterion (TCC) which uses a so-called persistent homology computation to test if a collection of sensors in a network cover the entire domain. One outcome of the research was to generalize the TCC to eliminate many of the geometric constraints on the underlying domain. This work allows one to also test for robust coverage in the presence of failures.
A second outcome of the project involved the use of topological signatures for comparing continuous functions. The most obvious example of such shapes is trajectories, which can be viewed as a function from an interval (of time) to a space. Metrics to compare trajectories such as the Frechet distance adjust the speeds of the functions to align them. This is challenging and becomes computational hard for anything beyond trajectories such as surfaces. New results from this project showed how to use persistent homology to give interesting bounds on such distances in polynomial time even for surfaces and higher-dimensional inputs. This approach was novel in that it uses topological invariance to eliminate the difficult alignment step.
A third outcome of the project was to generalize one of the most important and widely used tools in TDA, the Persistent Nerve Lemma (PNL). This result is used explicitly or implicitly throughout TDA as it provides the main connection between the discrete that we compute with and the continuous spaces they represent. In this project, the PNL was generalized so that it continues to give meaningful guarantees in the presence of noise. Moreover, the guarantees are quantitative: the topological signatures implied by the Persistent Nerve Lemma differ from the ideal by an amount proportional to a measure of the topological noise.
Three PhD students were partially supported by this project, two of which defended their dissertations and the third will finish soon. These students were trained in the theory and practice of TDA. Several undergraduates were also mentored as part of this project.
Last Modified: 10/30/2019
Modified by: Donald Sheehy
Please report errors in award information by writing to: awardsearch@nsf.gov.