Award Abstract # 1525978
AF: Small: Homological Methods for Big Enough Data

NSF Org: CCF
Division of Computing and Communication Foundations
Recipient: UNIVERSITY OF CONNECTICUT
Initial Amendment Date: July 21, 2015
Latest Amendment Date: July 21, 2015
Award Number: 1525978
Award Instrument: Standard Grant
Program Manager: Joseph Maurice Rojas
CCF
 Division of Computing and Communication Foundations
CSE
 Directorate for Computer and Information Science and Engineering
Start Date: August 1, 2015
End Date: July 31, 2019 (Estimated)
Total Intended Award Amount: $340,954.00
Total Awarded Amount to Date: $340,954.00
Funds Obligated to Date: FY 2015 = $340,954.00
History of Investigator:
  • Donald Sheehy (Principal Investigator)
    drsheehy@ncsu.edu
Recipient Sponsored Research Office: University of Connecticut
438 WHITNEY RD EXTENSION UNIT 1133
STORRS
CT  US  06269-9018
(860)486-3622
Sponsor Congressional District: 02
Primary Place of Performance: University of Connecticut
371 Fairfield Way, Unit 4155
Storrs
CT  US  06269-4155
Primary Place of Performance
Congressional District:
02
Unique Entity Identifier (UEI): WNTPS995QBM7
Parent UEI:
NSF Program(s): Algorithmic Foundations
Primary Program Source: 01001516DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 7923, 7929
Program Element Code(s): 779600
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

How do we know if big data is big enough? As algorithms make more and more decisions from data, we also need these algorithms to assure us that the decisions were well-informed, i.e. that enough data went into them. The theory of homological sensor networks, a branch of topological data analysis, was originally created to test if a collection of sensors covers a domain of interest, but the same theory can test if a data set "covers" an underlying decision space. Homological methods can complement, extend, and even replace statistical methods to give confidence in the completeness of a data set. Because they are topological, they can give robust signatures or summaries of data that are invariant to a wide range of implicit or explicit transformations. This project aims to extend the theoretical and algorithmic foundations of the homological sensor networks to be applicable in data analysis. Broader impacts include strengthening connections between theoretical computer science (TCS) and applied algebraic topology, and widening the range of data analyses to which topological methods and tools apply.

The PI will train both undergraduate and graduate researchers by incorporating advanced concepts in combinatorial topology in undergraduate and graduate curricula. The PI will also educate the larger TCS and data analysis communities through expository videos and open source software.

The specific aim of the proposal is to extend guarantees on homological sensor networks to apply to non-smooth sets, k-coverage, and dynamic coverage. A second specific aim is to push these algorithmic results back into the theoretical foundations of the sampling theories that underlie data analysis problems, by extending the so-called Persistent Nerve Theorem and defining new classes of near-homeomorphisms to capture the realities of unknown transformations in data while still providing theoretical guarantees. The third specific aim is to develop algorithms that extract information from what was traditionally called "topological noise" as simple experiments reveal that although it doesn't carry topological information, it does carry useful geometric information that may be used for classification and inference.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Note:  When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Donald R. Sheehy "Frechet-Stable Signatures Using Persistence Homology" ACM-SIAM Symposium on Discrete Algorithms , 2018
Kevin PrattConnor RileyDonald R. Sheehy "Exploring Circle Packing Algorithms" International Symposium on Computational Geometry (Multimedia Session) , 2016
Kirk GardnerDonald R. Sheehy "k-th Nearest Neighbor Sampling in the Plane" Proceedings of the Canadian Conference on Computational Geometry , 2016
Lynn AsselinKirk P. GardnerDonald R. Sheehy "Interactive Geometric Algorithm Visualization in a Browser" International Symposium on Computational Geometry (Multimedia Session) , 2016
Mahmoodreza JahanseirDonald R. Sheehy "Transforming Hierarchical Trees on Metric Spaces" Proceedings of the Canadian Conference on Computational Geometry , 2016
Nicholas CavannaKirk GardnerDonald R. Sheehy "When and Why the Topological Coverage Criterion Works" ACM-SIAM Symposium on Discrete Algorithms , 2017
Nicholas J. Cavanna, Oliver Kiselius, Donald R. Sheehy "Computing the Shift-Invariant Bottleneck Distance for Persistence Diagrams" Canadian Conference on Computational Geometry , 2018
Sridhar Parasara Duggirala "When can we treat trajectories as points?" Canadian Conference on Computational Geometry , 2018

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

This project explored the use of homological methods in data analysis. The key idea in this field of topological data analysis (TDA) is to compute algebraic summaries from data sets. The questions addressed in this research started from the desire to answer qualitative questions about global properties of the data set, or more generally, the underlying space from which the data is sampled. The papers published as a result of this research span several areas within the algorithmic foundations of TDA including data structures, sampling, visualization, data summarizing, and metric approximations. Below, three of the more impactful results are highlighted.

One powerful tool in TDA is the Topological Coverage Criterion (TCC) which uses a so-called persistent homology computation to test if a collection of sensors in a network cover the entire domain. One outcome of the research was to generalize the TCC to eliminate many of the geometric constraints on the underlying domain. This work allows one to also test for robust coverage in the presence of failures.

A second outcome of the project involved the use of topological signatures for comparing continuous functions. The most obvious example of such shapes is trajectories, which can be viewed as a function from an interval (of time) to a space. Metrics to compare trajectories such as the Frechet distance adjust the speeds of the functions to align them. This is challenging and becomes computational hard for anything beyond trajectories such as surfaces. New results from this project showed how to use persistent homology to give interesting bounds on such distances in polynomial time even for surfaces and higher-dimensional inputs. This approach was novel in that it uses topological invariance to eliminate the difficult alignment step.

A third outcome of the project was to generalize one of the most important and widely used tools in TDA, the Persistent Nerve Lemma (PNL). This result is used explicitly or implicitly throughout TDA as it provides the main connection between the discrete that we compute with and the continuous spaces they represent. In this project, the PNL was generalized so that it continues to give meaningful guarantees in the presence of noise. Moreover, the guarantees are quantitative: the topological signatures implied by the Persistent Nerve Lemma differ from the ideal by an amount proportional to a measure of the topological noise.
Three PhD students were partially supported by this project, two of which defended their dissertations and the third will finish soon. These students were trained in the theory and practice of TDA. Several undergraduates were also mentored as part of this project.

 

 

 


Last Modified: 10/30/2019
Modified by: Donald Sheehy

Please report errors in award information by writing to: awardsearch@nsf.gov.

Print this page

Back to Top of page