Skip to feedback

Award Abstract # 1850052
CRII: AF: Enriched Topological Summaries for Inverse Problems

NSF Org: CCF
Division of Computing and Communication Foundations
Recipient: RESEARCH FOUNDATION FOR THE STATE UNIVERSITY OF NEW YORK, THE
Initial Amendment Date: January 30, 2019
Latest Amendment Date: January 30, 2019
Award Number: 1850052
Award Instrument: Standard Grant
Program Manager: Tracy Kimbrel
CCF
 Division of Computing and Communication Foundations
CSE
 Directorate for Computer and Information Science and Engineering
Start Date: July 1, 2019
End Date: June 30, 2021 (Estimated)
Total Intended Award Amount: $174,074.00
Total Awarded Amount to Date: $174,074.00
Funds Obligated to Date: FY 2019 = $174,074.00
History of Investigator:
  • Justin Curry (Principal Investigator)
    jmcurry@albany.edu
Recipient Sponsored Research Office: SUNY at Albany
1400 WASHINGTON AVE
ALBANY
NY  US  12222-0100
(518)437-4974
Sponsor Congressional District: 20
Primary Place of Performance: SUNY at Albany
NY  US  12222-1000
Primary Place of Performance
Congressional District:
20
Unique Entity Identifier (UEI): NHH3T1Z96H29
Parent UEI: NHH3T1Z96H29
NSF Program(s): Algorithmic Foundations
Primary Program Source: 01001920DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 7796, 7929, 8228
Program Element Code(s): 779600
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

With Big Data comes the need for succinct data summaries that are easy to visualize and understand. Topological Data Analysis (TDA) offers a particular suite of computational tools for summarizing and visualizing shape in high-dimensional data sets. These tools have been used in recent years to guide new discoveries in cancer research, neuroscience, materials science, image analysis, and many areas where shape, broadly construed, is of importance. However, by summarizing data in a succinct way certain important differences between data sets can often go undetected. This project provides a new mathematical framework for precisely measuring how lossy the most popular methods in TDA are. By developing new ways for quantifying how large-scale differences go undetected, the project offers enrichments of current topological methods to obtain novel data science tools with greater distinguishing power. The awarded funds will go primarily to fund a graduate student to aid the PI in basic research and development of these enriched topological summaries. Computational experiments on data sets of public interest, e.g. time-varying socio-economic indicators and weather data, will be carried out to test performance of these tools against current state of the art methods. Additionally, the PI will incorporate these novel methods into the data science curriculum at the PI's host institution and recruit students from historically under-represented groups to be trained as the nation's next generation of data scientists.

This project is an ambitious extension of earlier work undertaken by the PI and provides a targeted attack on the inverse problem for the main objects of study in topological data analysis: the merge tree, which tracks how connected components of the sub-level set of a function evolves; Reeb graphs, which tracks connected components of the fiber of a function; and the barcode/persistence diagram, which is a collection of intervals/points in the plane that represent an algorithmic pairing of critical points of a function on a space. The PI showed in earlier work how merge trees determine the associated barcode and provided a precise enumeration of how many distinct merge trees have the same barcode. By identifying functions on the real line that are related by an orientation preserving coordinate transformation, the PI showed how a novel enrichment of the merge tree---the chiral merge tree---faithfully captures these equivalence classes of functions and offers exponential distinguishing power over the barcode for time-series analysis. This project aims to carry out similar analysis for functions on surfaces, with an eye toward improving the classification performance of persistent homology in image analysis, and for Reeb graphs, to better integrate with level-set persistent homology. By carrying out a careful study of inverse problems in each of these settings, novel enrichments analogous to the chiral merge tree will be developed. Additionally, to make these enriched topological summaries useful for data classification tasks, novel metrics will be defined for comparing each of these summaries and algorithms will be developed for the efficient computation of these metrics.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Note:  When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Catanzaro, Michael J. and Curry, Justin M. and Fasy, Brittany Terese and Lazovskis, Jnis and Malen, Greg and Riess, Hans and Wang, Bei and Zabka, Matthew "Moduli spaces of morse functions for persistence" Journal of Applied and Computational Topology , 2020 10.1007/s41468-020-00055-x Citation Details
Curry, Justin and Patel, Amit "Classification of Constructible Cosheaves" Theory and applications of categories , v.35 , 2020 Citation Details

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

Topological Data Analysis (TDA) has grown tremendously as a rich toolkit for analyzing data sets where shape is an important feature. The success of TDA comes largely from its ability to take a complicated, potentially high-dimensional, data set and summarize it using topological descriptors known as merge trees, Reeb graphs, or persistence barcodes (also known as persistence diagrams). These descriptors are compact, lightweight summaries of a data set that are stable with respect to perturbations and certain types of noise. The passage from data to these summaries is sometimes called the ?TDA pipeline?. This project emerged from the observation that qualitatively different data sets can produce identical merge trees and barcodes, when passed through the TDA pipeline; see Figure 1 for a cartoon example. The goals of the project were to produce new enriched topological summaries (ETS) that stand upstream of traditional topological summaries (TS) that can distinguish these data sets as well data sets of practical interest, all without sacrificing stability or computability; see Figure 2. 

This project was successful in several regards. First, the PI and their collaborators were able to identify several enrichments of traditional TDA by first counting how many different data sets produced the same merge tree or barcode. The notion of ?data set? and ?different? was varied across a broad class of examples, including Morse functions and embeddings of shapes in three dimensions. The project advanced knowledge of the theory underlying TDA methods and produced new connections between TDA, (co)sheaf theory, combinatorics and geometric group theory. Additionally, one of the first combinatorial results that distinguishes merge trees from phylogenetic trees was proved as a result of this work, thereby opening the door to further avenues of exploration in the geometry of the space of all possible merge trees. Preliminary results on doing statistics on the space of merge trees and barcodes were also provided.

Perhaps most significantly, one of the main ETS developed by this project, the Decorated Merge Tree (DMT), was identified as a new lightweight and stable summary of data sets where features such as clusters and circles can be tracked in tandem. Both theoretical and computational definitions were provided and their implementation in Python is now freely available on GitHub. As an illustration of this new tool see Figure 3. In that figure, three synthetic point cloud examples are shown to the left, their persistence diagrams are shown in the middle---which are nearly identical for the three point clouds, and their decorated merge trees are shown on the right. Variations on this example are shown in Figure 4. The PI and their collaborators were able to define several distances on these DMTs of varying computability and have demonstrated their improved clustering power. Figure 5 shows how the six data sets from Figure 4 would be clustered using traditional methods, which are most clearly grouped using our new method. These methods also provide feature localization when studying time series, such as heart rate data (Figure 6), and in image data, such as in fMRI data.

Several graduate students received support from this grant to augment their mentorship as highly skilled mathematicians and programmers. At least one student has written a PhD thesis based on the work performed on this project and several postdocs at other institutions were able to augment their training through collaboration with the PI and their graduate students. The PI continues to teach data science courses at their host institution where these methods are brought into the classroom so that undergraduate and masters students are exposed to the latest methods in TDA.


Last Modified: 10/27/2021
Modified by: Justin M Curry

Please report errors in award information by writing to: awardsearch@nsf.gov.

Print this page

Back to Top of page