
NSF Org: | DMS Division Of Mathematical Sciences |
Initial Amendment Date: | September 5, 2016 |
Latest Amendment Date: | September 5, 2016 |
Award Number: | 1620271 |
Award Instrument: | Standard Grant |
Program Manager: | Leland Jameson, DMS Division Of Mathematical Sciences, MPS Directorate for Mathematical and Physical Sciences |
Start Date: | September 1, 2016 |
End Date: | August 31, 2020 (Estimated) |
Total Intended Award Amount: | $200,000.00 |
Total Awarded Amount to Date: | $200,000.00 |
Recipient Sponsored Research Office: | 2550 NORTHWESTERN AVE # 1100 WEST LAFAYETTE IN US 47906-1332 (765)494-1055 |
Primary Place of Performance: | 150 N University St West Lafayette IN US 47907-2067 |
NSF Program(s): | COMPUTATIONAL MATHEMATICS |
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.049 |
ABSTRACT
The goal of the project is to apply methods of logic and topology to several important problems in genomics and medicine. The first application is mining large-scale clinical databases for information usable by clinicians and/or biological researchers. The PI plans to design and implement a method applying ideas from persistent homology theory in a logic-based framework. The starting point of this approach is the notion of "redescriptions" introduced by Parida (senior consultant for the project) and Ramakrishnan in the context of knowledge discovery. A mathematical reformulation leads to certain filtered complexes arising from set systems, which are then amenable to analysis using tools from topology. A second application will be in the area of phylogenetics. Studying population admixtures is a very active area of research in population genomics. The PI will use topological methods not only to detect admixture but also to discriminate ancient from recent admixture, and will validate the approach by testing it on large simulated populations where the admixtures are known in advance. Generating such populations poses unique challenges that Parida and her group have recently tackled.
A common practical difficulty encountered in many applications of topological data analysis is computing persistent homology groups of filtrations of very large simplicial complexes. The sizes of these complexes make it impossible to compute their persistent homology bar codes using current-generation publicly available software. The second part of the project will address this shortcoming. The PI will investigate a new approach to computing persistent homology groups more efficiently than existing algorithms do. This approach will be useful in a wide variety of applications where topological data analysis is currently being used. The PI plans to implement this algorithm and develop it into a general-purpose software package for computing approximations of persistent homology invariants of filtrations of large complexes. The project will bring together tools from two different areas of mathematics -- logic and topology -- in a novel way, as a method for analyzing large data sets. In addition, the PI will study the underlying mathematical problems that arise at the interface of logic and topology, which are fundamentally interesting in their own right and should have other applications as well. The PI also intends to work with a graduate student and involve them in all aspects of the proposed research.
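For readers unfamiliar with the bar codes mentioned above, the following is a minimal illustrative sketch, not the algorithm proposed in this project. It computes only the degree-0 bar code (connected components) of a Vietoris-Rips filtration on a small point set, using a union-find structure. The function name `h0_barcode` and the sample points are hypothetical and chosen purely for illustration. The all-pairs edge list it builds grows quadratically with the number of points, and higher-degree homology requires far larger complexes, which is where the scaling difficulty described above arises.

```python
from itertools import combinations
import math


def h0_barcode(points):
    """Degree-0 persistence intervals (birth, death) of the Vietoris-Rips
    filtration on `points`. Illustrative only; `h0_barcode` is a
    hypothetical name, not part of any library."""
    n = len(points)
    if n == 0:
        return []

    parent = list(range(n))  # union-find forest, one tree per component

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Every pair of points is an edge that enters the filtration at a
    # scale equal to the distance between its endpoints.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    bars = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            # Two components merge at this scale, so one of them dies.
            parent[ri] = rj
            bars.append((0.0, length))

    # One component survives at every scale.
    bars.append((0.0, math.inf))
    return bars


sample = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
print(h0_barcode(sample))
```

Each pair in the result is one bar: every point is born at scale 0, and a bar ends at the scale where its component merges into another.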
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
The research produced several outcomes. The theoretical outcomes included the following.
The topological complexity of Reeb spaces of semi-algebraic maps was studied (joint work with graduate students N. Cox and S. Percival) and a singly exponential upper bound was obtained. Reeb spaces of maps are an important tool in applied topology, with several applications such as topological simplification. An algorithm with singly exponential complexity was developed for computing a simplicial replacement of a given semi-algebraic set having the same homotopy type up to a fixed dimension. As an important application, an algorithm with singly exponential complexity was given for computing "bar codes" of filtrations of semi-algebraic sets (joint work with graduate student N. Karisani). An algorithm with polynomially bounded complexity was developed for computing the Betti numbers of symmetric semi-algebraic sets defined by symmetric polynomials of degrees bounded by a constant (joint work with C. Riener).
On the practical side (in collaboration with the Computational Genomics Group at IBM Research led by Dr. Laxmi Parida), it was shown that topological methods using the theory of persistent homology can be used effectively in analyzing large-scale genomic data of various types, and that they often outperform traditional methods. These applications included characterizing redescriptions using persistent homology to isolate genetic pathways contributing to pathogenesis (joint work with D. Platt, A. Zalloua, and L. Parida), subtle admixture detection in population genomics (joint work with F. Utro and L. Parida), a topological data analysis approach to predicting phenotypes from gene expression data (joint work with S. Mandal, A. Guzman-Saenz, N. Haiminen, and L. Parida), and inferring COVID-19 biological pathways from clinical phenotypes via topological analysis (joint work with graduate student N. Karisani, D. Platt, and L. Parida).
Several new ideas were introduced -- from logic as well as topology -- in the course of studying these applications, and the software/pipeline developed may be reused in the future on other applications in the same or different domains.
Three PhD students (two from Mathematics and one from Computer Science) were trained in different aspects of applied topology and its interface with algorithmic semi-algebraic geometry.
Last Modified: 02/09/2021
Modified by: Saugata Basu
Please report errors in award information by writing to: awardsearch@nsf.gov.