Award Abstract # 1620271
Logic, Topology and Genomics

NSF Org: DMS
Division Of Mathematical Sciences
Recipient: PURDUE UNIVERSITY
Initial Amendment Date: September 5, 2016
Latest Amendment Date: September 5, 2016
Award Number: 1620271
Award Instrument: Standard Grant
Program Manager: Leland Jameson
DMS
 Division Of Mathematical Sciences
MPS
 Directorate for Mathematical and Physical Sciences
Start Date: September 1, 2016
End Date: August 31, 2020 (Estimated)
Total Intended Award Amount: $200,000.00
Total Awarded Amount to Date: $200,000.00
Funds Obligated to Date: FY 2016 = $200,000.00
History of Investigator:
  • Saugata Basu (Principal Investigator)
Recipient Sponsored Research Office: Purdue University
2550 NORTHWESTERN AVE # 1100
WEST LAFAYETTE
IN  US  47906-1332
(765)494-1055
Sponsor Congressional District: 04
Primary Place of Performance: Purdue University
150 N University St
West Lafayette
IN  US  47907-2067
Primary Place of Performance Congressional District: 04
Unique Entity Identifier (UEI): YRXVL4JYCEF5
Parent UEI: YRXVL4JYCEF5
NSF Program(s): COMPUTATIONAL MATHEMATICS
Primary Program Source: 01001617DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 9263
Program Element Code(s): 127100
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.049

ABSTRACT

The goal of the project is to apply methods of logic and topology to several important problems in genomics and medicine. The first application is mining large-scale clinical databases for information usable by clinicians and biological researchers. The PI plans to design and implement a method applying ideas from persistent homology theory in a logic-based framework. The starting point of this approach is the notion of "redescriptions" introduced by Parida (senior consultant for the project) and Ramakrishnan in the context of knowledge discovery. A mathematical reformulation leads to certain filtered complexes arising from set systems, which are then amenable to analysis using tools from topology. A second application will be in the area of phylogenetics. Studying population admixtures is a very active area of research in population genomics. The PI will use topological methods not only to detect admixture but also to discriminate ancient from recent admixture, and will validate the approach by testing it on large simulated populations where the admixtures are known in advance. Generating such populations poses unique challenges that have been tackled recently by Parida and her group.
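As a rough illustration of how a set system can give rise to a filtered complex, the sketch below builds the nerve of a small family of sets and orders its simplices by the size of the common intersection. This is a minimal sketch under stated assumptions: the abstract does not specify the project's construction, so the choice of the nerve, the function name `nerve_filtration`, the `max_dim` parameter, and the filtration value (minus the intersection size, so larger overlaps enter first) are all hypothetical.

```python
from itertools import combinations


def nerve_filtration(sets, max_dim=2):
    """Return (value, simplex) pairs for the nerve of a set system.

    Assumption (not from the award abstract): a simplex is a tuple of
    indices into `sets` whose members share at least one element, and
    its filtration value is minus the size of that common intersection.
    The result is ordered so that every face precedes its cofaces.
    """
    entries = []
    for k in range(1, max_dim + 2):
        for simplex in combinations(range(len(sets)), k):
            common = set.intersection(*(set(sets[i]) for i in simplex))
            if common:
                entries.append((-len(common), len(simplex), simplex))
    # A face has an intersection at least as large as any coface, so
    # sorting by (value, dimension) keeps faces ahead of their cofaces.
    entries.sort()
    return [(value, simplex) for value, _, simplex in entries]
```

For example, `nerve_filtration([{1, 2, 3}, {2, 3, 4}, {3, 5}])` lists the three vertices first, then the edge joining the two sets that share two elements, and finally the remaining edges and the triangle, which all share a single element.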

A common practical difficulty encountered in many applications of topological data analysis is computing persistent homology groups of filtrations of very large simplicial complexes. The sizes of these complexes make it impossible to compute their persistent homology bar codes using current-generation publicly available software. The second part of the project will address this shortcoming. The PI will investigate a new approach to computing persistent homology groups more efficiently than existing algorithms. This approach will be useful in a wide variety of applications where topological data analysis is currently being used. The PI plans to implement this algorithm and develop it into a general-purpose software package for computing approximations of persistent homology invariants of filtrations of large complexes. The project will bring together tools from two different areas of mathematics, logic and topology, in a novel way, as a method for analyzing large data sets. In addition, the PI will study the underlying mathematical problems that arise at the interface of logic and topology, which are fundamentally interesting in their own right and should have other applications as well. The PI also intends to work with a graduate student and involve them in all aspects of the proposed research.
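To make the computational bottleneck concrete, the sketch below computes a bar code with the textbook boundary-matrix column reduction over Z/2. It is a minimal illustration, not the new approach proposed in the project, which the abstract does not describe. The function names `boundary` and `barcode` and the input format (a list of `(value, simplex)` pairs with faces listed before cofaces) are assumptions made for this example. The reduction can take time cubic in the number of simplices in the worst case, which is why very large complexes are a problem.

```python
def boundary(simplex):
    """Return the codimension-1 faces of a simplex given as a sorted tuple."""
    if len(simplex) == 1:
        return []
    return [simplex[:i] + simplex[i + 1:] for i in range(len(simplex))]


def barcode(filtration):
    """Standard Z/2 column reduction of the boundary matrix.

    filtration: list of (value, simplex) pairs in which every face
    appears before the simplices that contain it.
    Returns (dimension, birth, death) triples; death is None for a
    class that never dies. Zero-length bars are omitted.
    """
    index = {simplex: i for i, (_, simplex) in enumerate(filtration)}
    columns = [{index[f] for f in boundary(s)} for _, s in filtration]
    pivot_owner = {}  # lowest row index -> column whose pivot it is
    paired = set()
    bars = []
    for j, col in enumerate(columns):
        # Add earlier columns (mod 2) until the pivot is new or col is empty.
        while col and max(col) in pivot_owner:
            col ^= columns[pivot_owner[max(col)]]
        if col:
            i = max(col)
            pivot_owner[i] = j
            paired.update((i, j))
            birth, death = filtration[i][0], filtration[j][0]
            if birth < death:
                bars.append((len(filtration[i][1]) - 1, birth, death))
    for i, (value, simplex) in enumerate(filtration):
        if i not in paired:
            bars.append((len(simplex) - 1, value, None))
    return bars
```

On a filled triangle whose vertices enter at 0, edges at 1, 2 and 3, and 2-cell at 4, this yields two finite bars in dimension 0, one infinite bar in dimension 0, and one bar in dimension 1 that is born at 3 and dies at 4.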

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH


Aldo Guzman Saenz, Niina Haiminen, Saugata Basu, Laxmi Parida "Signal Enrichment in Microbiomes with Topological Data Analysis" BMC Genomics, 2019. https://doi.org/10.1186/s12864-019-5490-y
Saugata Basu, Antonio Lerario, Abhiram Natarajan "Zeroes of polynomials on definable hypersurfaces: pathologies exist, but they are rare" The Quarterly Journal of Mathematics, v.70, 2019, p.1397. https://doi.org/10.1093/qmath/haz022
Saugata Basu, Anthony Rizzie "Multi-degree bounds on the Betti numbers of real varieties and semi-algebraic sets and applications" Discrete and Computational Geometry, v.59, 2018, p.553
Saugata Basu, Orit Raz "An o-minimal Szemeredi-Trotter Theorem" The Quarterly Journal of Mathematics, 2017. https://doi.org/10.1093/qmath/hax037
Saugata Basu, Filippo Utro, Laxmi Parida "Essential Simplices in Persistent Homology and Subtle Admixture Detection" 18th International Workshop on Algorithms in Bioinformatics (WABI 2018), 2018, p.14:1. ISBN 978-3-95977-082-8
Sayan Mandal, Aldo Guzmán-Sáenz, Niina Haiminen, Saugata Basu, Laxmi Parida "A Topological Data Analysis Approach on Predicting Phenotypes from Gene Expression Data" 7th International Conference on Algorithms for Computational Biology (AlCoB 2020), Missoula, Montana, USA, April 13-15, 2020. https://doi.org/10.1007/978-3-030-42266-0_14

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

The research undertaken produced several outcomes. The theoretical outcomes included the following.

The topological complexity of Reeb spaces of semi-algebraic maps was studied (joint work with graduate students N. Cox and S. Percival) and a singly exponential upper bound was obtained. Reeb spaces of maps are an important tool in applied topology, with applications such as topological simplification. An algorithm with singly exponential complexity was developed for computing a simplicial replacement of a given semi-algebraic set having the same homotopy type up to a fixed dimension. As an important application, an algorithm with singly exponential complexity was given for computing "bar codes" of filtrations of semi-algebraic sets (joint work with graduate student N. Karisani). An algorithm with polynomially bounded complexity was developed for computing the Betti numbers of symmetric semi-algebraic sets defined by symmetric polynomials of degrees bounded by a constant (joint work with C. Riener).

On the practical side (in collaboration with the Computational Genomics Group at IBM Research led by Dr. Laxmi Parida), it was shown that topological methods using the theory of persistent homology can be used effectively in analyzing large-scale genomic data of various types, and that they often outperform traditional methods. These applications included characterizing redescriptions using persistent homology to isolate genetic pathways contributing to pathogenesis (joint work with D. Platt, A. Zalloua, and L. Parida), subtle admixture detection in population genomics (joint work with F. Utro and L. Parida), a topological data analysis approach to predicting phenotypes from gene expression data (joint work with S. Mandal, A. Guzman-Saenz, N. Haiminen and L. Parida), and inferring COVID-19 biological pathways from clinical phenotypes via topological analysis (joint work with graduate student N. Karisani, D. Platt and L. Parida).

Several new ideas were introduced -- from logic as well as topology -- in the course of studying these applications, and the software/pipeline developed may be reused in the future on other applications in the same or different domains. 

Three PhD students (two from Mathematics and one from Computer Science) were trained in different aspects of applied topology and its interface with algorithmic semi-algebraic geometry.


Last Modified: 02/09/2021
Modified by: Saugata Basu

Please report errors in award information by writing to: awardsearch@nsf.gov.
