NSF Award Search: Award # 1551338 - EAGER: Similarity Measures Based on Refinement Operators and Metric Embedding Applied to the Analysis of Immune Repertoires

Award Abstract # 1551338

EAGER: Similarity Measures Based on Refinement Operators and Metric Embedding Applied to the Analysis of Immune Repertoires

NSF Org:	IIS Division of Information & Intelligent Systems
Recipient:	DREXEL UNIVERSITY
Initial Amendment Date:	August 19, 2015
Latest Amendment Date:	August 19, 2015
Award Number:	1551338
Award Instrument:	Standard Grant
Program Manager:	Jie Yang jyang@nsf.gov (703)292-4768 IIS Division of Information & Intelligent Systems CSE Directorate for Computer and Information Science and Engineering
Start Date:	September 1, 2015
End Date:	August 31, 2017 (Estimated)
Total Intended Award Amount:	$139,849.00
Total Awarded Amount to Date:	$139,849.00
Funds Obligated to Date:	FY 2015 = $139,849.00
History of Investigator:	Santiago Ontanon (Principal Investigator) santi@cs.drexel.edu; Ali Shokoufandeh (Co-Principal Investigator); Uri Hershberg (Co-Principal Investigator)
Recipient Sponsored Research Office:	Drexel University 3141 CHESTNUT ST PHILADELPHIA PA US 19104-2875 (215)895-6342
Sponsor Congressional District:	03
Primary Place of Performance:	Drexel University 3201 Arch Street Philadelphia PA US 19104-2875
Primary Place of Performance Congressional District:	03
Unique Entity Identifier (UEI):	XF3XM9642N96
Parent UEI:
NSF Program(s):	Robust Intelligence
Primary Program Source:	01001516DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s):	7495, 7916
Program Element Code(s):	749500
Award Agency Code:	4900
Fund Agency Code:	4900
Assistance Listing Number(s):	47.070

ABSTRACT

The notion of similarity plays a key role in modern machine learning and artificial intelligence (AI) in general, since it serves as an organizing principle by which algorithms classify objects, form concepts, and make generalizations. While similarity assessment has been widely studied, the important special case of assessing similarity over structured data has not received sufficient attention. Structured representations nevertheless play a key role in many domains, such as biomedicine, where the data of interest naturally lends itself to them. The research performed in this project aims to fill this gap by creating a novel generalized framework for structural similarity assessment. To create this framework, the PIs will focus on the specific biomedical application of immune cell populations and their dynamics during development and in response to disease. By focusing on this specific domain, the research will evaluate the new approach in a real-world setting while contributing significantly to the understanding of immune dynamics.

The key concepts that will be developed in this research project are refinement operators and metric embedding. The key insight of the proposed work is that refinement operators can be used to define similarity measures, and to abstract away from the underlying representation formalism. This will lead to a new framework for similarity assessment that is applicable to a broad range of representation formalisms. Moreover, we propose to use metric embedding techniques to provide computationally efficient numerical approximations to the resulting similarity measures. The definition of general and tractable similarity measures, applicable to a range of structured representations, will be a significant contribution to structured machine learning and AI. The research team will use data collected from high throughput sequencing experiments, and evaluate the generality and performance of the proposed similarity measures by using them to analyze how repertoires of immune cell populations can be described and compared by their clonotypes (sets of cells with the same progenitor cell). The results from applying similarity measures to this problem will help us start to construct a comprehensive view of the impact of clonotype and whole repertoire information on our understanding of the dynamics of immune responses in general.
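As an illustration of how a refinement operator can induce a similarity measure, the following minimal Python sketch uses a toy representation in which a term is a set of features and one refinement step adds a single feature. It is not the project's implementation: the function names, the feature labels, and the choice of the formula "shared refinement steps divided by total refinement steps" (one formulation found in the refinement-based similarity literature) are all assumptions made for this example. In this toy setting the most specific common generalization of two terms is simply their set intersection, which would not hold for richer formalisms such as labeled graphs.

```python
from collections import deque


def downward_refinements(term, vocabulary):
    """Hypothetical refinement operator: specialize a term by adding one feature."""
    return [term | {feature} for feature in sorted(vocabulary - term)]


def refinement_distance(general, specific, vocabulary):
    """Number of refinement steps on a shortest path from `general` to `specific`."""
    general, specific = frozenset(general), frozenset(specific)
    if not general <= specific:
        raise ValueError("`general` must subsume `specific`")
    queue = deque([(general, 0)])
    seen = {general}
    while queue:
        term, steps = queue.popleft()
        if term == specific:
            return steps
        for refined in downward_refinements(term, vocabulary):
            # Only follow refinements that still subsume the target term.
            if refined <= specific and refined not in seen:
                seen.add(refined)
                queue.append((refined, steps + 1))
    raise ValueError("no refinement path found")


def refinement_similarity(a, b, vocabulary):
    """Shared refinement steps divided by all refinement steps needed for a and b."""
    # Assumption: for feature sets, the most specific common generalization
    # is the intersection. Other formalisms need a dedicated search for it.
    shared = frozenset(a) & frozenset(b)
    common = refinement_distance(frozenset(), shared, vocabulary)
    only_a = refinement_distance(shared, a, vocabulary)
    only_b = refinement_distance(shared, b, vocabulary)
    total = common + only_a + only_b
    return 1.0 if total == 0 else common / total


# Hypothetical feature labels, chosen only for illustration.
clone_a = {"V_gene_1", "J_gene_4", "long_CDR3"}
clone_b = {"V_gene_1", "J_gene_6", "long_CDR3"}
vocabulary = clone_a | clone_b
similarity = refinement_similarity(clone_a, clone_b, vocabulary)
```

Because the similarity function only calls the refinement operator and never inspects the terms directly, swapping in a different operator would adapt the same measure to another representation, which is the kind of abstraction the paragraph above describes.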

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Gregory W. Schwartz, Ali Shokoufandeh, Santiago Ontañón, Uri Hershberg "Using a novel clumpiness measure to unite data with metadata: finding common sequence patterns in immune receptor germline V genes" Pattern Recognition Letters, v.74, 2016, p.24

Santiago Ontañón, Ali Shokoufandeh "Refinement-based Similarity Measures for Directed Labeled Graphs" International Conference on Case-based Reasoning (ICCBR 2016), 2016

Yusuf Osmanlioglu, Santiago Ontañón, Uri Hershberg, Ali Shokoufandeh "Efficient Approximation of Labeling Problems with Applications to Immune Repertoire Analysis" International Conference on Pattern Recognition (ICPR 2016), 2016

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

Structured machine learning (SML) is a promising, growing field in artificial intelligence (AI) that deals with application domains featuring rich, structured data. SML has numerous applications in chemical and biomedical domains, computer vision, and natural language processing. A key notion in SML is that of “similarity”, since it serves as an organizing principle by which individuals and computational methods classify objects, form concepts, or make generalizations. The key problem that this project has addressed is how to define similarity measures for SML in a general way that is independent of both the domain and the representation.

This project has studied two key concepts in order to address this objective: refinement operators and metric embedding. Refinement operators were introduced several decades ago in the context of logic programming. Our key insight is that they can also be used to define similarity measures, and to abstract away from the underlying domain and representation. One drawback of refinement operators, however, is that methods based on them typically have a high computational cost, making them impractical for domains with large volumes of data. We have used the idea of metric embedding to define numerical approximations to these similarity measures that are computationally efficient without a significant decrease in machine learning performance.
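To make the idea of approximating an expensive measure through metric embedding concrete, here is a minimal Python sketch of one well-known technique, a landmark-based (Lipschitz-style) embedding. It is an assumption chosen for illustration and not necessarily the embedding used in the project. Each item is mapped once to the vector of its distances to a few reference items; pairs are then compared with a cheap maximum-coordinate difference, which by the triangle inequality never exceeds the true distance. The Jaccard distance stands in for a costly structural measure, and all names and data are hypothetical.

```python
def jaccard_distance(a, b):
    """Stand-in for an expensive structural distance (Jaccard distance is a metric)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def embed(item, landmarks, distance):
    """Map an item to the vector of its distances to each landmark."""
    return [distance(item, landmark) for landmark in landmarks]


def embedded_distance(u, v):
    """Maximum coordinate difference: a lower bound on the original metric."""
    return max(abs(x - y) for x, y in zip(u, v))


# Hypothetical data: each item is a set of abstract features.
items = [
    frozenset({"a", "b", "c"}),
    frozenset({"a", "b", "d"}),
    frozenset({"c", "d", "e"}),
    frozenset({"a"}),
]
landmarks = items[:2]  # assumption: landmarks picked arbitrarily

# The expensive distance is evaluated len(items) * len(landmarks) times here;
# every later pairwise comparison only touches the short vectors.
vectors = [embed(item, landmarks, jaccard_distance) for item in items]
approx = embedded_distance(vectors[2], vectors[3])
exact = jaccard_distance(items[2], items[3])
```

The trade-off in this sketch is that the embedded distance can underestimate the true one, so the number and choice of landmarks determine how much accuracy is given up for speed.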

To motivate and evaluate our research, this project has used one of the basic open problems in immunology as the application domain: what is the difference between immune system repertoires under different circumstances? There are presently no tools to quantitatively characterize and analyze the antibody repertoire of the human body as a whole, and without such tools we cannot predict the efficacy of vaccines or identify perturbations in the immune repertoire that lead to autoimmunity before disease onset. We have used previously collected data from high-throughput sequencing experiments to evaluate the proposed methods.

Additionally, this project has contributed to the training of graduate students, resulted in numerous publications, and led to the creation of novel open-source machine learning tools. Our results are freely available online and have been presented in many research talks at important meetings in the field.

Last Modified: 12/14/2017
Modified by: Santiago Ontanon
