
NSF Org: | IIS Division of Information & Intelligent Systems |
Recipient: |
|
Initial Amendment Date: | August 19, 2015 |
Latest Amendment Date: | August 19, 2015 |
Award Number: | 1551338 |
Award Instrument: | Standard Grant |
Program Manager: | Jie Yang, jyang@nsf.gov, (703)292-4768, IIS Division of Information & Intelligent Systems, CSE Directorate for Computer and Information Science and Engineering |
Start Date: | September 1, 2015 |
End Date: | August 31, 2017 (Estimated) |
Total Intended Award Amount: | $139,849.00 |
Total Awarded Amount to Date: | $139,849.00 |
Funds Obligated to Date: |
|
History of Investigator: |
|
Recipient Sponsored Research Office: | 3141 CHESTNUT ST PHILADELPHIA PA US 19104-2875 (215)895-6342 |
Sponsor Congressional District: |
|
Primary Place of Performance: | 3201 Arch Street Philadelphia PA US 19104-2875 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | Robust Intelligence |
Primary Program Source: |
|
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
The notion of similarity plays a key role in modern machine learning and artificial intelligence (AI) in general, since it serves as an organizing principle by which algorithms classify objects, form concepts, and make generalizations. While similarity assessment has been widely studied, the important special case of assessing similarity in domains where the data of interest is structured has not received sufficient attention. Structured representations, however, play a key role in many domains, such as biomedicine, where the data of interest is naturally structured. The research performed in this project aims to fill the gap in structural similarity knowledge by creating a novel generalized framework for similarity assessment. To develop this framework, the PIs will focus on the specific biomedical application of immune cell populations and their dynamics during development and in response to disease. By focusing on this specific domain, the research will evaluate the new approach in a real-world setting while contributing significantly to the understanding of immune dynamics.
The key concepts that will be developed in this research project are refinement operators and metric embedding. The key insight of the proposed work is that refinement operators can be used to define similarity measures, and to abstract away from the underlying representation formalism. This will lead to a new framework for similarity assessment that is applicable to a broad range of representation formalisms. Moreover, we propose to use metric embedding techniques to provide computationally efficient numerical approximations to the resulting similarity measures. The definition of general and tractable similarity measures, applicable to a range of structured representations, will be a significant contribution to structured machine learning and AI. The research team will use data collected from high throughput sequencing experiments, and evaluate the generality and performance of the proposed similarity measures by using them to analyze how repertoires of immune cell populations can be described and compared by their clonotypes (sets of cells with the same progenitor cell). The results from applying similarity measures to this problem will help us start to construct a comprehensive view of the impact of clonotype and whole repertoire information on our understanding of the dynamics of immune responses in general.
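As a rough illustration of how a refinement operator can induce a similarity measure, the sketch below uses a deliberately simple stand-in formalism: objects are sets of symbolic features, and one refinement step adds one feature. This is an assumption made for illustration only; the abstract does not specify the project's formalisms or measures, and all names here (`refine`, `anti_unify`, `refinement_similarity`) are hypothetical. Similarity is taken as the fraction of refinement steps that two objects share on the way down from the most general term.

```python
from typing import FrozenSet, Iterable, Iterator

# Hypothetical toy formalism: a term is a set of symbolic features.
Term = FrozenSet[str]


def refine(term: Term, vocabulary: Iterable[str]) -> Iterator[Term]:
    """Downward refinement operator: yield each term one step more specific."""
    for feature in sorted(vocabulary):
        if feature not in term:
            yield term | {feature}


def subsumes(general: Term, specific: Term) -> bool:
    """A term subsumes another if it is at least as general (subset here)."""
    return general <= specific


def anti_unify(a: Term, b: Term, vocabulary: Iterable[str]) -> Term:
    """Refine the most general term while it still subsumes both a and b."""
    current: Term = frozenset()
    while True:
        candidates = [t for t in refine(current, vocabulary)
                      if subsumes(t, a) and subsumes(t, b)]
        if not candidates:
            return current
        current = candidates[0]


def path_length(start: Term, goal: Term, vocabulary: Iterable[str]) -> int:
    """Count refinement steps from start down to goal (start subsumes goal)."""
    steps, current = 0, start
    while current != goal:
        current = next(t for t in refine(current, vocabulary)
                       if subsumes(t, goal))
        steps += 1
    return steps


def refinement_similarity(a: Term, b: Term, vocabulary: Iterable[str]) -> float:
    """Shared refinement steps divided by all steps needed to reach a and b."""
    shared = anti_unify(a, b, vocabulary)
    common = path_length(frozenset(), shared, vocabulary)
    only_a = path_length(shared, a, vocabulary)
    only_b = path_length(shared, b, vocabulary)
    total = common + only_a + only_b
    return 1.0 if total == 0 else common / total


# Usage with made-up feature names.
a = frozenset({"x", "y", "z"})
b = frozenset({"y", "z", "w"})
similarity = refinement_similarity(a, b, a | b)
```

Because the similarity function only calls `refine` and `subsumes`, swapping in a different representation formalism would, in this sketch, mean replacing just those two functions. For plain feature sets the measure coincides with the Jaccard index.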
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
Structured machine learning (SML) is a promising and growing field in artificial intelligence (AI) that deals with application domains featuring rich, structured data. SML has numerous applications in chemical and biomedical domains, computer vision, and natural language processing. A key notion in SML is that of “similarity”, since similarity serves as an organizing principle by which individuals and computational methods classify objects, form concepts, or make generalizations. The key problem that this project has addressed is how to define similarity measures for SML in a general way that is agnostic to both domain and representation.
This project has studied two key concepts in order to address this objective: refinement operators and metric embedding. Refinement operators were introduced several decades ago in the context of logic programming. Our key insight is that they can also be used to define similarity measures, and to abstract away from the underlying domain and representation. One drawback of refinement operators, however, is that methods based on them typically have a high computational cost, making them impractical for domains with large volumes of data. We have used the idea of metric embedding to define numerical approximations to these similarity measures that are computationally efficient without a significant decrease in machine learning performance.
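The report does not say which embedding technique was used. As one hedged illustration of the general idea, the sketch below uses a classic pivot-based (Lipschitz-style) embedding: each object is mapped to its vector of distances to a few reference objects, and cheap vector comparisons then stand in for the expensive measure. The Jaccard distance is only a placeholder for a costly structured distance, and the function names and sample data are hypothetical.

```python
from typing import Callable, FrozenSet, List, Sequence

Item = FrozenSet[str]


def jaccard_distance(a: Item, b: Item) -> float:
    """Placeholder for an expensive structured distance (a true metric)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def embed(item: Item, pivots: Sequence[Item],
          distance: Callable[[Item, Item], float]) -> List[float]:
    """Map an item to its vector of distances to the reference pivots."""
    return [distance(item, pivot) for pivot in pivots]


def chebyshev(u: Sequence[float], v: Sequence[float]) -> float:
    """Largest coordinate-wise difference between two embedded vectors."""
    return max(abs(x - y) for x, y in zip(u, v))


# Made-up items and pivots; in practice pivots are sampled from the data.
items = [
    frozenset({"a", "b", "c"}),
    frozenset({"b", "c", "d"}),
    frozenset({"d", "e"}),
    frozenset({"a", "e", "f"}),
]
pivots = [items[0], items[2]]

# Each item is embedded once; later comparisons use only the vectors.
vectors = [embed(item, pivots, jaccard_distance) for item in items]
approx = chebyshev(vectors[1], vectors[3])
exact = jaccard_distance(items[1], items[3])
```

When the underlying distance satisfies the triangle inequality, the Chebyshev distance between embedded vectors never exceeds the true distance, so it can serve as an inexpensive lower bound. How closely it tracks the true measure depends on how many pivots are used and how they are chosen.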
To motivate and evaluate our research, this project has used one of the basic open problems in immunology as the application domain: what is the difference between immune system repertoires under different circumstances? There are presently no tools to quantitatively characterize and analyze the antibody repertoire of the human body as a whole, and without such tools we cannot predict the efficacy of vaccines or identify perturbations in the immune repertoire that lead to autoimmunity before disease onset. We have used previously collected data from high-throughput sequencing experiments to evaluate the proposed methods.
Additionally, this project has contributed to the training of graduate students, has resulted in numerous publications, and has led to the creation of novel open-source machine learning tools. Our results are freely available online and have been presented in many research talks at important meetings in the field.
Last Modified: 12/14/2017
Modified by: Santiago Ontanon