
NSF Org: | CCF Division of Computing and Communication Foundations |
Recipient: |
|
Initial Amendment Date: | February 21, 2019 |
Latest Amendment Date: | August 19, 2021 |
Award Number: | 1845076 |
Award Instrument: | Continuing Grant |
Program Manager: | James Fowler, jafowler@nsf.gov, (703)292-8910, CCF Division of Computing and Communication Foundations, CSE Directorate for Computer and Information Science and Engineering |
Start Date: | May 1, 2019 |
End Date: | April 30, 2026 (Estimated) |
Total Intended Award Amount: | $596,792.00 |
Total Awarded Amount to Date: | $620,792.00 |
Funds Obligated to Date: | FY 2020 = $259,977.00; FY 2021 = $242,077.00 |
History of Investigator: |
|
Recipient Sponsored Research Office: | 1109 GEDDES AVE STE 3300 ANN ARBOR MI US 48109-1015 (734)763-6438 |
Sponsor Congressional District: |
|
Primary Place of Performance: | 1301 Beal Avenue Ann Arbor MI US 48109-2122 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | Special Projects - CCF, Comm & Information Foundations |
Primary Program Source: | 01002324DB NSF RESEARCH & RELATED ACTIVIT; 01001920DB NSF RESEARCH & RELATED ACTIVIT; 01002122DB NSF RESEARCH & RELATED ACTIVIT; 01002021DB NSF RESEARCH & RELATED ACTIVIT |
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
Modern machine learning techniques aim to design models and algorithms that allow computers to learn efficiently from vast amounts of previously unexplored data. Learning problems of this kind are called 'unsupervised' because no human-provided information about the data is available to guide the learning process. Arguably the two most important unsupervised machine learning tools are dimensionality reduction and clustering. In dimensionality reduction, the algorithm seeks a simple low-dimensional structure that captures the interesting behavior in the data. In clustering, the algorithm seeks to group data points into meaningful clusters. As increasingly high-dimensional data are collected about progressively more elaborate physical, biological, and social phenomena, algorithms that perform dimensionality reduction and clustering jointly are often highly applicable. However, joint formulations in the literature are often ad hoc and fundamentally unable to operate on real data that have missing elements, corruptions, and heterogeneity, all of which are critical machine learning challenges for modern data problems. This research project is expected to have broad applicability in data science, and will be demonstrated in two applications: genetics and computer vision.
The joint clustering and dimensionality reduction formulation used in this project, called K-set clustering, seeks K "central sets" constrained to have some low-dimensional representation, each of which represents one of K clusters in the data. The formulation is a generalization of K-means, K-subspaces, and principal component analysis, and it naturally leads to several novel problem instances. Given a defined set geometry, the corresponding problem instance is approached from two perspectives: understanding the geometry of that instance of the problem formulation, and learning those geometric models from data. Three specific examples of the problem formulation will be studied: subspace clustering, variety clustering, and polyhedral set clustering. While each example presents intrinsic and unique challenges, these are just examples of a larger paradigm that is limited only by one's ability to define sets amenable to modeling the geometric structure in data. The formulation allows for interpretable data analysis, with a framework that can readily incorporate missing data and heterogeneous data.
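As a rough illustration of the formulation described above, the following is a minimal sketch of the K-subspaces special case, in which each "central set" is a linear subspace through the origin. It is not the project's method or code. The function names (`fit_subspace`, `residuals`, `k_subspaces`), the random initialization, and the re-seeding of under-populated clusters are all assumptions made for this example. The sketch alternates between fitting a low-dimensional subspace to each cluster and reassigning each point to its nearest subspace. It assumes fully observed data, so it does not address the missing-data or heterogeneity aspects of the project.

```python
import numpy as np


def fit_subspace(points, dim):
    """Least-squares linear subspace through the origin (hypothetical helper).

    Returns an orthonormal basis of shape (D, dim) for rows of `points`.
    """
    # The leading right singular vectors span the best-fit subspace.
    _, _, vt = np.linalg.svd(points, full_matrices=False)
    return vt[:dim].T


def residuals(X, bases):
    """Squared distance from every row of X to every subspace; shape (N, K)."""
    cols = []
    for U in bases:
        proj = X @ U @ U.T  # orthogonal projection onto span(U)
        cols.append(np.sum((X - proj) ** 2, axis=1))
    return np.stack(cols, axis=1)


def k_subspaces(X, K, dim, n_iter=50, seed=0):
    """Alternating minimization sketch; may stop at a local minimum."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    labels = rng.integers(K, size=n)  # random initial assignment (assumption)
    bases = []
    for _ in range(n_iter):
        bases = []
        for k in range(K):
            members = X[labels == k]
            if members.shape[0] < dim:
                # Too few points to define the subspace: re-seed from random points.
                members = X[rng.choice(n, size=dim, replace=False)]
            bases.append(fit_subspace(members, dim))
        new_labels = np.argmin(residuals(X, bases), axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels
    return labels, bases
```

With `K=1` the loop reduces to a single SVD fit, which is principal component analysis without mean-centering. Other set geometries named in the abstract (varieties, polyhedral sets) would require replacing `fit_subspace` and the distance computation in `residuals`, which this sketch does not attempt.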
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.