NSF Award Search: Award # 1845076 - CIF: CAREER: Robust, Interpretable, and Efficient Unsupervised Learning with K-set Clustering

Award Abstract # 1845076

NSF Org:	CCF Division of Computing and Communication Foundations
Recipient:	REGENTS OF THE UNIVERSITY OF MICHIGAN
Initial Amendment Date:	February 21, 2019
Latest Amendment Date:	August 19, 2021
Award Number:	1845076
Award Instrument:	Continuing Grant
Program Manager:	James Fowler, jafowler@nsf.gov, (703) 292-8910; CCF Division of Computing and Communication Foundations; CSE Directorate for Computer and Information Science and Engineering
Start Date:	May 1, 2019
End Date:	April 30, 2026 (Estimated)
Total Intended Award Amount:	$596,792.00
Total Awarded Amount to Date:	$620,792.00
Funds Obligated to Date:	FY 2019 = $118,738.00; FY 2020 = $259,977.00; FY 2021 = $242,077.00
History of Investigator:	Laura Balzano (Principal Investigator), girasole@umich.edu
Recipient Sponsored Research Office:	Regents of the University of Michigan - Ann Arbor, 1109 GEDDES AVE STE 3300, ANN ARBOR, MI, US 48109-1015, (734) 763-6438
Sponsor Congressional District:	06
Primary Place of Performance:	University of Michigan Ann Arbor, 1301 Beal Avenue, Ann Arbor, MI, US 48109-2122
Primary Place of Performance Congressional District:	06
Unique Entity Identifier (UEI):	GNJ7BBP73WE9
Parent UEI:
NSF Program(s):	Special Projects - CCF, Comm & Information Foundations
Primary Program Source:	01002223DB NSF RESEARCH & RELATED ACTIVIT; 01002324DB NSF RESEARCH & RELATED ACTIVIT; 01001920DB NSF RESEARCH & RELATED ACTIVIT; 01002122DB NSF RESEARCH & RELATED ACTIVIT; 01002021DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s):	1045, 7936, 7935, 9102, 9251, 9178
Program Element Code(s):	287800, 779700
Award Agency Code:	4900
Fund Agency Code:	4900
Assistance Listing Number(s):	47.070

ABSTRACT

Modern machine learning techniques aim to design models and algorithms that allow computers to learn efficiently from vast amounts of previously unexplored data. Learning problems of this kind are called 'unsupervised' because no human-provided information about the data is available to guide the learning process. Arguably the two most important unsupervised machine learning tools are dimensionality reduction and clustering. In dimensionality reduction, the algorithm seeks a simple low-dimensional structure that captures the interesting behavior in the data. In clustering, the algorithm seeks to group data points together into meaningful clusters. As increasingly higher-dimensional data are collected about progressively more elaborate physical, biological, and social phenomena, algorithms that aim at both dimensionality reduction and clustering are often highly applicable. However, joint formulations in the literature are often ad hoc and fundamentally unable to operate on real data that have missing elements, corruptions, and heterogeneity, all of which are critical machine learning challenges for modern data problems. This research project is expected to have broad applicability in data science, and will be demonstrated in two applications: genetics and computer vision.

The joint clustering and dimensionality reduction formulation used in this project, called K-set clustering, seeks K "central sets" constrained to have some low-dimensional representation, each of which represents one of K clusters in the data. The formulation is a generalization of K-means, K-subspaces, and principal component analysis, and it naturally leads to several novel problem instances. Given a defined set geometry, the corresponding problem instance is approached from two perspectives: understanding the geometry of that instance of the problem formulation, and learning those geometric models from data. Three specific examples of the problem formulation will be studied: subspace clustering, variety clustering, and polyhedral set clustering. While each example presents intrinsic and unique challenges, these are just examples of a larger paradigm that is limited only by one's ability to define sets amenable to modeling the geometric structure in data. The formulation allows for interpretable data analysis, with a framework that can readily incorporate missing data and heterogeneous data.
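
As an illustration of the kind of problem described above, the following is a minimal sketch of the classical K-subspaces alternating scheme, one of the special cases the abstract names. It is not the project's own algorithm: the function name `k_subspaces`, its parameters, and the choice of linear subspaces through the origin with a shared dimension are assumptions made for this example. Each "central set" here is a subspace; points are assigned to the subspace that captures most of their energy, and each subspace is then refit by a truncated SVD of its assigned points. With K = 1 the procedure reduces to uncentered principal component analysis.

```python
import numpy as np


def k_subspaces(X, K, dim, n_iter=50, seed=0):
    """Illustrative K-subspaces clustering (hypothetical helper, not from the award).

    X   : (n, d) array of data points, one per row.
    K   : number of clusters, i.e. number of central subspaces.
    dim : dimension of every subspace (assumed equal across clusters, dim <= d).
    Returns (labels, bases), where bases[k] is a (d, dim) orthonormal basis.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Initialize each subspace with a random orthonormal basis.
    bases = [np.linalg.qr(rng.standard_normal((d, dim)))[0] for _ in range(K)]
    labels = np.zeros(n, dtype=int)
    for it in range(n_iter):
        # Assignment step: the residual to subspace U is ||x||^2 - ||U^T x||^2,
        # so the nearest subspace is the one with the largest projection energy.
        energy = np.stack([np.sum((X @ U) ** 2, axis=1) for U in bases], axis=1)
        new_labels = np.argmax(energy, axis=1)
        if it > 0 and np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        # Update step: refit each subspace as the span of the top `dim`
        # right singular vectors of the points currently assigned to it.
        for k in range(K):
            pts = X[labels == k]
            if len(pts) >= dim:  # keep the old basis if the cluster is too small
                _, _, Vt = np.linalg.svd(pts, full_matrices=False)
                bases[k] = Vt[:dim].T
    return labels, bases
```

A call such as `labels, bases = k_subspaces(X, K=3, dim=2)` returns one cluster label per row of `X` and one basis per cluster. Like K-means, this alternating scheme only finds a local optimum and depends on initialization, and it does not handle the missing or heterogeneous data that the project targets.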

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH


(Showing: 1 - 10 of 27)


Bower, Amanda and Balzano, Laura "Preference Modeling with Context-Dependent Salient Features" 37th International Conference on Machine Learning, 2020

Bower, Amanda and Balzano, Laura "Preference Modeling with Context-Dependent Salient Features" Proceedings of the 37th International Conference on Machine Learning, 2020

Cavazos, Javier Salazar and Fessler, Jeffrey A. and Balzano, Laura "ALPCAH: Sample-wise Heteroscedastic PCA with Tail Singular Value Regularization" International Conference on Sampling Theory and Applications, 2023 https://doi.org/10.1109/SampTA59647.2023.10301206

Du, Zhe and Liu, Zexiang and Weitz, Jack and Ozay, Necmiye "Sample Complexity Analysis and Self-regularization in Identification of Over-parameterized ARX Models" IEEE Conference on Decision and Control (CDC), 2022

Du, Zhe and Ozay, Necmiye and Balzano, Laura "Clustering-based Mode Reduction for Markov Jump Systems" Learning for Decision and Control, 2022

Du, Zhe and Ozay, Necmiye and Balzano, Laura "Clustering-based Mode Reduction for Markov Jump Systems" Proceedings of Machine Learning Research, 4th Annual Conference on Learning for Dynamics and Control, v.168, 2022

Du, Zhe and Ozay, Necmiye and Balzano, Laura "Mode Clustering for Markov Jump Systems" IEEE CAMSAP 2019, 2019

Du, Zhe and Sattar, Yahya and Tarzanagh, Davoud Ataee and Balzano, Laura and Ozay, Necmiye and Oymak, Samet "Data-Driven Control of Markov Jump Systems: Sample Complexity and Regret Bounds" 2022 American Control Conference, 2022 https://doi.org/10.23919/ACC53348.2022.9867863

Geelen, Rudy and Balzano, Laura and Willcox, Karen "Learning Latent Representations in High-Dimensional State Spaces Using Polynomial Manifold Constructions" 62nd IEEE Conference on Decision and Control (CDC), 2023 https://doi.org/10.1109/CDC49753.2023.10384209

Geelen, Rudy and Balzano, Laura and Wright, Stephen and Willcox, Karen "Learning physics-based reduced-order models from data using nonlinear manifolds" Chaos: An Interdisciplinary Journal of Nonlinear Science, v.34, 2024 https://doi.org/10.1063/5.0170105

Gilman, Kyle and Ataee Tarzanagh, Davoud and Balzano, Laura "Grassmannian Optimization for Online Tensor Completion and Tracking with the t-SVD" IEEE Transactions on Signal Processing, 2022 https://doi.org/10.1109/TSP.2022.3164837


