
NSF Org: CCF Division of Computing and Communication Foundations
Recipient:
Initial Amendment Date: August 30, 2016
Latest Amendment Date: June 29, 2018
Award Number: 1657939
Award Instrument: Continuing Grant
Program Manager: A. Funda Ergun, CCF Division of Computing and Communication Foundations, CSE Directorate for Computer and Information Science and Engineering
Start Date: June 1, 2016
End Date: June 30, 2021 (Estimated)
Total Intended Award Amount: $329,094.00
Total Awarded Amount to Date: $329,094.00
Funds Obligated to Date: FY 2016 = $92,308.00; FY 2017 = $95,302.00; FY 2018 = $98,418.00
History of Investigator:
Recipient Sponsored Research Office: 1850 RESEARCH PARK DR STE 300, DAVIS, CA, US 95618-6153, (530) 754-7700
Sponsor Congressional District:
Primary Place of Performance: CA, US 95618-6134
Primary Place of Performance Congressional District:
Unique Entity Identifier (UEI):
Parent UEI:
NSF Program(s): Algorithmic Foundations
Primary Program Source: 01001617DB NSF RESEARCH & RELATED ACTIVIT; 01001718DB NSF RESEARCH & RELATED ACTIVIT; 01001819DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s):
Program Element Code(s):
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070
ABSTRACT
Analysis and exploration of data, including classification, inference, and retrieval, are ubiquitous tasks in science and applied fields. Given any such task, a fundamental paradigm is the extraction of features that are relevant. In the design of algorithms for the analysis and exploration of data, feature extraction techniques act as basic building blocks or primitives that can be combined to model complex behavior. Some of the fundamental feature extraction tools include Principal Component Analysis (PCA), Independent Component Analysis (ICA), and half-space-based learning and classification. Data rarely satisfy the precise assumptions of these models and feature extraction tools, and combining these tools amplifies errors. This motivates the challenging task of designing new algorithms that are robust against noise and that can be combined as building blocks while keeping the error propagation under control.
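As an illustrative sketch (not part of the award record), the role of PCA as a feature-extraction primitive can be shown in a few lines of NumPy; the data matrix X and component count k below are hypothetical:

```python
import numpy as np

def pca_features(X, k):
    """Project the n points (rows of X) onto their top-k principal directions."""
    Xc = X - X.mean(axis=0)  # center each feature
    # SVD of the centered data: rows of Vt are the principal axes,
    # ordered by decreasing singular value
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T     # new k-dimensional extracted features

# Toy usage: 200 points with 5 explicit features reduced to 2 extracted features
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
Z = pca_features(X, 2)
print(Z.shape)  # (200, 2)
```

The extracted components are linear combinations of the original features, which is exactly the property the subset selection problem below seeks to avoid.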
The proposed work will:
(1) Raise ICA from a very successful practical tool to an algorithmic primitive with strong theoretical guarantees and applicability to a rich family of problems beyond independence.
(2) Find reasonable assumptions and algorithms that allow efficient learning of intersections of half-spaces.
(3) Systematically study the following well-motivated refinement of PCA known as the subset selection problem. This refinement aims to select relevant features among the given features of the input data, unlike PCA, which creates new and possibly artificial features.
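To make the contrast with PCA concrete, here is a minimal sketch of one standard heuristic for subset selection, greedy deflation, which picks existing columns rather than creating new combinations. This is an illustration only, not the algorithm proposed in the award:

```python
import numpy as np

def greedy_column_select(X, k):
    """Greedily pick k columns of X that best span the rest (a simple
    deflation heuristic for subset selection; illustrative only)."""
    residual = X.astype(float).copy()
    chosen = []
    for _ in range(k):
        # pick the column with the largest remaining (residual) norm
        j = int(np.argmax(np.linalg.norm(residual, axis=0)))
        chosen.append(j)
        v = residual[:, j]
        v = v / np.linalg.norm(v)
        # deflate: remove the span of the chosen column from every column
        residual -= np.outer(v, v @ residual)
    return chosen

# Toy usage: column 2 is (nearly) a copy of column 0, so a good subset
# should not select both
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))
X[:, 2] = X[:, 0] + 1e-6 * rng.normal(size=50)
print(greedy_column_select(X, 2))
```

Because the selected features are actual columns of the data, they retain their original interpretation, which is the motivation for preferring subset selection over PCA in some applications.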
New feature extraction algorithms enhance the toolbox available to researchers in data-intensive fields such as biology, signal processing, and computer vision. They also enable improved data analysis by practitioners in security, marketing, business and government processes, and essentially any field that involves the analysis of feature-rich data. The proposed work includes the implementation of the more practical algorithms.
Education and outreach aspects of this project include the mentoring of young researchers, the design of a new course for graduate and undergraduate students incorporating some of the PI's research, and the involvement of pre-college students and local communities in science and research.
PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
One can think of a dataset as a set of data points where each point has a number of explicit features. One aspect of understanding a given dataset is the identification of relevant latent features computed from the explicitly given features. The project improved this aspect in several ways by designing methods that are guaranteed to work in broader situations. One of these situations is the case when the dataset has heavy tails, that is, when extreme values are not particularly rare; this is a case where many traditional methods have no guarantees. A second situation is the underdetermined case, when the number of latent features is larger than the number of explicit features.
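The notion of heavy tails can be illustrated with a small hypothetical simulation (not project code): under a Student-t distribution with few degrees of freedom, values far from the center occur much more often than under a Gaussian.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
gauss = rng.normal(size=n)                # light-tailed: extreme values very rare
heavy = rng.standard_t(df=2, size=n)      # heavy-tailed: extreme values common

for name, sample in [("gaussian", gauss), ("student-t(2)", heavy)]:
    # fraction of samples with magnitude greater than 5
    frac = np.mean(np.abs(sample) > 5)
    print(f"{name}: fraction beyond |x| = 5 is {frac:.5f}")
```

For the Gaussian this fraction is essentially zero (the true probability is below one in a million), while the t-distribution with 2 degrees of freedom puts a few percent of its mass out there, which is why methods tuned to Gaussian-like data can fail on heavy-tailed datasets.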
The proposed methods established new connections between data analysis and high-dimensional geometry. In the context of this connection, the Principal Investigator organized a workshop on high-dimensional geometry for the American Mathematical Society, which included some of the main experts in the subject.
All the proposed methods have strong mathematical guarantees and several of them led to practical implementations, including algorithms for Independent Component Analysis, a standard feature extraction technique, when the data is corrupted by noise and when the data has heavy tails.
Last Modified: 10/29/2021
Modified by: Luis Rademacher