Award Abstract # 1657939
CAREER: Transforming data analysis via new algorithms for feature extraction

NSF Org: CCF
Division of Computing and Communication Foundations
Recipient: UNIVERSITY OF CALIFORNIA, DAVIS
Initial Amendment Date: August 30, 2016
Latest Amendment Date: June 29, 2018
Award Number: 1657939
Award Instrument: Continuing Grant
Program Manager: A. Funda Ergun
CCF, Division of Computing and Communication Foundations
CSE, Directorate for Computer and Information Science and Engineering
Start Date: June 1, 2016
End Date: June 30, 2021 (Estimated)
Total Intended Award Amount: $329,094.00
Total Awarded Amount to Date: $329,094.00
Funds Obligated to Date: FY 2015 = $43,066.00
FY 2016 = $92,308.00
FY 2017 = $95,302.00
FY 2018 = $98,418.00
History of Investigator:
  • Luis Rademacher (Principal Investigator)
    lrademac@ucdavis.edu
Recipient Sponsored Research Office: University of California-Davis
1850 RESEARCH PARK DR STE 300
DAVIS
CA  US  95618-6153
(530)754-7700
Sponsor Congressional District: 04
Primary Place of Performance: University of California-Davis
CA  US  95618-6134
Primary Place of Performance Congressional District: 04
Unique Entity Identifier (UEI): TX2DAGQPENZ5
Parent UEI:
NSF Program(s): Algorithmic Foundations
Primary Program Source: 01001516DB NSF RESEARCH & RELATED ACTIVIT
01001617DB NSF RESEARCH & RELATED ACTIVIT
01001718DB NSF RESEARCH & RELATED ACTIVIT
01001819DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 1045, 7926
Program Element Code(s): 779600
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Analysis and exploration of data, including classification, inference, and retrieval, are ubiquitous tasks in science and applied fields. Given any such task, a fundamental paradigm is the extraction of relevant features. In the design of algorithms for the analysis and exploration of data, feature extraction techniques act as basic building blocks, or primitives, that can be combined to model complex behavior. Fundamental feature extraction tools include Principal Component Analysis (PCA), Independent Component Analysis (ICA), and half-space-based learning and classification. Real data rarely satisfy the precise assumptions of these models, and combining the tools amplifies errors. This motivates the challenging task of designing new algorithms that are robust against noise and that can be combined as building blocks while keeping error propagation under control.
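As a concrete illustration (ours, not part of the award abstract), the following minimal Python sketch chains two of these primitives, PCA and ICA, using scikit-learn's standard implementations on synthetic data:

import numpy as np
from sklearn.decomposition import PCA, FastICA

rng = np.random.default_rng(0)

# Two independent, non-Gaussian latent sources mixed into 5 observed features.
S = rng.laplace(size=(1000, 2))                 # latent sources
A = rng.normal(size=(2, 5))                     # unknown mixing matrix
X = S @ A + 0.01 * rng.normal(size=(1000, 5))   # observed, lightly noisy data

# PCA as a first primitive: keep the two directions of largest variance.
X2 = PCA(n_components=2).fit_transform(X)

# ICA as a second primitive: rotate those features toward independent ones.
S_hat = FastICA(n_components=2, random_state=0).fit_transform(X2)
print(S_hat.shape)   # (1000, 2): estimated independent features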

The proposed work will:
(1) Raise ICA from a very successful practical tool to an algorithmic primitive with strong theoretical guarantees and applicability to a rich family of problems beyond independence.
(2) Find reasonable assumptions and algorithms that allow efficient learning of intersections of half-spaces.
(3) Systematically study a well-motivated refinement of PCA known as the subset selection problem. Unlike PCA, which creates new and possibly artificial features, this refinement selects relevant features from among the given features of the input data (a simple greedy baseline is sketched below).
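For illustration only (this greedy heuristic is our assumption, not the algorithm proposed in the award), a column subset selection procedure picks k actual columns of the data matrix, whereas PCA would return k linear combinations of all columns:

import numpy as np

def greedy_column_subset(X, k):
    # Greedily pick k columns of X whose span captures as much of the
    # remaining columns as possible (a simple baseline heuristic).
    chosen = []
    R = X.astype(float).copy()              # residual after removing chosen columns
    for _ in range(k):
        norms = np.sum(R * R, axis=0)       # ||r_j||^2 for each residual column
        gains = np.sum((R.T @ R) ** 2, axis=1) / np.maximum(norms, 1e-12)
        gains[chosen] = -np.inf             # never re-pick a column
        j = int(np.argmax(gains))
        chosen.append(j)
        q = R[:, j] / np.linalg.norm(R[:, j])   # unit direction of picked column
        R = R - np.outer(q, q @ R)              # project it out of the residual
    return chosen

X = np.random.default_rng(1).normal(size=(100, 10))
print(greedy_column_subset(X, 3))   # indices of 3 selected original columns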

New feature extraction algorithms enhance the toolbox available to researchers in data-intensive fields such as biology, signal processing and computer vision. They also enable improved data analysis by practitioners in security, marketing, business and government processes, and essentially any field that involves the analysis of feature-rich data. The proposed work includes the implementation of the more practical algorithms.

Education and outreach aspects of this project include the mentoring of young researchers, the design of a new course for graduate and undergraduate students incorporating some of the PI's research, and the involvement of pre-college students and local communities in science and research.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH


Jesus De Loera, Jamie Haddock, and Luis Rademacher. "The Minimum Euclidean-Norm Point on a Convex Polytope: Wolfe's Combinatorial Algorithm is Exponential." SIAM Journal on Computing, 2020. doi:10.1137/18M1221072
Joseph Anderson and Luis Rademacher. "Efficiency of the floating body as a robust measure of dispersion." SODA 2020, 2020.
Mikhail Belkin, Luis Rademacher, and James Voss. "Eigenvectors of Orthogonally Decomposable Functions." SIAM Journal on Computing, v.47, 2018, p.547. doi:10.1137/17M1122013

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

One can think of a dataset as a set of data points, each described by a number of explicit features. One aspect of understanding a given dataset is identifying relevant latent features computed from the explicitly given ones. The project improved this aspect in several ways by designing methods that are guaranteed to work in broader situations. One such situation is when the dataset has heavy tails, that is, when extreme values are not particularly rare; here many traditional methods come with no guarantees. A second situation is the underdetermined case, when the number of latent features is larger than the number of explicit features.
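A tiny numerical illustration of the heavy-tailed situation (our own example, with no connection to the project's data): for Cauchy samples, where extreme values are common, the sample variance does not stabilize as the sample grows, while for Gaussian samples it settles near 1.

import numpy as np

rng = np.random.default_rng(0)
for name, sampler in [("gaussian", rng.standard_normal),
                      ("cauchy", rng.standard_cauchy)]:
    for n in (10**3, 10**5):
        x = sampler(n)
        print(f"{name:8s} n={n:6d} sample variance = {np.var(x):.2f}")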

The proposed methods established new connections between data analysis and high-dimensional geometry. In the context of this connection, the Principal Investigator organized a workshop on high-dimensional geometry for the American Mathematical Society, which included some of the main experts in the subject.

All the proposed methods have strong mathematical guarantees, and several of them led to practical implementations, including algorithms for Independent Component Analysis, a standard feature extraction technique, in settings where the data is corrupted by noise or has heavy tails.


Last Modified: 10/29/2021
Modified by: Luis Rademacher
