
NSF Org: CCF Division of Computing and Communication Foundations
Initial Amendment Date: July 25, 2014
Latest Amendment Date: April 25, 2017
Award Number: 1422830
Award Instrument: Standard Grant
Program Manager: Rahul Shah, CCF Division of Computing and Communication Foundations, CSE Directorate for Computer and Information Science and Engineering
Start Date: August 1, 2014
End Date: January 31, 2018 (Estimated)
Total Intended Award Amount: $450,000.00
Total Awarded Amount to Date: $450,000.00
Recipient Sponsored Research Office: 1960 KENNY RD, COLUMBUS, OH, US 43210-1016, (614) 688-8735
Primary Place of Performance: OH, US 43210-1063
NSF Program(s): Algorithmic Foundations
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070
ABSTRACT
Data analysis is ubiquitous in a broad range of application fields, from computer graphics to geographic information systems, from sensor networks to social networks, from economics to medicine. It represents a fundamental problem in computational science. The project will advance the theoretical understanding of fundamental issues behind data analysis, and develop practical algorithms that will be useful for a broad range of problems in science and engineering.
The project addresses the fundamental problem of reconstructing the structure of probability distributions from sampled data. It will investigate the use of tensor-based and other higher-order methods, in particular those that allow for efficient optimization. The project lies at the interface of theoretical computer science, machine learning, signal processing, and statistics, and has potential impact in all of these fields. In recent years there has been a resurgence of interest in tensor methods in data analysis and inference, particularly in theoretical computer science. These methods are expected to prove useful in a variety of applications in machine learning, signal processing, and other fields.
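As a toy illustration of the higher-order methods mentioned above (not code from the project itself), the following sketch forms an empirical third-order moment tensor M3[i,j,k] = E[x_i x_j x_k] from samples; tensor methods recover distributional structure from such objects. The data here is assumed synthetic Gaussian noise purely for demonstration.

```python
import numpy as np

# Illustrative sketch: estimate the third-order moment tensor
# M3[i, j, k] = E[x_i * x_j * x_k] from n samples of a d-dimensional
# distribution (here, synthetic standard-normal data).
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 3))  # 1000 samples, 3 dimensions

# Average the outer products x ⊗ x ⊗ x over all samples.
M3 = np.einsum('ni,nj,nk->ijk', X, X, X) / X.shape[0]

print(M3.shape)  # (3, 3, 3); symmetric under any permutation of indices
```

The resulting tensor is symmetric by construction, and its decomposition (rather than that of the covariance matrix alone) is what gives higher-order methods their extra identification power.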
The project will develop algorithms for solving a range of problems including blind source separation, spectral clustering, inference in mixture models, and estimating the geometry of distributions. It will analyze the complexity of these and related problems. In particular, it will strive to understand computational efficiency and dependence on the dimension of the space, studying "the curses and blessings of dimensionality". It will also address a somewhat mysterious discrepancy between sample complexity and algorithmic complexity in our understanding of many high-dimensional inference problems.
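One of the problems listed above, spectral clustering, can be sketched in a few lines of numpy. This is a minimal assumed illustration of the standard technique (not the project's own implementation): build a similarity graph, take the graph Laplacian, and split the points by the sign of its second eigenvector (the Fiedler vector).

```python
import numpy as np

# Minimal spectral-clustering sketch on two well-separated point clouds.
rng = np.random.default_rng(1)
a = rng.normal(0.0, 0.1, size=(20, 2))  # cluster near the origin
b = rng.normal(3.0, 0.1, size=(20, 2))  # cluster near (3, 3)
X = np.vstack([a, b])

# Gaussian similarity graph and unnormalized graph Laplacian L = D - W.
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-d2)
L = np.diag(W.sum(axis=1)) - W

# eigh returns eigenvalues in ascending order; the second eigenvector
# (Fiedler vector) separates the two weakly connected components.
vals, vecs = np.linalg.eigh(L)
labels = (vecs[:, 1] > 0).astype(int)

print(labels)
```

With clusters this far apart, the cross-cluster similarities are essentially zero, so the sign pattern of the Fiedler vector recovers the two groups exactly.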
The results of this work will be disseminated to the broad scientific community through publications in journals and conferences and through presentations in various venues, including tutorials. The goals of this project include implementing the practical algorithms and making the software available online. The research results will also be incorporated into the curriculum of graduate classes taught by the PI and the co-PI. Graduate students supported by this project will receive extensive training in theory, algorithm development, and applications.
PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
The amount of data that we generate and collect grows at an accelerating pace. If we want to extract useful information from this data, we need better and better tools. One source of difficulty is the growing number of attributes, or dimensionality, which makes data analysis computationally expensive. This research project started with the aim of developing better tools for the analysis of high-dimensional data. The proposed approach is to use geometry as a way of understanding the structure of data, namely, to answer the question: What is the shape of our data?
A central problem in data analysis is to find an insightful representation of the data. The geometry of this problem can be seen in the following analogy: to understand the shape of a three-dimensional object by looking at it from a particular angle, we may need to choose a good orientation of the object. In real-world data the number of attributes is commonly far more than three, and the representation problem tends to become more difficult and more relevant as the number of attributes grows. One of the main outcomes of this project is a unified framework for discovering a good representation of a dataset. Among the representations considered are those in which the attributes behave independently (known as Independent Component Analysis). We provided and analyzed simple yet general methods that have strong theoretical guarantees even when the data is corrupted by noise. We have also developed practical algorithms for the important problem of clustering, i.e., grouping objects together based on the similarities between their attributes.
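The "good orientation" idea behind Independent Component Analysis can be illustrated with a hedged toy sketch (an assumed example, not the project's methods): two independent uniform sources are mixed by a rotation, and because a mixture of non-Gaussians is closer to Gaussian than the sources themselves, scanning rotation angles for extremal kurtosis recovers the unmixing orientation.

```python
import numpy as np

# Toy ICA demonstration: recover a mixing rotation by kurtosis search.
rng = np.random.default_rng(2)
S = rng.uniform(-1, 1, size=(2, 5000))  # two independent uniform sources
theta = 0.6                             # mixing angle (ground truth)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
X = R @ S                               # observed mixed signals

def kurt(y):
    """Excess kurtosis of a 1-D signal (0 for a Gaussian)."""
    y = (y - y.mean()) / y.std()
    return np.mean(y ** 4) - 3.0

# Uniform sources have negative excess kurtosis, while any genuine
# mixture is closer to Gaussian (kurtosis nearer 0); so the projection
# angle minimizing kurtosis aligns with one original source.
angles = np.linspace(0.0, np.pi / 2, 500)
scores = [kurt(np.cos(a) * X[0] + np.sin(a) * X[1]) for a in angles]
best = angles[int(np.argmin(scores))]

print(best)  # close to the mixing angle 0.6
```

A projection at angle a equals cos(a - theta)·s1 + sin(a - theta)·s2, so kurtosis is extremal exactly when the projection coincides with a single source, which is the independence-recovering orientation.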
Last Modified: 06/11/2018
Modified by: Mikhail Belkin