
NSF Org: | CCF Division of Computing and Communication Foundations |
Recipient: |
|
Initial Amendment Date: | July 30, 2019 |
Latest Amendment Date: | July 30, 2019 |
Award Number: | 1940759 |
Award Instrument: | Standard Grant |
Program Manager: | Tracy Kimbrel, CCF Division of Computing and Communication Foundations, CSE Directorate for Computer and Information Science and Engineering |
Start Date: | May 15, 2019 |
End Date: | August 31, 2021 (Estimated) |
Total Intended Award Amount: | $208,678.00 |
Total Awarded Amount to Date: | $208,678.00 |
Funds Obligated to Date: |
|
History of Investigator: |
|
Recipient Sponsored Research Office: | 1608 4TH ST STE 201 BERKELEY CA US 94710-1749 (510)643-3891 |
Sponsor Congressional District: |
|
Primary Place of Performance: | 102 South Hall Berkeley CA US 94720-4600 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | Algorithms in the Field |
Primary Program Source: |
|
Program Reference Code(s): | |
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
With the wealth of data being generated in every sphere of human endeavor, data exploration (analyzing, understanding, and extracting value from data) has become vital. Data visualization is by far the most common data exploration mechanism, used by novice and expert data analysts alike. Yet data visualization on increasingly large datasets remains difficult: even simple visualizations of a large dataset can be slow and non-interactive, while visualizations of a sampled fraction of a dataset can mislead an analyst.
The project aims to develop FastViz, a scalable visualization engine that will not only enable visualization of datasets that are orders of magnitude larger in the same amount of time, but also ensure that the resulting visualizations satisfy key properties essential for correct analysis by end users. To ensure immediate utilization, FastViz will be applied to three real-world application domains (battery science, advertising analysis, and genomic data analysis) and implemented in Zenvisage, an open-source visual exploration platform developed by the PIs. Students in the project gain experience in combining the algorithmic and systems considerations that enable data exploration.
FastViz's development is driven by simultaneous investigation of systems considerations, such as indexing and storage techniques that enable various forms of online sampling, and algorithmic considerations for
(a) visualization generation, where the goal is to produce incrementally improving visualizations in which the important features are displayed first, and
(b) visualization selection, where the goal is to select, from a collection of visualizations that have not yet been generated, those that satisfy desired criteria.
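As a rough illustration of goal (a), the sketch below refines a bar chart of per-group means as more rows are read. It is not FastViz code: the function name `incremental_bar_chart`, the synthetic groups, and the use of a shuffled scan as a stand-in for an online sampling engine are all assumptions made for this example, and it does not attempt to order features by importance.

```python
import random
from collections import defaultdict


def incremental_bar_chart(rows, batch_size, seed=0):
    """Yield (rows_seen, {group: running mean}) after each batch of a random scan."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)  # assumption: a random scan order stands in for an online sampler
    sums = defaultdict(float)
    counts = defaultdict(int)
    for start in range(0, len(shuffled), batch_size):
        for group, value in shuffled[start:start + batch_size]:
            sums[group] += value
            counts[group] += 1
        seen = min(start + batch_size, len(shuffled))
        # Each snapshot is a complete, coarser version of the final chart.
        yield seen, {g: sums[g] / counts[g] for g in sums}


# Hypothetical data: three groups with true means 10, 20, and 30.
data_rng = random.Random(42)
rows = [(g, data_rng.gauss(mu, 1.0))
        for g, mu in (("A", 10.0), ("B", 20.0), ("C", 30.0))
        for _ in range(2_000)]

snapshots = list(incremental_bar_chart(rows, batch_size=500))
first_seen, first_bars = snapshots[0]
last_seen, last_bars = snapshots[-1]
```

The first snapshot is drawn from only 500 rows yet already shows the approximate bar heights; later snapshots tighten the estimates until the full dataset has been read.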
On the systems front, FastViz will leverage and contribute back to recent developments on online sampling systems that enable the use of more powerful sampling modalities.
On the algorithms front, FastViz will draw ideas from the literature on testing, distribution learning, and sublinear algorithms that, to the best of the PIs' knowledge, have not been adapted in practice. The algorithms developed will come with optimality guarantees and, wherever possible, instance-optimality guarantees, ensuring that they adapt to data characteristics as efficiently as possible.
The project will lead to a better understanding of the interplay between sampling algorithm development and systems design, facilitating the adoption of more realistic models and algorithms on the one hand, and the development of more powerful sampling engines that support the models those algorithms require on the other.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
The first direction explored by this project is statistical bounds that can be achieved in a data-dependent way. Provably correct bounds used in existing systems are data-independent, and thus have to hold even in extreme cases, such as when half of the data points are at the minimum of the value range and half are at the maximum. As a result, they are typically much wider than necessary and lead to unnecessarily long query latencies in most cases. This project explored the development of confidence bounds that can take advantage of data characteristics for early termination, and designed an algorithm for mean estimation that beats standard methods in non-worst-case settings.
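The gap between data-independent and data-dependent bounds can be seen with two textbook inequalities. The sketch below is not the algorithm developed in this project; it compares Hoeffding's bound with the empirical Bernstein bound of Maurer and Pontil (2009), used here as a well-known stand-in. The function names, the sample size, the confidence level, and the synthetic data are assumptions chosen for illustration.

```python
import math
import random
import statistics


def hoeffding_half_width(n, value_range, delta):
    """Data-independent: depends only on n, the value range, and delta."""
    return value_range * math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def empirical_bernstein_half_width(samples, value_range, delta):
    """Data-dependent: uses the sample variance (two-sided via a union bound)."""
    n = len(samples)
    variance = statistics.variance(samples)  # unbiased sample variance
    log_term = math.log(4.0 / delta)
    return (math.sqrt(2.0 * variance * log_term / n)
            + 7.0 * value_range * log_term / (3.0 * (n - 1)))


rng = random.Random(0)
n, delta = 10_000, 0.05

# Low-spread data: values concentrated near 0.5 inside the known range [0, 1].
concentrated = [min(1.0, max(0.0, rng.gauss(0.5, 0.02))) for _ in range(n)]
# The extreme case from the text: half the points at the minimum, half at the maximum.
extreme = [0.0] * (n // 2) + [1.0] * (n // 2)

hoeffding = hoeffding_half_width(n, 1.0, delta)
eb_concentrated = empirical_bernstein_half_width(concentrated, 1.0, delta)
eb_extreme = empirical_bernstein_half_width(extreme, 1.0, delta)
```

On the concentrated data the variance-aware interval is several times narrower than the Hoeffding interval at the same sample size, so a query could stop sampling much earlier; on the extreme data it offers no advantage. Both bounds here are for a fixed sample size. Checking a bound repeatedly while sampling, as early termination requires, needs an additional correction that this sketch omits.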
The second direction explored by this project is to develop techniques for visualization search. Insight discovery in large datasets is challenging because of the sheer number of visualizations that can be generated, which makes it hard to find patterns or trends. This project included the development of intelligent interfaces for supporting visualization search, spanning intuitive interactions, sketching, and natural language. These interfaces were powered by sampling algorithms that rapidly match candidate visualizations against a target and return approximately correct results with guarantees.
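A minimal sketch of sampling-based matching follows, assuming each candidate visualization is a histogram over a fixed set of bins and the target is a normalized shape supplied by the user. The names `sampled_histogram` and `closest_visualizations`, the L1 distance, and the fixed sample size are assumptions for illustration; unlike the project's algorithms, this sketch carries no correctness guarantee.

```python
import random
from collections import Counter


def sampled_histogram(rows, bins, sample_size, rng):
    """Estimate a normalized histogram over `bins` from a uniform sample of rows."""
    sample = rng.sample(rows, min(sample_size, len(rows)))
    counts = Counter(sample)
    return [counts[b] / len(sample) for b in bins]


def l1_distance(p, q):
    """Sum of absolute differences between two histograms."""
    return sum(abs(a - b) for a, b in zip(p, q))


def closest_visualizations(candidates, target, bins, sample_size, k, seed=0):
    """Return the k candidate names whose sampled histograms are nearest the target."""
    rng = random.Random(seed)
    scored = [
        (l1_distance(sampled_histogram(rows, bins, sample_size, rng), target), name)
        for name, rows in candidates.items()
    ]
    return [name for _, name in sorted(scored)[:k]]


# Hypothetical data: three candidate charts with different shapes across quarters.
bins = ["Q1", "Q2", "Q3", "Q4"]
data_rng = random.Random(1)
candidates = {
    "rising": data_rng.choices(bins, weights=[1, 2, 3, 4], k=50_000),
    "flat": data_rng.choices(bins, weights=[1, 1, 1, 1], k=50_000),
    "falling": data_rng.choices(bins, weights=[4, 3, 2, 1], k=50_000),
}
target = [0.1, 0.2, 0.3, 0.4]  # the sketched shape: rising across quarters

top = closest_visualizations(candidates, target, bins, sample_size=500, k=1)
```

Each candidate is scored from 500 sampled rows rather than all 50,000, which is enough here because the candidate shapes are far apart. Separating near-ties reliably would require adaptive sample sizes and confidence bounds.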
The third direction explored by the project is to develop techniques and interfaces for visualization recommendation. For users who are exploring a dataset for the first time, it can be overwhelming and challenging to determine which visualization to generate next to advance understanding. This project introduced a number of usable and scalable visualization recommendation techniques and instantiated them in four novel visualization recommendation systems. These visualization recommendation systems were downloaded over 100,000 times and used in various domains including genomics, battery science, astrophysics, and ad analytics.
Last Modified: 11/01/2021
Modified by: Aditya Parameswaran