Award Abstract # 1940759
AitF: Collaborative Research: Fast, Accurate, and Practical: Adaptive Sublinear Algorithms for Scalable Visualization

NSF Org: CCF
Division of Computing and Communication Foundations
Recipient: REGENTS OF THE UNIVERSITY OF CALIFORNIA, THE
Initial Amendment Date: July 30, 2019
Latest Amendment Date: July 30, 2019
Award Number: 1940759
Award Instrument: Standard Grant
Program Manager: Tracy Kimbrel
CCF
 Division of Computing and Communication Foundations
CSE
 Directorate for Computer and Information Science and Engineering
Start Date: May 15, 2019
End Date: August 31, 2021 (Estimated)
Total Intended Award Amount: $208,678.00
Total Awarded Amount to Date: $208,678.00
Funds Obligated to Date: FY 2017 = $208,677.00
History of Investigator:
  • Aditya Parameswaran (Principal Investigator)
    adityagp@berkeley.edu
Recipient Sponsored Research Office: University of California-Berkeley
1608 4TH ST STE 201
BERKELEY
CA  US  94710-1749
(510)643-3891
Sponsor Congressional District: 12
Primary Place of Performance: University of California-Berkeley
102 South Hall
Berkeley
CA  US  94720-4600
Primary Place of Performance Congressional District: 12
Unique Entity Identifier (UEI): GS3YEVSS12N6
Parent UEI:
NSF Program(s): Algorithms in the Field
Primary Program Source: 01001718DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s):
Program Element Code(s): 723900
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

With the wealth of data being generated in every sphere of human endeavor, data exploration--analyzing, understanding, and extracting value from data--has become absolutely vital. Data visualization is by far the most common data exploration mechanism, used by novice and expert data analysts alike. Yet data visualization on increasingly larger datasets remains difficult: even simple visualizations of a large dataset can be slow and non-interactive, while visualizations of a sampled fraction of a dataset can mislead an analyst.

The project aims to develop FastViz, a scalable visualization engine that will not only enable visualization of datasets that are orders of magnitude larger in the same amount of time, but also ensure that the resulting visualizations satisfy key properties essential for correct analysis by end-users. To ensure immediate utilization, FastViz will be applied to three real-world application domains (battery science, advertising analysis, and genomic data analysis) and implemented in Zenvisage, an open-source visual exploration platform developed by the PIs. Students in the project gain invaluable experience in combining the algorithmic and systems considerations that enable data exploration.

FastViz's development is driven by simultaneous investigation of systems considerations, such as indexing and storage techniques that enable various forms of online sampling, and algorithmic considerations for
(a) visualization generation, where the goal is to produce incrementally improving visualizations in which the important features are displayed first, and
(b) visualization selection, where the goal is to select, from a collection of as yet not generated visualizations, those that satisfy desired criteria.
On the systems front, FastViz will leverage and contribute back to recent developments on online sampling systems that enable the use of more powerful sampling modalities.
On the algorithms front, FastViz will draw ideas from the testing, distribution learning, and sublinear algorithms literature that, to the best knowledge of the PIs, have not been adapted in practice. The algorithms developed will obey optimality guarantees and, wherever possible, instance-optimality guarantees, ensuring that they adapt to data characteristics in the most efficient way possible.

The project will lead to a better understanding of the interplay between sampling algorithm development and systems design, facilitating the adoption of more realistic models and algorithms on the one hand, and the development of more powerful sampling engines that support the models those algorithms require on the other.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

(Showing: 1 - 10 of 12)
Bendre, Mangesh and Wattanawaroon, Tana and Mack, Kelly and Chang, Kevin and Parameswaran, Aditya "Anti-Freeze for Large and Complex Spreadsheets: Asynchronous Formula Computation" Proceedings of the 2019 International Conference on Management of Data, SIGMOD Conference 2019, Amsterdam, The Netherlands, June 30 - July 5, 2019, 2019 https://doi.org/10.1145/3299869.3319876
Bendre, Mangesh and Wattanawaroon, Tana and Rahman, Sajjadur and Mack, Kelly and Liu, Yuyang and Zhu, Shichu and Lu, Yu and Yang, Ping-Jing and Zhou, Xinyan and Chang, Kevin Chen-Chuan and Karahalios, Karrie and Parameswaran, Aditya "Faster, Higher, Stronger: Redesigning Spreadsheets for Scale" 35th IEEE International Conference on Data Engineering, ICDE 2019, 2019 https://doi.org/10.1109/ICDE.2019.00217
Devin Petersohn, Dixin Tang "Flexible Rule-Based Decomposition and Metadata Independence in Modin: A Parallel Dataframe System" Proceedings of the VLDB Endowment, 2022
Doris Jung-Lin Lee, Dixin Tang "Lux: Always-on Visualization Recommendations for Exploratory Data Science" Proceedings of the VLDB Endowment, 2022
Doris Jung-Lin Lee, Vidya Setlur "Deconstructing Categorization in Visualization Recommendation: A Taxonomy and Comparative Study" Visualization, 2021
Lee, Doris Jung-Lin and Dev, Himel and Hu, Huizi and Elmeleegy, Hazem and Parameswaran, Aditya "Avoiding drill-down fallacies with VisPilot: assisted exploration of data subsets" Proceedings of the 24th International Conference on Intelligent User Interfaces, IUI 2019, Marina del Ray, CA, USA, March 17-20, 2019, 2019 https://doi.org/10.1145/3301275.3302307
Lee, Doris Jung-Lin and Lee, John and Siddiqui, Tarique and Kim, Jaewoo and Karahalios, Karrie and Parameswaran, Aditya "You can't always sketch what you want: Understanding Sensemaking in Visual Query Systems" IEEE Transactions on Visualization and Computer Graphics, 2019 https://doi.org/10.1109/TVCG.2019.2934666
Parameswaran, Aditya "Enabling data science for the majority" Proceedings of the VLDB Endowment, v.12, 2019 https://doi.org/10.14778/3352063.3352148
Petersohn, Devin and Macke, Stephen and Xin, Doris and Ma, William and Lee, Doris and Mo, Xiangxi and Gonzalez, Joseph E. and Hellerstein, Joseph M. and Joseph, Anthony D. and Parameswaran, Aditya "Towards scalable dataframe systems" Proceedings of the VLDB Endowment, v.13, 2020 https://doi.org/10.14778/3407790.3407807
Siddiqui, Tarique and Luh, Paul and Wang, Zesheng and Karahalios, Karrie and Parameswaran, Aditya "ShapeSearch: A Flexible and Efficient System for Shape-based Exploration of Trendlines" SIGMOD 2020, 2020 https://doi.org/10.1145/3318464.3389722
Macke, Stephen and Aliakbarpour, Maryam and Diakonikolas, Ilias and Parameswaran, Aditya and Rubinfeld, Ronitt "Rapid Approximate Aggregation with Distribution-Sensitive Interval Guarantees" ICDE 2021, 2021 https://doi.org/

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

This project investigated statistical bounds that can be achieved in a data-dependent way. The provably correct bounds used in existing systems are data-independent, and thus must hold even in extreme cases, such as when half of the data points lie at the minimum of the value range and half at the maximum. As a result, they are typically much wider than necessary and lead to unnecessarily long query latencies in most cases. This project explored the development of confidence bounds that take advantage of data characteristics to enable early termination, and designed an algorithm for mean estimation that beats standard methods in non-worst-case settings.
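
To make the contrast concrete, the sketch below compares a data-independent interval with a data-dependent one for the mean of bounded values. It is an illustration only and is not the algorithm developed in this project: it assumes i.i.d. samples with a known value range, uses Hoeffding's inequality as the data-independent bound, and uses the empirical Bernstein bound of Maurer and Pontil (made two-sided by a union bound) as a stand-in for a variance-aware bound. The function names and the example data are hypothetical.

```python
import math
import random
import statistics


def hoeffding_half_width(n, value_range, delta):
    """Data-independent half-width: depends only on n, the range, and delta."""
    return value_range * math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def empirical_bernstein_half_width(samples, value_range, delta):
    """Data-dependent half-width: shrinks when the sample variance is small.

    Two-sided form of the Maurer-Pontil empirical Bernstein bound,
    obtained by a union bound over both tails (hence log(4 / delta)).
    """
    n = len(samples)
    sample_var = statistics.variance(samples)  # unbiased (n - 1) estimator
    log_term = math.log(4.0 / delta)
    return (math.sqrt(2.0 * sample_var * log_term / n)
            + 7.0 * value_range * log_term / (3.0 * (n - 1)))


if __name__ == "__main__":
    n, delta = 10_000, 0.05
    rng = random.Random(0)

    # Worst case from the text: half the points at the min, half at the max.
    worst_case = [0.0] * (n // 2) + [1.0] * (n // 2)
    # A concentrated dataset: same allowed range [0, 1], tiny spread.
    concentrated = [rng.uniform(0.49, 0.51) for _ in range(n)]

    print("Hoeffding:", hoeffding_half_width(n, 1.0, delta))
    print("Bernstein (worst case):",
          empirical_bernstein_half_width(worst_case, 1.0, delta))
    print("Bernstein (concentrated):",
          empirical_bernstein_half_width(concentrated, 1.0, delta))
```

On the worst-case input the variance-aware interval is no tighter than Hoeffding's (it is slightly wider because of its extra term), but on concentrated data it is much narrower at the same sample size, which is what makes earlier termination possible.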

The second direction explored by this project is to develop techniques for visualization search. Insight discovery in large datasets is challenging due to the sheer number of visualizations that can be generated -- making it hard to find patterns or trends. This project included the development of intelligent interfaces for supporting visualization search, spanning intuitive interactions, sketching, and natural language. These interfaces were powered by sampling algorithms for rapidly matching visualizations against the target, to return approximately correct results with guarantees.

The third direction explored by the project is to develop techniques and interfaces for visualization recommendation. For users who are exploring a dataset for the first time, it can be overwhelming and challenging to determine which visualization to generate next to advance understanding. This project introduced a number of usable and scalable visualization recommendation techniques and instantiated them in four novel visualization recommendation systems. These visualization recommendation systems were downloaded over 100,000 times and used in various domains including genomics, battery science, astrophysics, and ad analytics.

Last Modified: 11/01/2021
Modified by: Aditya Parameswaran

Please report errors in award information by writing to: awardsearch@nsf.gov.
