
NSF Org: |
IIS Division of Information & Intelligent Systems |
Recipient: |
|
Initial Amendment Date: | April 22, 2020 |
Latest Amendment Date: | April 22, 2020 |
Award Number: | 2027516 |
Award Instrument: | Standard Grant |
Program Manager: |
Hector Munoz-Avila
hmunoz@nsf.gov (703)292-4481 IIS Division of Information & Intelligent Systems CSE Directorate for Computer and Information Science and Engineering |
Start Date: | April 1, 2020 |
End Date: | September 30, 2021 (Estimated) |
Total Intended Award Amount: | $200,432.00 |
Total Awarded Amount to Date: | $200,432.00 |
Funds Obligated to Date: |
|
History of Investigator: |
|
Recipient Sponsored Research Office: |
1608 4TH ST STE 201 BERKELEY CA US 94710-1749 (510)643-3891 |
Sponsor Congressional District: |
|
Primary Place of Performance: |
CA US 94710-1749 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | Big Data Science & Engineering |
Primary Program Source: |
|
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
We interact with online shopping and banking websites on a daily basis. Many of these websites are powered by data-driven applications. Such an application often consists of two parts: application code hosted on an application server, and a database management system (DBMS), hosted on a separate server, that maintains persistent data. Unfortunately, many data-driven applications suffer from performance problems, such as taking a long time to load a page or being unable to scale up to serve a large number of clients simultaneously. The state of the art in discovering and fixing performance problems in data-driven applications is to examine the two parts of the application separately, which misses many opportunities to discover and fix such problems. Unlike prior approaches, this project will treat the DBMS and the application in tandem. In particular, we will devise new techniques and tools to help identify performance problems, understand their causes, and fix them automatically. This project will open up new opportunities in cross-layer program compilation and optimization, with the practical goal of improving the performance of data-driven applications, which will have a significant impact on many aspects of our daily lives. The findings from this project will be incorporated into undergraduate and graduate software engineering, introduction to data management, and compiler classes to be offered at the University of Chicago and the University of Washington. The outreach activities of this project will include engaging and advising students through special programs geared toward under-represented groups, such as the Distributed Research Experiences for Undergraduates (DREU) organized by CRA-W (Computing Research Association -- Women) and Diversity Workshops organized by CRA-W.
Specifically, the proposed research consists of three thrusts: (1) a new cross-layer program analysis framework that produces an end-to-end profile of data-driven applications by understanding the application code, the queries that the application sends to the DBMS, and how the DBMS processes such queries; (2) a program analysis and testing framework that identifies performance problems in data-driven applications by leveraging the end-to-end profile created in (1); and (3) new means to optimize data-driven applications by transforming both the application code and the queries that are issued. These three thrusts will work together to improve the performance of data-driven applications and help programmers detect performance problems during development. Software developed by this project, benchmarks used for evaluation, and performance comparisons with existing techniques will be released to the public domain through the project website, where further information will also be available (https://people.eecs.berkeley.edu/~akcheung/coopt.html).
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
This project investigated techniques to identify performance problems in data-driven applications and to improve their performance. Such applications are prevalent in our daily lives: essentially all web pages are data-driven applications in which data is stored persistently in databases and is retrieved and manipulated as the page loads.
During this project, we studied three specific aspects of this problem. First, we performed a comprehensive study of 12 representative real-world data-driven applications built on top of object-relational mapping (ORM) frameworks. From more than 200 performance issues, gathered by studying the applications' bug-tracking systems and profiling their latest versions, we generalized 9 ORM performance anti-patterns. To validate these findings, we manually fixed 64 performance issues in the latest versions and obtained a median speedup of 2x (up to 39x), with fewer than 5 lines of code changed in most cases. Many of the issues we found have been confirmed by developers, and we have also implemented ways to identify other code fragments with similar issues.
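To make the idea of an ORM performance anti-pattern concrete, the following is a minimal sketch of one widely known example, the "N+1 query" pattern, in which code issues one lookup per row instead of a single join. It is an illustration only: the report does not list the 9 anti-patterns, so this is not claimed to be one of them, and the schema, the `CountingConnection` wrapper, and the function names are all hypothetical. It uses Python's standard `sqlite3` module rather than an ORM so that each query sent to the database is visible.

```python
import sqlite3


def build_db():
    """Create a small in-memory database (hypothetical schema for illustration)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO authors VALUES (?, ?)",
        [(i, f"author{i}") for i in range(1, 51)],
    )
    conn.executemany(
        "INSERT INTO posts VALUES (?, ?, ?)",
        [(i, (i % 50) + 1, f"post{i}") for i in range(1, 201)],
    )
    return conn


class CountingConnection:
    """Hypothetical wrapper that counts how many queries are issued."""

    def __init__(self, conn):
        self.conn = conn
        self.count = 0

    def query(self, sql, params=()):
        self.count += 1
        return self.conn.execute(sql, params).fetchall()


def titles_with_authors_n_plus_one(db):
    """Anti-pattern: one query for the posts, then one more query per post."""
    result = []
    for _post_id, author_id, title in db.query(
        "SELECT id, author_id, title FROM posts ORDER BY id"
    ):
        name = db.query("SELECT name FROM authors WHERE id = ?", (author_id,))[0][0]
        result.append((title, name))
    return result


def titles_with_authors_joined(db):
    """Fix: fetch the same data with a single join."""
    return db.query(
        "SELECT p.title, a.name FROM posts p "
        "JOIN authors a ON a.id = p.author_id ORDER BY p.id"
    )
```

With 200 posts, the first function issues 201 queries and the second issues 1, while both return the same rows. The change between the two is only a few lines, which is the general shape of fix the paragraph above describes.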
Second, many modern database-backed web applications are built upon ORM frameworks. While such frameworks ease application development by abstracting persistent data as objects, this convenience comes at a performance cost. In addition to the study above, we studied another 27 real-world open-source applications built on top of the popular Ruby on Rails ORM framework, with the goal of understanding the database-related performance inefficiencies in these applications. We discovered a number of inefficiencies, ranging from physical design issues to how queries are expressed in the application code. We applied static program analysis to identify these issues and measure how prevalent they are, then suggested techniques to alleviate them and measured the potential performance gain.
Finally, web developers face the demanding task of designing informative web pages while keeping page-load time low. This task has become increasingly challenging, as most web content is now generated by processing ever-growing amounts of user data stored in back-end databases. It is difficult for developers to understand the cost of generating each web-page element, let alone explore and pick the web design with the best trade-off between performance and functionality. In response, we built Panorama, a view-centric and database-aware development environment for web developers. Using database-aware program analysis and a novel IDE design, Panorama gives developers intuitive information about the cost of, and the performance-enhancing opportunities behind, every HTML element. It also suggests global code refactorings that let developers easily explore a wide spectrum of performance and functionality trade-offs.
The code and datasets created in this project have been released as open source: https://hyperloop-rails.github.io. Moreover, concepts developed in this project have been incorporated into courses taught by the PIs at both the undergraduate and graduate levels. The results have also been published and presented at top-tier venues in the software engineering and data management research communities.
Last Modified: 02/06/2022
Modified by: Alvin Cheung