
NSF Org: |
IIS Division of Information & Intelligent Systems |
Initial Amendment Date: | August 20, 2020 |
Latest Amendment Date: | July 22, 2021 |
Award Number: | 2008107 |
Award Instrument: | Continuing Grant |
Program Manager: |
Hector Munoz-Avila
IIS Division of Information & Intelligent Systems CSE Directorate for Computer and Information Science and Engineering |
Start Date: | October 1, 2020 |
End Date: | September 30, 2024 (Estimated) |
Total Intended Award Amount: | $499,972.00 |
Total Awarded Amount to Date: | $499,972.00 |
Funds Obligated to Date: |
FY 2021 = $166,261.00 |
Recipient Sponsored Research Office: |
2200 W MAIN ST DURHAM NC US 27705-4640 (919)684-3030 |
Primary Place of Performance: |
Durham NC US 27708-0129 |
NSF Program(s): | Info Integration & Informatics |
Primary Program Source: |
01002122DB NSF RESEARCH & RELATED ACTIVIT |
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
In a world where decisions are increasingly driven by data, data analytics skills have become an indispensable part of any education that seeks to prepare its students for the modern workforce. Essential in this skill set is the ability to work with structured data. The standard "tools of the trade" for manipulating structured data include the venerable and ubiquitous SQL language as well as popular libraries heavily influenced by relational query languages, e.g., dplyr for R, DataFrame for pandas and Spark. Learning and debugging relational queries, however, pose challenges to novices. Even computer science students with programming backgrounds are often not used to thinking in terms of logic (e.g., when writing SQL queries) or functional programming (e.g., when writing queries using operators that resemble relational algebra). This project proposes to build a system called HNRQ (Helping Novices Learn and Debug Relational Queries) to address these challenges by explaining why a query is wrong and helping users fix and learn relational queries in the process.
The first step in the project is to automatically construct small database instances as counterexamples to illustrate why queries return wrong results, and to allow users to trace query execution over these instances. Going beyond convincing users that the queries are wrong, HNRQ further aims to guide users towards the next level of understanding: by helping them generalize from specific counterexamples to semantic descriptions of what causes wrong results, and by providing useful hints on how to approach the problems correctly. This ambitious goal will push the boundaries of existing research and will likely lead to the development of novel methodologies for providing explanations and hints. The project will make HNRQ general and practical by embracing the full complexity of real-world query languages and by delivering interactive performance, so that users can experiment with changes to queries and database instances, observe their effects, and obtain automated feedback and hints, all in real time even for complex queries and large databases. The project plans to evaluate HNRQ not only through user studies but also by measuring its direct impact on learning outcomes. The project is committed to making HNRQ open-source and easy for educators around the world to adopt.
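To make the idea of a small counterexample instance concrete, here is a hypothetical sketch (the schema, data, and queries are illustrative and not taken from the project itself): two SQL queries that both appear to compute "students with no enrollments" are told apart by a tiny two-table instance containing a NULL, which is exactly the kind of small witness described above.

```python
import sqlite3

# Hypothetical two-table counterexample instance; the single NULL in
# Enroll.sid is what exposes the difference between the two queries.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Student(sid INTEGER, name TEXT);
CREATE TABLE Enroll(sid INTEGER, cid TEXT);
INSERT INTO Student VALUES (1, 'Alice');
INSERT INTO Enroll VALUES (NULL, 'CS101');
""")

# Reference query: students with no matching enrollment row.
reference = """
SELECT name FROM Student s
LEFT JOIN Enroll e ON s.sid = e.sid
WHERE e.sid IS NULL
"""

# A plausible novice attempt: NOT IN silently returns no rows
# whenever the subquery produces a NULL.
buggy = """
SELECT name FROM Student
WHERE sid NOT IN (SELECT sid FROM Enroll)
"""

print(con.execute(reference).fetchall())  # [('Alice',)]
print(con.execute(buggy).fetchall())      # [] -- NOT IN over a NULL filters everything out
```

A one-student, one-enrollment instance like this is far easier to reason about than a large test database on which the two queries happen to disagree.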
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
In a world where decisions are increasingly driven by data, data analytics skills have become an indispensable part of any education seeking to prepare students for the modern workforce. Essential in this skill set is the ability to work with structured data using the venerable and ubiquitous SQL language as well as popular libraries heavily influenced by relational query languages, e.g., dplyr for R, DataFrame for pandas and Spark. Learning and debugging relational queries, however, poses challenges to novices: even if they have a programming background, they are often not used to thinking in terms of relational logic or operators.
This project, named HNRQ (Helping Novices Learn and Debug Relational Queries), has built a suite of powerful software tools for database educators and students alike. In an educational setting, we are often given a reference query defined by the teacher and a potentially incorrect query written by a student. First, if the two queries return different results on some test database, the RATest tool automatically constructs a small instance that illustrates the difference between the queries but is much simpler for the student to understand. Second, CInsGen finds “conditional instances,” which are abstract instances that illustrate all possible ways to satisfy a complex query or to differentiate two queries. Compared with concrete instances, conditional instances hide unnecessary details and articulate general conditions, making it easier to spot logical differences between queries. Third, Qr-Hint provides actionable hints to fix a working query so that it becomes semantically equivalent to the reference query. These hints purposefully guide the student through a sequence of steps that incrementally transform the working query until it becomes correct. Together, these three tools offer help that is specifically tailored to students’ individual mistakes, but do so automatically, without revealing the reference query or requiring extensive personal tutoring. Finally, in settings where no reference query is known, i-Rex is a novel debugger that helps students understand SQL query evaluation and debug SQL queries. It allows students to trace query evaluation and study the lineage among input, output, and intermediate result rows. It has a “pinning” feature that focuses on relevant parts of executions to examine, as well as pagination and “teleporting” features that allow the system to reproduce relevant parts of an execution without starting from the beginning, significantly improving the scalability of debugging on massive databases.
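The core task behind RATest can be sketched naively as follows. This is a hypothetical brute-force illustration of the goal (find a smallest sub-instance of a test database on which the reference and student queries disagree), not the tool's actual algorithm, which relies on far more scalable provenance-based techniques; all names, queries, and data here are invented for illustration.

```python
import sqlite3
from itertools import combinations

def run(query, rows):
    """Evaluate `query` over a fresh Enroll(sid, cid) table holding `rows`."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Enroll(sid INTEGER, cid TEXT)")
    con.executemany("INSERT INTO Enroll VALUES (?, ?)", rows)
    return sorted(con.execute(query).fetchall())

def smallest_counterexample(q1, q2, rows):
    """Brute-force search, by increasing size, for a smallest subset of
    `rows` on which the two queries disagree; None if they always agree."""
    for k in range(len(rows) + 1):
        for subset in combinations(rows, k):
            if run(q1, subset) != run(q2, subset):
                return subset
    return None

# Reference answer: students enrolled in more than one distinct course.
reference = "SELECT sid FROM Enroll GROUP BY sid HAVING COUNT(DISTINCT cid) > 1"
# A plausible student mistake: COUNT(*) double-counts duplicate rows.
buggy = "SELECT sid FROM Enroll GROUP BY sid HAVING COUNT(*) > 1"

rows = [(1, 'CS101'), (1, 'CS101'), (2, 'CS101'), (2, 'CS201'), (3, 'CS301')]
print(smallest_counterexample(reference, buggy, rows))
# -> ((1, 'CS101'), (1, 'CS101')): a duplicate enrollment pair suffices
```

A two-row instance pinpoints the mistake far more clearly than the full five-row test database; the exponential subset enumeration here is exactly what a practical tool must avoid.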
This project has also started to investigate the challenges and opportunities that the rise of Generative AI poses for database education. Specifically, it has produced preliminary results on how to leverage large language models to help students decompose complex queries into simpler steps and describe them, and how to verify the correctness of automatically generated SQL code.
The HNRQ suite of tools has been deployed in undergraduate and graduate database courses at Duke University, benefiting more than 1,800 students during the project period, and will continue to be used in the future. The project has provided research experiences for learners at many levels, including one postdoctoral fellow, 4 PhD students, 6 MS students, 12 undergraduate students, and one high school student. Two alumni of the project are now Assistant Professors. In addition to the educational impact, the research carried out under the HNRQ project has also deepened the understanding of many fundamental problems in databases, resulting in numerous research papers and system demonstrations at top publication venues, 3 keynote speeches at international workshops and conferences, as well as 5 invited talks at research labs and universities.
Last Modified: 01/15/2025
Modified by: Jun Yang