Award Abstract # 1816701
III: Small: Towards Speech-Driven Multimodal Querying

NSF Org: IIS
Division of Information & Intelligent Systems
Recipient: UNIVERSITY OF CALIFORNIA, SAN DIEGO
Initial Amendment Date: July 31, 2018
Latest Amendment Date: July 31, 2018
Award Number: 1816701
Award Instrument: Standard Grant
Program Manager: Hector Munoz-Avila
hmunoz@nsf.gov
(703)292-4481
IIS  Division of Information & Intelligent Systems
CSE  Directorate for Computer and Information Science and Engineering
Start Date: October 1, 2018
End Date: September 30, 2022 (Estimated)
Total Intended Award Amount: $500,000.00
Total Awarded Amount to Date: $500,000.00
Funds Obligated to Date: FY 2018 = $500,000.00
History of Investigator:
  • Arun Kumar (Principal Investigator)
    arunkk@eng.ucsd.edu
  • Lawrence Saul (Co-Principal Investigator)
  • Ndapandula Nakashole (Co-Principal Investigator)
Recipient Sponsored Research Office: University of California-San Diego
9500 GILMAN DR
LA JOLLA
CA  US  92093-0021
(858)534-4896
Sponsor Congressional District: 50
Primary Place of Performance: University of California-San Diego
La Jolla
CA  US  92093-0934
Primary Place of Performance Congressional District: 50
Unique Entity Identifier (UEI): UYTTZT6G9DT1
Parent UEI:
NSF Program(s): Info Integration & Informatics
Primary Program Source: 01001819DB NSF RESEARCH & RELATED ACTIVITIES
Program Reference Code(s): 7923, 7364, 075Z
Program Element Code(s): 736400
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Modern automatic speech recognition (ASR) tools offer near-human accuracy in many scenarios. This has increased the popularity of speech-driven input in many applications on modern device environments such as tablets and smartphones, while also enabling personal conversational assistants. In this context, this project will study a seemingly simple but important fundamental question: how should one design a speech-driven system to query structured data? Structured data querying is ubiquitous in the enterprise, healthcare, and other domains. Typing queries in the Structured Query Language (SQL) is the gold standard for such querying. But typing SQL is painful or impossible in the above environments, which restricts when and how users can consume their data. SQL also has a learning curve. Existing alternatives such as typed natural language interfaces help improve usability but sacrifice query sophistication substantially. For instance, conversational assistants today support queries mainly over curated vendor-specific datasets, not arbitrary database schemas, and they often fail to understand query intent. This has widened the gap with SQL's high query sophistication and unambiguity. This project will bridge this gap by enabling users to interact with structured data using spoken queries over arbitrary database schemas. It will lead to prototype systems on popular tablet, smartphone, and conversational assistant environments. This could help many data professionals such as data analysts, business reporters, and database administrators, as well as non-technical data enthusiasts. For instance, nurse informaticists can retrieve patient details more easily and unambiguously to assist doctors, while analysts can slice and dice their data even on the move. The research will be disseminated as publications in database and natural language processing conferences. The research and artifacts produced will be integrated into graduate and undergraduate courses on database systems. The PIs will continue supporting students from under-represented groups as part of this project.

This project will create three new systems for spoken querying at three levels of "naturalness." The first level targets a tractable and meaningful subset of SQL. This research will exploit three powerful properties of SQL that regular English speech lacks--unambiguous context-free grammar, knowledge of the database schema queried, and knowledge of tokens from the database instance queried--to support arbitrary database schemas and tokens not present in the ASR vocabulary. The PIs will synthesize and innovate upon ideas from information retrieval, natural language processing, and database indexing and combine them with human-in-the-loop query correction to improve accuracy and efficiency. The second version will make SQL querying even more natural and stateful by changing its grammar. This will lead to the first speech-oriented dialect of SQL. The third version will apply the lessons from the previous versions to two state-of-the-art typed natural language interfaces for databases. This will lead to a redesign of such interfaces that exploits both the properties of speech and the database instance queried.
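To make the first level concrete, here is a minimal, hypothetical sketch (in Python) of one ingredient described above: snapping misrecognized tokens in an ASR transcription of a SQL query to the schema of the database being queried. The example schema, similarity measure, and threshold are illustrative assumptions, not the project's actual design.

    # Hypothetical sketch: schema-aware correction of ASR-transcribed SQL tokens.
    # The schema, similarity measure, and threshold are assumptions for
    # illustration only, not the project's actual implementation.
    from difflib import SequenceMatcher

    SCHEMA_TOKENS = {"employees", "salary", "name", "department", "hire_date"}
    SQL_KEYWORDS = {"select", "from", "where", "group", "by", "order", "and", "or"}

    def similarity(a: str, b: str) -> float:
        """Normalized string similarity in [0, 1]."""
        return SequenceMatcher(None, a, b).ratio()

    def correct_token(token: str, threshold: float = 0.5) -> str:
        """Snap a non-keyword token to its closest schema token, if close enough."""
        t = token.lower()
        if t in SQL_KEYWORDS or t in SCHEMA_TOKENS:
            return t
        best = max(SCHEMA_TOKENS, key=lambda s: similarity(t, s))
        return best if similarity(t, best) >= threshold else t

    # Example: the ASR mishears "employees" as "employs" and "salary" as "celery".
    transcript = "select name from employs where celery > 50000"
    print(" ".join(correct_token(tok) for tok in transcript.split()))
    # -> select name from employees where salary > 50000

A real system would additionally exploit SQL's unambiguous context-free grammar to constrain which token types can appear at each position, which is one of the three properties the paragraph above lists.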

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH


Shah, Vraj and Li, Side and Kumar, Arun and Saul, Lawrence "SpeakQL: Towards Speech-driven Multimodal Querying of Structured Data" Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, 2020. 10.1145/3318464.3389777
Shah, Vraj and Li, Side and Yang, Kevin and Kumar, Arun and Saul, Lawrence "Demonstration of SpeakQL: Speech-driven Multimodal Querying of Structured Data" Proceedings of the 2019 International Conference on Management of Data, 2019. 10.1145/3299869.3320224
Shao, Yutong and Kumar, Arun "Structured Data Representation in Natural Language Interfaces" Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, v.45, 2022.

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

This project studied the principles, methods, and tools for bringing relational database querying into the speech-first era of computing. Relational databases are the most common form of business-critical data in practice. Our findings and artifacts could enable easier anytime-anywhere data retrieval and analytics for both data professionals and lay users on new devices such as tablets, smartphones, and conversational assistants. We broke the problem down into three regimes of ease of use and query sophistication.

The first regime focused on regular SQL, the most popular query language for relational databases. It targeted data professionals familiar with SQL, such as business analysts, nurse informatics practitioners, and system administrators. We devised the first speech-first multimodal querying interface for a non-trivial subset of SQL. User studies showed that our system lets people specify their queries on tablets significantly faster than typing SQL.

For the second regime, we created the first speech-oriented dialect of SQL, also targeted at data professionals. Building on the first regime's lessons about how rigid regular SQL is to dictate, the dialect introduces a small set of focused syntax relaxations that are easy for SQL users to pick up. User studies showed that our dialect is significantly easier to query with than spoken regular SQL, while still preserving the exactness and correctness guarantees of SQL.
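As a purely hypothetical illustration of what such a relaxation could look like (the phrase table below is an assumption for illustration, not the dialect the project actually defined), spoken operator phrases might be normalized back into exact SQL:

    # Hypothetical sketch: normalize spoken operator phrases into exact SQL.
    # The phrase table is an illustrative assumption, not the project's dialect.
    SPOKEN_TO_SQL = {
        "not equals": "<>",  # listed before "equals" so it is not shadowed
        "equals": "=",
        "greater than": ">",
        "less than": "<",
    }

    def normalize(spoken: str) -> str:
        """Rewrite spoken operator phrases into SQL operators."""
        out = spoken
        for phrase, op in SPOKEN_TO_SQL.items():  # dicts preserve insertion order
            out = out.replace(phrase, op)
        return out

    print(normalize("select name from employees where salary greater than 50000"))
    # -> select name from employees where salary > 50000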

For the third regime, we studied the intersection of speech-first querying and natural language interfaces to databases, which translate regular English queries to SQL. These can be useful even for lay users of relational data, e.g., for asking about restaurants or general facts on conversational assistants. We showed that making the natural-language-to-SQL translation aware of audio features can make it more accurate.


Last Modified: 11/08/2022
Modified by: Arun K Kumar
