
Award Abstract # 1900638
CNS Core: Large: Autonomy and Privacy with Open Federated Virtual Assistants

NSF Org: CNS, Division of Computer and Network Systems
Recipient: THE LELAND STANFORD JUNIOR UNIVERSITY
Initial Amendment Date: March 7, 2019
Latest Amendment Date: June 7, 2022
Award Number: 1900638
Award Instrument: Continuing Grant
Program Manager: Jason Hallstrom
CNS Division of Computer and Network Systems
CSE Directorate for Computer and Information Science and Engineering
Start Date: April 1, 2019
End Date: March 31, 2023 (Estimated)
Total Intended Award Amount: $3,000,000.00
Total Awarded Amount to Date: $3,000,000.00
Funds Obligated to Date: FY 2019 = $627,077.00
FY 2020 = $762,741.00
FY 2021 = $793,340.00
FY 2022 = $816,842.00
History of Investigator:
  • Monica Lam (Principal Investigator)
    lam@cs.stanford.edu
  • James Landay (Co-Principal Investigator)
  • Christopher Manning (Co-Principal Investigator)
  • David Mazières (Co-Principal Investigator)
  • Michael Bernstein (Co-Principal Investigator)
Recipient Sponsored Research Office: Stanford University
450 JANE STANFORD WAY
STANFORD
CA  US  94305-2004
(650)723-2300
Sponsor Congressional District: 16
Primary Place of Performance: Stanford University
353 Serra Mall
Stanford
CA  US  94305-5008
Primary Place of Performance Congressional District: 16
Unique Entity Identifier (UEI): HJD6G4D6TJY5
Parent UEI:
NSF Program(s): CSR-Computer Systems Research
Primary Program Source: 01001920DB NSF RESEARCH & RELATED ACTIVIT
01002021DB NSF RESEARCH & RELATED ACTIVIT
01002122DB NSF RESEARCH & RELATED ACTIVIT
01002223DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 7925
Program Element Code(s): 735400
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Virtual assistants, and more generally linguistic user interfaces, will become the norm for mobile and ubiquitous computing. This research aims to create the best open virtual assistant designed to respect privacy. Instead of handling just simple commands, such virtual assistants will be able to perform complex tasks that connect different Internet-of-Things devices and web services. Users will also be able to decide who can see their data, as well as what, when, and how they are shared. By making the technology open source, this research helps create a competitive industry that offers a great variety of innovative products, instead of closed platform monopolies.

This project unifies internet services and "Internet of Things" (IoT) devices into an interoperable web through Thingpedia, an open, crowdsourced, universal encyclopedia of public application interfaces. Resources in Thingpedia can be connected together using ThingTalk, a high-level virtual assistant language. Another key contribution is the Linguistic User Interface Network (LUInet), which understands how to operate the world's digital interfaces in natural language; LUInet uses deep learning to translate natural language into ThingTalk. Privacy with fine-grained access control is provided through open-source federated virtual assistants. Transparent third-party sharing is supported by recording human-understandable contracts and data transactions with a scalable blockchain technology.
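
As a rough illustration of the translation LUInet performs, the Python sketch below pairs a few hypothetical utterances with ThingTalk-style program strings. The utterances, function names, and program syntax are invented for illustration and do not follow the actual ThingTalk grammar.

    # Illustrative only: canned examples of the kind of natural-language-to-
    # program mapping a neural semantic parser such as LUInet learns.
    # The program strings below are ThingTalk-flavored but hypothetical.
    EXAMPLE_PARSES = {
        "tweet the front page of the new york times":
            "monitor @news.front_page() => @social.post(status=title);",
        "turn on the living room lights":
            "now => @lights.set_power(name='living room', power=on);",
    }

    def parse(utterance: str) -> str:
        """Stand-in for a learned parser: look up a canned program."""
        return EXAMPLE_PARSES.get(utterance.lower().strip(), "unsupported command")

    print(parse("Turn on the living room lights"))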

This research contributes to the creation of a decentralized computing ecosystem that protects user privacy and promotes open competition. Natural-language programming expands the utility of computing to ordinary people, reducing the programming bottleneck. All the technologies developed in this project will be made available as open source, supporting further research and development by academia and industry. Thingpedia and the ThingTalk dataset will be an important contribution to natural language processing. The large-scale research program for college and high-school students, with a focus on diverse students, broadens participation and teaches technology, research, and the importance of privacy. All the information related to this project, papers, data, code, and results, are available at http://oval.cs.stanford.edu until at least 2026.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Campagna, Giovanni and Semnani, Sina and Kearns, Ryan and Koba Sato, Lucas Jun and Xu, Silei and Lam, Monica. "A Few-Shot Semantic Parser for Wizard-of-Oz Dialogues with the Precise ThingTalk Representation." Findings of the Association for Computational Linguistics: ACL 2022, 2022. https://doi.org/10.18653/v1/2022.findings-acl.317
Chi, Ethan A. and See, Abigail and Chiam, Caleb and Chang, Trenton and Kenealy, Kathleen and Lim, Swee Kiat and Hardy, Amelia and Rastogi, Chetanya and Li, Haojun and Iyabor, Alexander and He, Yutong and Sowrirajan, Hari and Qi, Peng and Sadagopan, Kaushi. "Neural Generation Meets Real People: Building a Social, Informative Open-Domain Dialogue Agent." Proceedings of the 23rd Annual Meeting of the Special Interest Group on Discourse and Dialogue, 2022.
Krishna, Ranjay and Lee, Donsuk and Fei-Fei, Li and Bernstein, Michael S. "Socially situated artificial intelligence enables learning from human interaction." Proceedings of the National Academy of Sciences, v.119, 2022. https://doi.org/10.1073/pnas.2115730119
Moradshahi, Mehrad and Semnani, Sina J. and Lam, Monica S. "Zero and Few-Shot Localization of Task-Oriented Dialogue Agents with a Distilled Representation." Proceedings of the Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023.
Moradshahi, Mehrad and Shen, Tianhao and Bali, Kalika and Choudhury, Monojit and de Chalendar, Gaël and Goel, Anmol and Kim, Sungkyun and Kodali, Prashant and Kumaraguru, Ponnurangam and Semmar, Nasredine and Semnani, Sina J. and Seo, Jiwon and Seshadri, "X-RiSAWOZ: High-Quality End-to-End Multilingual Dialogue Datasets and Few-shot Agents." Findings of the Association for Computational Linguistics (ACL), Toronto, Canada, 2023.
Moradshahi, Mehrad and Tsai, Victoria and Campagna, Giovanni and Lam, Monica S. "Contextual Semantic Parsing for Multilingual Task-Oriented Dialogues." Proceedings of the Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023.
Yang, Jackie (Junrui) and Chen, Tuochao and Qin, Fang and Lam, Monica S. and Landay, James A. "HybridTrak: Adding Full-Body Tracking to VR Using an Off-the-Shelf Webcam." CHI '22: CHI Conference on Human Factors in Computing Systems, 2022. https://doi.org/10.1145/3491102.3502045

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

This project advances and democratizes deep-learning-based conversational virtual assistant technology. We have created new virtual assistant development methodologies and open-source tools that help developers who are not AI experts build effective conversational agents for their chosen domains and languages. We have validated our research with powerful privacy-preserving assistants and social chatbots.

Technical Contributions.

Despite dramatic recent advances, large language models (LLMs) tend to hallucinate and are hence untrustworthy as virtual assistants. We show that hallucination can be eliminated by grounding LLMs in external corpora, which can comprise databases, knowledge bases, free-text documents, APIs on the internet, and theorem provers.
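
As a loose sketch of this grounding idea, the Python function below composes an answer only from retrieved evidence. The search and llm helpers are hypothetical placeholders for a retriever over an external corpus and a generic LLM completion call; they are not interfaces from this project.

    # Sketch of retrieval-grounded generation. `search(query, k)` returns text
    # passages from an external corpus and `llm(prompt)` returns a completion;
    # both are assumed placeholder callables, not project APIs.
    def grounded_answer(question, search, llm, k=3):
        passages = search(question, k)
        evidence = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
        prompt = (
            "Answer the question using ONLY the evidence below. "
            "If the evidence is insufficient, say you do not know.\n\n"
            f"Evidence:\n{evidence}\n\nQuestion: {question}\nAnswer:"
        )
        return llm(prompt)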

We introduce a new formal, executable notation that supports inter-operation of these different types of resources. Unlike existing API support in virtual assistants, our notation captures the full signature of each API call, including its results and not just its input parameters. This is necessary to support compositionality, the key to automating high-level functions.
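
A minimal sketch of the idea follows, using invented Field and ApiSignature types rather than the project's actual notation: declaring result fields alongside input parameters is what lets one call's outputs be composed into another call's inputs.

    # Hypothetical types illustrating full API signatures (inputs AND results).
    from dataclasses import dataclass
    from typing import List

    @dataclass
    class Field:
        name: str
        type: str

    @dataclass
    class ApiSignature:
        name: str
        inputs: List[Field]
        results: List[Field]   # declared outputs, not just input parameters

    def composable(producer: ApiSignature, consumer: ApiSignature) -> bool:
        """A producer can feed a consumer if some result matches an input."""
        produced = {(f.name, f.type) for f in producer.results}
        return bool(produced & {(f.name, f.type) for f in consumer.inputs})

    weather = ApiSignature("get_weather",
                           [Field("location", "Location")],
                           [Field("temperature", "Measure(C)")])
    thermostat = ApiSignature("set_target",
                              [Field("temperature", "Measure(C)")], [])
    print(composable(weather, thermostat))   # True: results drive the next call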

We reduce the prohibitively high cost of data collection for training semantic parsers, which translate natural language into code, with a data synthesizer that covers all constructs and properties in schemas. 
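
The sketch below illustrates template-based synthesis over a toy schema; the schema, templates, and values are hypothetical and are only meant to show how enumerating every property with every template yields broad coverage of (utterance, program) training pairs.

    # Invented schema and templates; each property is paired with each template.
    SCHEMA = {"restaurant": ["cuisine", "price_range", "rating"]}
    EXAMPLE_VALUES = {"cuisine": "italian", "price_range": "cheap", "rating": "4"}
    TEMPLATES = [
        ("show me a {entity} with {prop} equal to {val}",
         "@{entity}.search({prop}={val!r})"),
        ("find {entity}s whose {prop} is {val}",
         "@{entity}.search({prop}={val!r})"),
    ]

    def synthesize():
        for entity, props in SCHEMA.items():
            for prop in props:                    # cover every schema property
                val = EXAMPLE_VALUES[prop]
                for utt_t, prog_t in TEMPLATES:   # ...with every template
                    yield (utt_t.format(entity=entity, prop=prop, val=val),
                           prog_t.format(entity=entity, prop=prop, val=val))

    for utterance, program in synthesize():
        print(utterance, "->", program)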

To ground LLMs in open-text documents, we create a multi-stage LLM system, prompted with few-shot examples, that injects factuality into the LLMs' natural-sounding responses.
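
A hedged sketch of one such multi-stage pipeline (draft, verify against retrieved evidence, revise) is shown below; the prompts and the llm/search helpers are placeholders, not the project's actual few-shot prompts.

    # Placeholder llm(prompt) and search(query) callables are assumed.
    def factual_response(history, llm, search):
        draft = llm(f"Continue this conversation naturally:\n{history}\nResponse:")
        query = llm(f"Write a search query to verify the claims in:\n{draft}")
        evidence = "\n".join(search(query))
        verdict = llm(f"Evidence:\n{evidence}\n\nClaim:\n{draft}\n\n"
                      "Is every factual statement supported? Answer yes or no.")
        if verdict.strip().lower().startswith("yes"):
            return draft
        return llm("Rewrite the response so it states only facts found in the "
                   "evidence, keeping the conversational tone.\n"
                   f"Evidence:\n{evidence}\nOriginal:\n{draft}\nRevised:")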

We distill large language models for virtual assistants into models small enough to run on a personal device. This not only makes the chatbot faster, but also improves privacy and affordability and puts the technology within reach of many more developers.
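
For illustration, here is a minimal knowledge-distillation training step in PyTorch with a temperature-scaled KL loss; the teacher and student models, batch format, and hyperparameters are placeholders, not the project's actual setup.

    import torch
    import torch.nn.functional as F

    def distill_step(student, teacher, batch, optimizer, temperature=2.0):
        with torch.no_grad():
            teacher_logits = teacher(batch)      # large frozen model
        student_logits = student(batch)          # small on-device model
        loss = F.kl_div(
            F.log_softmax(student_logits / temperature, dim=-1),
            F.softmax(teacher_logits / temperature, dim=-1),
            reduction="batchmean",
        ) * (temperature ** 2)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()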

To promote equity for speakers of low-resource languages, we develop methodology and tools to internationalize and localize task-oriented agents to new languages.
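
One way such localization can work, sketched with invented helpers: protect annotated slot values with placeholders before machine translation, then substitute localized values back in. The translate callable, and the assumption that placeholders survive translation intact, are hypothetical.

    def localize(utterance, slots, translate, localized_values):
        # slots: {"food": "italian"}; localized_values: {"food": "italienne"}
        protected = utterance
        for slot, value in slots.items():
            protected = protected.replace(value, f"<{slot}>")
        translated = translate(protected)        # placeholders pass through MT
        for slot in slots:
            translated = translated.replace(f"<{slot}>", localized_values[slot])
        return translated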

Our contributions to multi-modal assistants and robotic process automation include the following:
  • We show that a virtual assistant's microphone can be used to automatically determine the user's position and gaze direction.
  • We propose a novel way to combine touch and verbal commands so users can easily perform cross-application tasks on mobile devices.
  • We enable non-programmers to automate sophisticated web routines by demonstrating them along with natural verbal instructions that incorporate programming concepts such as function composition, conditionals, and iteration.
  • We show how to create a universal call agent that follows natural-language instructions to interact with users and operate the web automatically.

To support privacy, we have created a federated virtual assistant architecture that lets users share data with each other in natural language, privately and with fine-grained access control, enforced with the help of a satisfiability-modulo-theories prover.
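
As an illustration of SMT-checked access control, the sketch below uses the z3-solver Python package to decide whether a concrete request could violate a simple sharing policy; the policy and request attributes are invented and are not the project's actual access-control language.

    from z3 import And, Bool, Int, Not, Solver, unsat

    hour = Int("hour")                 # attributes of the incoming request
    is_family = Bool("is_family")

    # Owner's policy: family members may query my location from 08:00 to 20:00.
    policy = And(is_family, hour >= 8, hour <= 20)

    # Concrete request: a family member asking at 21:00.
    request = And(is_family, hour == 21)

    s = Solver()
    s.add(request, Not(policy))        # can this request violate the policy?
    print("allow" if s.check() == unsat else "deny")   # prints "deny"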

Information Dissemination.

We have published extensively and given numerous keynotes and interviews on our research (https://oval.cs.stanford.edu). We evangelized open-source assistants by organizing two workshops at Stanford. The first, an invitation-only workshop held in 2019, was attended by practitioners and researchers from 30 different organizations. The second, a hybrid online/offline workshop, drew over 400 registrants from 60 educational institutions, 15 nonprofit/government organizations, and 200 companies in many different sectors. All our software is available at https://github.com/stanford-oval.

We have developed a new course, Conversational Virtual Assistant with Deep Learning, which offers students an opportunity to participate in state-of-the-art virtual assistant research. Last year's class was 40% female and 10% Black/Hispanic.

Impact.

  1. We have created the first open-source conversational virtual assistant capable of the ten most popular skills, including voice control of 15 different kinds of IoT devices. It can run privately on users' own devices and received Popular Science's "Best of What's New" Award in Security in 2019. It was also released as an open-source, privacy-preserving voice interface for Home Assistant, a popular open-source, crowdsourced IoT platform.
  2. Our Chirpy Cardinal socialbot won second place in the Alexa Prize Socialbot Grand Challenge in both 2020 and 2021. It is currently featured as part of AI: More than Human, an art exhibition about the relationship between humans and AI that has been seen by over 260,000 visitors in Barcelona, Guangzhou, and Liverpool.

  3. In collaboration with the School of Medicine, we have created a coach for autistic individuals that improves their social skills through drills on conversations. 

  4. In collaboration with an international team, we created a new multilingual few-shot benchmark, X-RiSAWOZ, that covers languages spoken by 3.5B people (Chinese, English, French, Hindi, Hindi-English, Korean), using a tool we created to facilitate data curation. Our few-shot agents achieve 61-85% accuracy on dialogue state tracking across these languages.
     
  5. We have developed the first non-hallucinating LLM-based open-domain chatbot that is both informational and conversational, grounded in Wikipedia and Wikidata. This unlocks the power of LLMs as a knowledge acquisition tool.
  6. Our open-source Genie virtual assistant generation framework is uniquely designed to eliminate hallucination, handle large data schemas along with free-text retrieval, and support multilinguality.

  7. Our multimodal virtual assistant techniques have been commercialized by at least two startups. 

Last Modified: 08/07/2023
Modified by: Monica S Lam
