Award Abstract # 1914444
SaTC: CORE: Medium: Collaborative: Automatically Answering People's Privacy Questions

NSF Org: CNS
Division Of Computer and Network Systems
Recipient: THE PENNSYLVANIA STATE UNIVERSITY
Initial Amendment Date: July 8, 2019
Latest Amendment Date: June 17, 2020
Award Number: 1914444
Award Instrument: Standard Grant
Program Manager: Dan Cosley
dcosley@nsf.gov
 (703)292-8832
CNS
 Division Of Computer and Network Systems
CSE
 Directorate for Computer and Information Science and Engineering
Start Date: July 15, 2019
End Date: December 31, 2024 (Estimated)
Total Intended Award Amount: $437,436.00
Total Awarded Amount to Date: $451,436.00
Funds Obligated to Date: FY 2019 = $437,436.00
FY 2020 = $14,000.00
History of Investigator:
  • Shomir Wilson (Principal Investigator)
Recipient Sponsored Research Office: Pennsylvania State Univ University Park
201 OLD MAIN
UNIVERSITY PARK
PA  US  16802-1503
(814)865-1372
Sponsor Congressional District: 15
Primary Place of Performance: Pennsylvania State Univ University Park
State College
PA  US  16802-3000
Primary Place of Performance Congressional District:
Unique Entity Identifier (UEI): NPM2J7MSCF61
Parent UEI:
NSF Program(s): Secure & Trustworthy Cyberspace
Primary Program Source: 01002021DB NSF RESEARCH & RELATED ACTIVIT
01001920DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 025Z, 9178, 065Z, 7924, 9251
Program Element Code(s): 806000
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

As novel technologies collect increasingly large and diverse amounts of data about us, people are unable to keep up and retain control over what happens to their data. The current legal approach to privacy concentrates on the concept of "Notice and Choice", namely the expectation that people are provided sufficient information about the collection and use of their data, and are offered meaningful choices about these practices (e.g., opt out, opt in). A primary element of this approach relies on privacy policies to communicate this information to people. In practice, these policies tend to be long, vague and ambiguous. Not surprisingly, few people find the time to read them, and those who do often struggle to understand what they say. This multi-disciplinary project aims to develop novel technology that will enable people to regain a sense of control by enabling them to simply ask questions about the privacy issues that matter to them rather than requiring them to read long, one-size-fits-all privacy policies. In addition to producing new knowledge and technologies and contributing to improving the state of privacy in the United States, this project will also create education and research opportunities for both undergraduate and graduate students at participating universities, including activities to broaden participation of women and under-represented minorities in this important area of computer science, and contribute to the development of technologies with the potential to help the visually impaired take advantage of information found in the text of privacy policies.

This multi-disciplinary project builds on recent advances in natural language processing, machine learning, code analysis and user modeling to re-invent notice and choice, moving from long and hard-to-understand notices to interactive privacy dialogues with users. An important part of this research involves the development of question answering functionality that enables users to ask questions about those issues that truly matter to them rather than presenting them with one-size-fits-all privacy notices. Another part involves supplementing disclosures found in privacy policies with additional sources of information, such as background knowledge (e.g., knowledge about common data practices and relevant laws) and code analysis, to disambiguate statements and provide additional details to users when it matters (e.g., with whom their data is actually shared). This research will be guided by user-centered design methodologies where the design of novel technologies is informed by findings from human subject studies, and where technologies are deployed and evaluated in increasingly rich and realistic scenarios. Products of this research will include prototype privacy Question Answering functionality, as well as technology to automatically extract information about data collection and use practices from both the text of privacy policies and from code.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Gupta, Sonu and Gopi, Geetika and Balaji, Harish and Poplavska, Ellen and O'Toole, Nora and Arora, Siddhant and Norton, Thomas and Sadeh, Norman and Wilson, Shomir "Creation and Analysis of an International Corpus of Privacy Laws" Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, 2024
Abhilasha Ravichander, Alan W "Question Answering for Privacy Policies: Combining Computational and Legal Perspectives" Empirical Methods in Natural Language Processing, 2019
Feng, Yuanyuan and Yao, Yaxing and Sadeh, Norman "A Design Space for Privacy Choices: Towards Meaningful Privacy Control in the Internet of Things" Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, 2021 https://doi.org/10.1145/3411764.3445148
Poplavska, Ellen and Norton, Thomas B. and Wilson, Shomir and Sadeh, Norman "From Prescription to Description: Mapping the GDPR to a Privacy Policy Corpus Annotation Scheme" Frontiers in Artificial Intelligence and Applications, v.334, 2020
Ravichander, Abhilasha and Black, Alan W and Norton, Thomas and Wilson, Shomir and Sadeh, Norman "Breaking Down Walls of Text: How Can NLP Benefit Consumer Privacy?" Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, v.1, 2021 https://doi.org/10.18653/v1/2021.acl-long.319
Ravichander, Abhilasha and Yang, Ian and Chen, Rex and Wilson, Shomir and Norton, Thomas and Sadeh, Norman "Incorporating Taxonomic Reasoning and Regulatory Knowledge into Automated Privacy Question Answering", 2024
Srinath, Mukund and Sundareswara, Soundarya and Venkit, Pranav and Giles, C. Lee and Wilson, Shomir "Privacy Lost and Found: An Investigation at Scale of Web Privacy Policy Availability" DocEng '23: Proceedings of the ACM Symposium on Document Engineering 2023, 2023 https://doi.org/10.1145/3573128.3604902
Vinayshekhar Bannihatti Kumar, Roger Iyengar "Finding a Choice in a Haystack: Automatic Extraction of Opt-Out Statements from Privacy Policy Text" The Web Conference, 2020
Habib, Hana and Zou, Yixin and Yao, Yaxing and Acquisti, Alessandro and Cranor, Lorrie and Reidenberg, Joel and Sadeh, Norman and Schaub, Florian "Toggles, Dollar Signs, and Triangles: How to (In)Effectively Convey Privacy Choices with Icons and Link Texts" Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, 2021 https://doi.org/10.1145/3411764.3445387

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

This project built upon advances in natural language processing (NLP) and machine learning to develop question answering (QA) functionality that enables users to ask questions about the privacy concerns that matter most to them when using computing technologies. A core part of this work was the large-scale collection and analysis of privacy policies of websites and mobile apps, as well as the analysis of privacy laws. The project produced corpora and language models suitable for automating the extraction of important information from privacy policy text.

This project advanced knowledge in the areas of privacy analysis, natural language processing, question answering, human-computer interaction, and privacy decision making. Knowledge was also created through the interdisciplinary collaboration between computing researchers and law researchers, especially through the creation of corpora focused on the following: consumers' privacy questions and related answers (PrivacyQA), privacy laws from around the world (Privacy Law Corpus), and privacy questions and answers embedded in privacy policies (PrivaSeerQA). This project combined fundamental research with the development and evaluation of scalable technologies to extract data practices from privacy policy text. Additionally, the PIs created opportunities for graduate and undergraduate students to participate in research and, in the classroom setting, to gain exposure to cutting-edge research results.


Last Modified: 03/13/2025
Modified by: Shomir Wilson
