Award Abstract # 2007298
SHF: Small: Privacy Impact and Risk Assessment at Design-Time

NSF Org: CCF
Division of Computing and Communication Foundations
Recipient: CARNEGIE MELLON UNIVERSITY
Initial Amendment Date: June 29, 2020
Latest Amendment Date: April 18, 2024
Award Number: 2007298
Award Instrument: Standard Grant
Program Manager: Sol Greenspan
sgreensp@nsf.gov
(703) 292-7841
CCF
Division of Computing and Communication Foundations
CSE
Directorate for Computer and Information Science and Engineering
Start Date: October 1, 2020
End Date: September 30, 2024 (Estimated)
Total Intended Award Amount: $498,221.00
Total Awarded Amount to Date: $524,221.00
Funds Obligated to Date: FY 2020 = $498,221.00
FY 2022 = $16,000.00
FY 2024 = $10,000.00
History of Investigator:
  • Travis Breaux (Principal Investigator)
    breaux@cs.cmu.edu
Recipient Sponsored Research Office: Carnegie-Mellon University
5000 FORBES AVE
PITTSBURGH
PA  US  15213-3815
(412)268-8746
Sponsor Congressional District: 12
Primary Place of Performance: Carnegie Mellon University
5000 Forbes Ave
Pittsburgh
PA  US  15213-3815
Primary Place of Performance Congressional District: 12
Unique Entity Identifier (UEI): U3NKNFLNQ613
Parent UEI: U3NKNFLNQ613
NSF Program(s): Secure & Trustworthy Cyberspace,
Software & Hardware Foundation
Primary Program Source: 01002223DB NSF RESEARCH & RELATED ACTIVITIES
01002425DB NSF RESEARCH & RELATED ACTIVITIES
01002021DB NSF RESEARCH & RELATED ACTIVITIES
Program Reference Code(s): 025Z, 7923, 7944, 9178, 9251
Program Element Code(s): 806000, 779800
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Verifying that web and mobile applications will protect user privacy requires knowledge about which kinds of data and data practices are sensitive to users. Privacy impact assessments are standardized procedures that companies and government agencies use to identify what personal information is collected, how it is used and for what purpose, with whom it is shared, and what steps are taken to protect that information. Conducting privacy impact assessments on applications is time-consuming, because evaluators often have limited knowledge of the software's behavior, and the assessments are often done after the software has been constructed, which is costly. Because developers are under pressure to continuously release new application versions, they have little time for extensive documentation about their data practices. Today, the status quo in documenting privacy is the privacy policy, which regulators increasingly check for data practice misrepresentations during the application's lifetime. This project seeks to develop methods and tools to automatically and quickly conduct privacy impact assessments from software artifacts, called user stories, that are easier for developers to produce. Based on a risk assessment informed by which data practices are most sensitive to users, developers can prioritize where best to introduce privacy controls that users want. Furthermore, by conducting risk assessments from user stories, regulators and developers would have greater assurance that assessments accurately reflect current app behavior. Finally, these assessments save developer time, because a change to a user story could trigger an automatic re-assessment that alerts the developer to changes in privacy risk. This research is transformative because it allows software developers to respond to changes in privacy risk at design time, when important safeguards can be introduced, rather than waiting for lengthier impact assessments that are harder to integrate after the software has been constructed.

The project investigates the symbolic and statistical relationships between agile requirements, privacy risk, and privacy policies. The research explores strategies for scoring user stories for privacy risk and for prioritizing which stories are most important to user privacy comprehension. The components of the solution will be investigated as follows: (1) corpora of user stories and privacy policies expressed in natural language will be acquired and annotated using coding theory; (2) semantic frames and an ontology expressed in Description Logic will be extracted from the corpora using entity and relation extraction; and (3) risk scores will be collected using privacy risk surveys that measure how users perceive privacy risk under different scenarios derived from user stories and mitigations. A key obstacle to effectively scoring risk is the inherent presence of ambiguity and vagueness in natural language; the semantic frames and ontology will be used to encode and resolve ambiguity and vagueness in the scenarios. Furthermore, the survey results will be used to model changes in risk due to selected mitigations, so that developers can explore the local design space around a specific user story and the available mitigation choices.
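
To illustrate how survey-derived risk scores might attach to user stories, consider the following minimal Python sketch. It is not the project's actual tooling: the risk weights, the UserStory type, and the keyword-based extract_information_types function (a stand-in for the entity and relation extraction of step 2) are all hypothetical.

from dataclasses import dataclass

# Hypothetical mean risk scores per information type, as might be
# elicited from privacy risk surveys (0.0 = low risk, 1.0 = high risk).
SURVEY_RISK = {
    "location": 0.82,
    "contacts": 0.74,
    "email address": 0.55,
    "device model": 0.21,
}

@dataclass
class UserStory:
    text: str

def extract_information_types(story: UserStory) -> list[str]:
    # Stand-in for entity and relation extraction: a simple
    # substring match against known information types.
    lowered = story.text.lower()
    return [t for t in SURVEY_RISK if t in lowered]

def risk_score(story: UserStory) -> float:
    # Score a story by its riskiest referenced information type.
    types = extract_information_types(story)
    return max((SURVEY_RISK[t] for t in types), default=0.0)

story = UserStory("As a user, I want the app to share my location "
                  "with friends so that we can meet up.")
print(f"{risk_score(story):.2f}")  # 0.82 -> a candidate privacy hotspot

In a fuller pipeline, the keyword lookup would be replaced by the extracted semantic frames and ontology, and the fixed weights by the survey measurements, so that a change to a user story could trigger the automatic re-assessment described above.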

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Huang, Tianjian and Kaulagi, Vaishnavi and Hosseini, Mitra Bokaei and Breaux, Travis. "Mobile Application Privacy Risk Assessments from User-authored Scenarios." 2023. https://doi.org/10.1109/RE57278.2023.00012
Santos, Sarah and Breaux, Travis and Norton, Thomas and Haghighi, Sara and Ghanavati, Sepideh. "Requirements Satisfiability with In-Context Learning." 2024. https://doi.org/10.1109/RE59067.2024.00025
Shen, Yuchen and Breaux, Travis. "Stakeholder Preference Extraction From Scenarios." IEEE Transactions on Software Engineering, v.50, 2024. https://doi.org/10.1109/TSE.2023.3333265

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.


Mobile and web applications provide users with services to solve everyday problems. Increasingly, these applications collect personal information to personalize those services to individual user needs. Because privacy is personal and not every user perceives the same level of privacy risk when sharing their personal information, developers need ways to elicit privacy requirements from users before or during design time. This project investigated new ways to collect privacy requirements directly from users by inviting them to describe their experiences using mobile and web apps. By collecting user perceptions of privacy risk, we could train a machine learning model to predict which information types were high and low risk. This information could then be shared with developers to help them spot privacy hotspots in their application designs. In addition, we conducted research to identify ways that software could increase the level of personalization to offer a better fit for user needs. This study revealed gaps in modern software applications where user needs are unaddressed, and where addressing those needs requires collecting deeply personal information. The study raised awareness of new ways to develop better software, while also identifying a greater need to adopt some of the methods and tools produced by this research. The research resulted in tools to conduct this elicitation exercise and to collect the data that developers need.
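
As a rough sketch of the kind of model described above, the following Python example trains a simple text classifier to label a data practice as high or low risk. The scenario snippets and labels are invented for illustration; the project's actual model and training data are not shown here.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical user-authored scenario snippets with survey-derived
# high/low risk labels; real training data would come from the
# privacy risk surveys conducted in this project.
scenarios = [
    "the app tracks my precise location in the background",
    "the app reads my contact list to find friends",
    "the app remembers my preferred font size",
    "the app stores my chosen color theme",
]
labels = ["high", "high", "low", "low"]

# TF-IDF features plus logistic regression: a deliberately simple
# baseline classifier, not the project's actual model.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(scenarios, labels)

# "location" appears only in high-risk training examples, so this
# unseen scenario should lean toward the "high" label.
print(model.predict(["the app shares my location with advertisers"]))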


Last Modified: 12/20/2024
Modified by: Travis Breaux
