Award Abstract # 2101052
SaTC: CORE: Small: Decentralized Attribution and Secure Training of Generative Models

NSF Org: CNS (Division of Computer and Network Systems)
Recipient: ARIZONA STATE UNIVERSITY
Initial Amendment Date: August 6, 2021
Latest Amendment Date: August 6, 2021
Award Number: 2101052
Award Instrument: Standard Grant
Program Manager: Anna Squicciarini
  asquicci@nsf.gov
  (703) 292-5177
  CNS: Division of Computer and Network Systems
  CSE: Directorate for Computer and Information Science and Engineering
Start Date: October 1, 2021
End Date: September 30, 2024 (Estimated)
Total Intended Award Amount: $500,000.00
Total Awarded Amount to Date: $500,000.00
Funds Obligated to Date: FY 2021 = $500,000.00
History of Investigator:
  • Yi Ren (Principal Investigator)
    yiren@asu.edu
  • Yezhou Yang (Co-Principal Investigator)
  • Ni Trieu (Co-Principal Investigator)
Recipient Sponsored Research Office: Arizona State University
660 S MILL AVENUE STE 204
TEMPE
AZ  US  85281-3670
(480)965-5479
Sponsor Congressional District: 04
Primary Place of Performance: Arizona State University
P.O. Box 876011
Tempe
AZ  US  85287-6011
Primary Place of Performance Congressional District: 04
Unique Entity Identifier (UEI): NTLHJXM55KZ6
Parent UEI:
NSF Program(s): Secure & Trustworthy Cyberspace
Primary Program Source: 01002122DB NSF RESEARCH & RELATED ACTIVITIES
Program Reference Code(s): 025Z, 7923
Program Element Code(s): 806000
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Generative models describe real-world data distributions such as images, text, and human motion, and play an essential role in a large and growing range of applications from photo editing to natural language processing to autonomous driving. Two open challenges surround the development and dissemination of generative models: (1) adversarial applications of generative models have created troubling socio-technical disturbances (e.g., espionage operations and malicious impersonation); and (2) developing generative models from multiple proprietary datasets (which are needed to reduce data biases) raises privacy concerns about data leakage. Legislative efforts have recently been undertaken in the wake of these challenges, so far with limited consensus on the form regulations should take and limited knowledge of their technological or social feasibility. To address these challenges, this project will develop new mathematical theories and computational tools to assess the feasibility of two connected solutions: model attribution ensures that model owners can be correctly identified from the content their models generate; secure training ensures zero data leakage during the collaborative training of attributable generative models. If successful, the outcomes of the project will provide technical guidance for the design of future regulations governing the secure development and dissemination of generative models. Project results will be disseminated through a project website, open-source software, and public datasets. The impacts of the project will be broadened through educational activities, including new course modules on Artificial Intelligence (AI) security, undergraduate research projects, and outreach to the local community through lab tours, preparing underrepresented groups with the skills to mitigate risks from malicious impersonation and from biased data and model representations targeting these groups.

This project will focus on synergistic research tasks towards decentralized model attribution and secure training of generative models. For the former, the research team will study the systematic design of a set of user-end generative models that can be certifiably attributed by a set of binary classifiers, which are stored in a decentralized manner to mitigate security risks. The technical feasibility of decentralized attribution will be measured by the tradeoffs between attributability, generation quality, and model capacity. For the latter, the research team will study secure multi-party training of generative models and of the associated binary classifiers for attribution. Data privacy and training scalability will be balanced through the design of security-friendly model architectures and learning losses. The project will create new knowledge that goes beyond the state of the art in digital forensics and secure computation: (1) Sufficient conditions for decentralized attribution will be developed, revealing analytical connections between attributability, data geometry, model architecture, and generation quality. (2) These sufficient conditions will enable estimation of the capacity of attributable models for a given dataset and generation quality tolerance. (3) The feasibility of sublinear secure vector multiplication will be studied, which will fundamentally improve the scalability of secure collaborative training. (4) Privacy-friendly activation and loss functions will be designed for the training of user-end generative models and the classifiers for attribution.
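To make the attribution setting concrete, the following toy sketch (a minimal illustration under simplifying assumptions, not the project's published construction; all parameters and names are illustrative) keys each user-end model by shifting a root model's outputs along a private, mutually orthogonal unit direction and attributes content with one binary linear classifier per key:

```python
# Toy sketch of decentralized attribution (illustrative only): each user-end
# "generator" shifts the root generator's output along a private key direction,
# and one binary linear classifier per key attributes content to its user.
import numpy as np

rng = np.random.default_rng(0)
d, n_users, n_samples = 64, 8, 1000
eps, sigma, tau = 1.0, 0.1, 0.5   # key strength, output spread, decision threshold

# Mutually orthogonal unit keys (QR decomposition of a random matrix).
keys = np.linalg.qr(rng.standard_normal((d, n_users)))[0].T   # shape (n_users, d)

def root_generator(n):
    """Stand-in for outputs of the shared root generative model."""
    return sigma * rng.standard_normal((n, d))

def user_generator(i, n):
    """User i's model: root outputs shifted along key i."""
    return root_generator(n) + eps * keys[i]

def attribute(x):
    """Index of the user whose classifier fires on x, or -1 if none does."""
    scores = keys @ x                 # one linear classifier score per key
    i = int(np.argmax(scores))
    return i if scores[i] > tau else -1

# Outputs of each user-end model should be attributed to that user only.
for i in range(n_users):
    preds = np.array([attribute(x) for x in user_generator(i, n_samples)])
    print(f"user {i}: attribution accuracy = {np.mean(preds == i):.3f}")

# Unwatermarked root outputs should trigger no classifier.
fp = np.mean([attribute(x) != -1 for x in root_generator(n_samples)])
print(f"false-positive rate on root outputs = {fp:.3f}")
```

Even in this toy setting the tradeoffs named above are visible: a larger key strength (eps) widens the attribution margin but increases the distortion of user-end outputs relative to the root model, and the number of mutually orthogonal keys is capped by the data dimension.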

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH


Cho, Yongbaek and Kim, Changhoon and Yang, Yezhou and Ren, Yi. "Attributable Watermarking of Speech Generative Models." Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022. https://doi.org/10.1109/ICASSP43922.2022.9746578

Kim, Changhoon and Ren, Yi and Yang, Yezhou. "Decentralized Attribution of Generative Models." International Conference on Learning Representations, 2021.

Lepoint, Tancrède and Patel, Sarvar and Seth, Karn and Raykova, Mariana and Trieu, Ni. "Private Join and Compute from PIR with Default." International Conference on the Theory and Application of Cryptology and Information Security, 2021. https://doi.org/10.1007/978-3-030-92075-3_21

Nguyen, Truong Son and Wang, Lun and Kornaropoulos, Evgenios M. and Trieu, Ni. "AITIA: Efficient Secure Computation of Bivariate Causal Discovery." 2024. https://doi.org/10.1145/3658644.3670337

Rosulek, Mike and Trieu, Ni. "Compact and Malicious Private Set Intersection for Small Sets." ACM Conference on Computer and Communications Security (CCS), 2021. https://doi.org/10.1145/3460120.3484778

Zhang, Lei and Ghimire, Mukesh and Zhang, Wenlong and Xu, Zhe and Ren, Yi. "Value Approximation for Two-Player General-Sum Differential Games With State Constraints." IEEE Transactions on Robotics, vol. 40, 2024. https://doi.org/10.1109/TRO.2024.3411850

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

Introduction

This project aimed to address the challenges of securely training generative models and of attributing them through robust watermarking techniques, advancing our understanding of privacy and attribution in AI systems. Supported by the National Science Foundation (NSF), the work has both intellectual merit and broader impacts, as detailed below.

Intellectual Merit

The intellectual merit of this project lies in its significant advancements in generative model training, watermarking techniques, and causal inference methodologies. Key outcomes include:

  1. Novel GenAI watermarking inspired by statistical physics: We introduced robust methods for watermarking generative AI models using N-point correlation functions (NPCFs) and Quantization Index Modulation (QIM). These methods are highly resistant to geometric attacks and ensure minimal perceptual distortion in outputs (a minimal QIM sketch appears after this list).

  2. New theories and methods for GenAI secure training: The project developed privacy-preserving GAN training protocols using multi-party computation (MPC), reducing training time by up to 16 times relative to full MPC implementations while maintaining data security (a secret-sharing sketch of the underlying MPC building block appears after this list). Additionally, a novel secure protocol for bivariate causal discovery, AITIA, was proposed, optimizing computational efficiency and accuracy.

  3. Scholarly Contributions: The work resulted in multiple peer-reviewed publications, including presentations at prestigious conferences such as ICML, ICLR, CVPR, PETS, and CCS, contributing to the fields of secure AI, watermarking, and causal inference.
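As referenced in item 1 above, the following is a generic Quantization Index Modulation (QIM) sketch, not the project's NPCF-based construction; all names and parameters are illustrative. Each host sample is snapped to one of two interleaved quantizer lattices depending on the watermark bit, and bits are recovered by nearest-lattice detection.

```python
# Minimal QIM watermarking sketch: embed one bit per sample via a
# bit-dependent dithered quantizer, then detect by nearest quantizer.
import numpy as np

def qim_embed(host, bits, delta=0.5):
    """Embed one bit per sample by quantizing with a bit-dependent dither."""
    dither = np.where(bits == 0, -delta / 4, delta / 4)
    return delta * np.round((host - dither) / delta) + dither

def qim_detect(signal, delta=0.5):
    """Recover bits by checking which dithered quantizer each sample is closest to."""
    d0, d1 = -delta / 4, delta / 4
    r0 = np.abs(signal - (delta * np.round((signal - d0) / delta) + d0))
    r1 = np.abs(signal - (delta * np.round((signal - d1) / delta) + d1))
    return (r1 < r0).astype(int)

rng = np.random.default_rng(0)
host = rng.standard_normal(256)                      # stand-in for generator features
bits = rng.integers(0, 2, size=256)                  # watermark payload

marked = qim_embed(host, bits)
attacked = marked + rng.normal(0.0, 0.05, size=256)  # mild additive-noise "attack"
recovered = qim_detect(attacked)

print("bit error rate:", float(np.mean(recovered != bits)))
print("embedding distortion (MSE):", float(np.mean((marked - host) ** 2)))
```

The step size delta controls the usual robustness/fidelity tradeoff in QIM: larger steps tolerate stronger perturbations at the cost of more embedding distortion.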
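As referenced in item 2 above, the following is a textbook sketch of the kind of secure vector multiplication that MPC-based training builds on, not the project's protocol: a two-party secure dot product over a prime field using additive secret sharing and a Beaver triple from a trusted dealer, in a semi-honest setting with illustrative names throughout.

```python
# Two-party secure dot product via additive secret sharing and a Beaver triple.
import random

P = 2**61 - 1  # prime modulus of the secret-sharing field

def share(vec):
    """Split a vector into two additive shares modulo P."""
    s0 = [random.randrange(P) for _ in vec]
    s1 = [(v - a) % P for v, a in zip(vec, s0)]
    return s0, s1

def dot(u, v):
    return sum(a * b for a, b in zip(u, v)) % P

def secure_dot(x, y):
    # Each data owner secret-shares its private vector with the other party.
    x0, x1 = share(x)
    y0, y1 = share(y)

    # Trusted dealer: random vectors a, b and shares of c = <a, b>.
    a = [random.randrange(P) for _ in x]
    b = [random.randrange(P) for _ in y]
    a0, a1 = share(a)
    b0, b1 = share(b)
    c0, c1 = share([dot(a, b)])

    # The masked values d = x - a and e = y - b are opened; on their own they
    # reveal nothing about x or y because a and b are uniformly random.
    d0 = [(xi - ai) % P for xi, ai in zip(x0, a0)]
    d1 = [(xi - ai) % P for xi, ai in zip(x1, a1)]
    e0 = [(yi - bi) % P for yi, bi in zip(y0, b0)]
    e1 = [(yi - bi) % P for yi, bi in zip(y1, b1)]
    d = [(u + v) % P for u, v in zip(d0, d1)]
    e = [(u + v) % P for u, v in zip(e0, e1)]

    # Local shares of <x, y>; party 0 also adds the public term <d, e>.
    z0 = (dot(d, e) + dot(d, b0) + dot(e, a0) + c0[0]) % P
    z1 = (dot(d, b1) + dot(e, a1) + c1[0]) % P
    return (z0 + z1) % P

x = [3, 1, 4, 1, 5]
y = [9, 2, 6, 5, 3]
print(secure_dot(x, y), "==", dot(x, y))   # the parties learn only the result
```

Opening d and e costs communication linear in the vector length; the sublinear secure vector multiplication studied in this project aims to reduce precisely this kind of overhead.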

Broader Impacts

Beyond academic contributions, the project achieved broader societal benefits by:

  1. Workforce Development: Supporting the training of four PhD students, equipping them with expertise in secure AI systems, cryptographic methods, and ethical considerations in AI.

  2. Community Engagement: Collaborating with national supercomputing centers through the NAIRR program.

  3. Public Dissemination: Communicating findings through public talks, accessible digital media, and open-source tools such as the AITIA protocol on GitHub.

Summary of Outcomes

Throughout the life of the award, the project achieved significant milestones:

  • Conducted rigorous experiments on watermarking techniques, demonstrating a significantly improved tradeoff among content attribution accuracy, generation quality, and key capacity under combined attacks.

  • Developed privacy-preserving GAN training protocols, reducing training time by a factor of up to 16 compared to full MPC implementations.

  • Proposed AITIA, achieving speedups of 3.6x to 340x in secure causal inference computations.

  • Published 12 papers in top-tier conferences, advancing the state of the art in generative AI watermarking, secure training, and causal discovery.

In conclusion, this NSF-funded project has enriched scientific knowledge, trained future researchers, and contributed tools and insights for secure and ethical AI development. These outcomes demonstrate the vital role of fundamental research in addressing complex challenges and inspiring innovation for a better future.

Last Modified: 01/12/2025
Modified by: Yi Ren

Please report errors in award information by writing to: awardsearch@nsf.gov.
