
NSF Org: | CNS Division Of Computer and Network Systems |
Recipient: | Arizona State University |
Initial Amendment Date: | August 6, 2021 |
Latest Amendment Date: | August 6, 2021 |
Award Number: | 2101052 |
Award Instrument: | Standard Grant |
Program Manager: | Anna Squicciarini asquicci@nsf.gov (703)292-5177 CNS Division Of Computer and Network Systems CSE Directorate for Computer and Information Science and Engineering |
Start Date: | October 1, 2021 |
End Date: | September 30, 2024 (Estimated) |
Total Intended Award Amount: | $500,000.00 |
Total Awarded Amount to Date: | $500,000.00 |
Funds Obligated to Date: | $500,000.00 |
History of Investigator: | |
Recipient Sponsored Research Office: | 660 S MILL AVENUE STE 204 TEMPE AZ US 85281-3670 (480)965-5479 |
Sponsor Congressional District: | |
Primary Place of Performance: | P.O. Box 876011 Tempe AZ US 85287-6011 |
Primary Place of Performance Congressional District: | |
Unique Entity Identifier (UEI): | |
Parent UEI: | |
NSF Program(s): | Secure & Trustworthy Cyberspace |
Primary Program Source: | |
Program Reference Code(s): | |
Program Element Code(s): | |
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
Generative models describe real-world data distributions such as images, texts, and human motions, and are playing an essential role in a large and growing range of applications from photo editing to natural language processing to autonomous driving. There are two open challenges regarding the development and dissemination of generative models: (1) Adversarial applications of generative models have created concerning socio-technical disturbances (e.g., espionage operations and malicious impersonation); and (2) developing generative models using multiple proprietary datasets (which are needed to reduce data biases) raises privacy concerns about data leakage. Legislative efforts have recently been taken in the wake of these challenges, so far with limited consensus on the format of regulations and limited knowledge about their technological or social feasibility. To this end, this project will develop new mathematical theories and computational tools to assess the feasibility of two connected solutions to these challenges: model attribution ensures that model owners can be correctly identified from their generated content; secure training ensures zero data leakage during the collaborative training of attributable generative models. If successful, the outcomes of the project will provide technical guidance for future regulation design towards secure development and dissemination of generative models. Project results will be disseminated through a project website, open-source software, and public datasets. The impacts of the project will be broadened through educational activities, including new course modules on Artificial Intelligence (AI) security, undergraduate research projects, and outreach to the local community through lab tours, to prepare underrepresented groups with skills to mitigate risks from malicious impersonation and biased data/model representations targeting these groups.
This project will focus on synergistic research tasks towards decentralized model attribution and secure training of generative models. In the former, the research team will study the systematic design of a set of user-end generative models that can be certifiably attributed by a set of binary classifiers, which are stored in a decentralized manner to mitigate security risks. The technical feasibility of decentralized attribution will be measured by the tradeoffs between attributability, generation quality, and model capacity. In the latter, the research team will study secure multi-party training of generative models and the associated binary classifiers for attribution. Data privacy and training scalability will be balanced through the design of security-friendly model architectures and learning losses. New knowledge will be created that differentiates this project from the existing state-of-the-art literature in digital forensics and secure computation: (1) Sufficient conditions for decentralized attribution will be developed, which will reveal analytical connections between attributability, data geometry, model architecture, and generation quality. (2) The sufficient conditions will enable estimation of the capacity of attributable models for a given dataset and generation quality tolerance. (3) Feasibility of sublinear secure vector multiplication will be studied, which will fundamentally improve the scalability of secure collaborative training. (4) Privacy-friendly activation and loss functions will be designed for the training of user-end generative models and the classifiers for attribution.
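To make the attribution setup concrete, the following is a minimal sketch of the general idea, assuming purely hypothetical linear "keys": each user-end generative model is paired with one binary classifier defined by its key, and a sample is attributed only when exactly one classifier accepts it. The dimensions, key construction, and helper names are illustrative assumptions, not the project's actual design or its sufficient conditions.

```python
# A minimal, hypothetical sketch of attribution with linear keys (illustrative
# only; not the project's actual key design or sufficient conditions).
import numpy as np

rng = np.random.default_rng(0)
dim, n_users = 64, 8

# One "key" per user-end model; key_i defines the binary classifier
# f_i(x) = [<key_i, x> > margin]. Mutually orthogonal keys keep the
# classifiers from accepting each other's outputs.
keys = np.linalg.qr(rng.standard_normal((dim, n_users)))[0].T  # (n_users, dim)

def attribute(x: np.ndarray, margin: float = 0.5) -> int:
    """Return the index of the unique classifier that accepts x,
    or -1 (authentic / unattributable) otherwise."""
    accepted = np.flatnonzero(keys @ x > margin)
    return int(accepted[0]) if accepted.size == 1 else -1

# Toy check: content from "user-end model 3" carries key 3; authentic-like
# content carries no key component.
x = rng.standard_normal(dim)
x_plain = x - keys.T @ (keys @ x)          # no key component
x_user3 = x_plain + 2.0 * keys[3]          # pushed along key 3
print(attribute(x_plain), attribute(x_user3))   # -1 and 3
```

In this toy setting, the mutual orthogonality of the keys is what keeps the classifiers from accepting each other's outputs; the project's analysis of data geometry, model architecture, and generation quality concerns when and how such attributable keys exist for real generative models.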
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
Introduction
This project aimed to address the challenges of securely training generative models and attributing generative models with robust watermarking techniques, advancing our understanding of privacy and attribution in AI systems. Supported by the National Science Foundation (NSF), the work has both intellectual merit and broader impacts, as detailed below.
Intellectual Merit
The intellectual merit of this project lies in its significant advancements in generative model training, watermarking techniques, and causal inference methodologies. Key outcomes include:
- Novel GenAI watermarking inspired by statistical physics: We introduced robust methods for watermarking generative AI models using N-point correlation functions (NPCFs) and Quantization Index Modulation (QIM). These methods are highly resistant to geometric attacks and ensure minimal perceptual distortion in outputs (a minimal QIM sketch follows this list).
- New theories and methods for GenAI secure training: The project developed privacy-preserving GAN training protocols using multi-party computation (MPC), achieving a significant reduction in computational costs while maintaining data security. Additionally, a novel secure protocol for bivariate causal discovery, AITIA, was proposed, optimizing computational efficiency and accuracy (a secret-sharing sketch also follows this list).
- Scholarly Contributions: The work resulted in multiple peer-reviewed publications, including presentations at prestigious conferences such as ICML, ICLR, CVPR, PETS, and CCS, contributing to the fields of secure AI, watermarking, and causal inference.
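As a rough illustration of the QIM ingredient mentioned in the first item above (the N-point correlation functions and the project's full embedding pipeline are not reproduced here), the sketch below embeds one bit per feature by snapping the feature onto one of two quantization lattices and decodes bits by nearest-lattice matching. The step size, noise level, and function names are illustrative assumptions.

```python
# Minimal Quantization Index Modulation (QIM) sketch on a raw feature vector.
# Illustrative only: the project's method combines QIM with N-point
# correlation functions and embeds into generative-model outputs.
import numpy as np

def qim_embed(features, bits, step=1.0):
    """Embed one bit per feature by snapping it onto the bit's quantizer lattice."""
    dither = bits * (step / 2.0)                    # lattice offset: 0 or step/2
    return step * np.round((features - dither) / step) + dither

def qim_decode(features, step=1.0):
    """Recover bits by checking which of the two lattices each feature is nearest to."""
    d0 = np.abs(features - qim_embed(features, np.zeros_like(features), step))
    d1 = np.abs(features - qim_embed(features, np.ones_like(features), step))
    return (d1 < d0).astype(int)

rng = np.random.default_rng(1)
bits = rng.integers(0, 2, size=32)
host = rng.standard_normal(32)
marked = qim_embed(host, bits)                      # watermarked features
noisy = marked + rng.normal(scale=0.05, size=32)    # mild distortion ("attack")
assert np.array_equal(qim_decode(noisy), bits)      # bits survive the distortion
```

A larger step lets the embedded bits survive stronger distortions at the cost of larger changes to the host signal, which mirrors the tradeoff between attribution accuracy and generation quality discussed in this report.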
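Similarly, the toy sketch below shows the secret-sharing primitive that underlies MPC-based training such as the secure GAN training mentioned in the second item: two parties hold additive shares of vectors and multiply them using a Beaver triple, so neither party sees the other's data. The in-the-clear trusted dealer, the small ring modulus, and the function names are simplifications for illustration, not the project's protocol.

```python
# A toy two-party secret-sharing sketch of secure vector multiplication, the
# kind of primitive behind MPC training. Real protocols (including the
# project's) generate Beaver triples securely, work over larger rings, and
# add communication and many optimizations.
import numpy as np

MOD = 2**16                      # small toy ring modulus
rng = np.random.default_rng(2)

def share(x):
    """Split an integer vector into two additive shares that sum to x mod MOD."""
    s0 = rng.integers(0, MOD, size=x.shape)
    return s0, (x - s0) % MOD

def beaver_mul(x0, x1, y0, y1):
    """Elementwise multiply secret-shared vectors using one Beaver triple.
    The triple (a, b, c = a*b) is produced here by an assumed trusted dealer."""
    a = rng.integers(0, MOD, size=x0.shape)
    b = rng.integers(0, MOD, size=x0.shape)
    a0, a1 = share(a); b0, b1 = share(b); c0, c1 = share((a * b) % MOD)
    e = (x0 - a0 + x1 - a1) % MOD        # opened value e = x - a
    f = (y0 - b0 + y1 - b1) % MOD        # opened value f = y - b
    z0 = (c0 + e * b0 + f * a0 + e * f) % MOD
    z1 = (c1 + e * b1 + f * a1) % MOD
    return z0, z1                        # shares of x * y (elementwise)

x, y = np.arange(5), np.arange(5, 10)
x0, x1 = share(x); y0, y1 = share(y)
z0, z1 = beaver_mul(x0, x1, y0, y1)
assert np.array_equal((z0 + z1) % MOD, x * y)        # reconstructs the product
print(int(((z0 + z1) % MOD).sum()), int(x @ y))      # dot product: 80 80
```

Reconstructing the two output shares yields the elementwise product, and summing gives the dot product; secure vector multiplication of this kind is the operation whose scalability the project studied (e.g., the sublinear variants mentioned in the abstract).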
Broader Impacts
Beyond academic contributions, the project achieved broader societal benefits by:
- Workforce Development: Supporting the training of four PhD students, equipping them with expertise in secure AI systems, cryptographic methods, and ethical considerations in AI.
- Community Engagement: Collaborating with national supercomputing centers through the NAIRR program.
- Public Dissemination: Communicating findings through public talks, accessible digital media, and open-source tools such as the AITIA protocol on GitHub.
Summary of Outcomes
Throughout the life of the award, the project achieved significant milestones:
- Conducted rigorous experiments on watermarking techniques, demonstrating a significantly improved tradeoff among GenAI content attribution accuracy, generation quality, and key capacity under combined attacks.
- Developed privacy-preserving GAN training protocols, reducing training time by up to 16 times compared to full MPC implementations.
- Proposed AITIA, achieving a 3.6-340 times speedup in secure causal inference computations.
- Published 12 papers in top-tier conferences, advancing the state of the art in generative AI watermarking, secure training, and causal discovery.
In conclusion, this NSF-funded project has enriched scientific knowledge, trained future researchers, and contributed tools and insights for secure and ethical AI development. These outcomes demonstrate the vital role of fundamental research in addressing complex challenges and inspiring innovation for a better future.
Last Modified: 01/12/2025
Modified by: Yi Ren