
NSF Org: CNS Division Of Computer and Network Systems
Recipient:
Initial Amendment Date: September 4, 2018
Latest Amendment Date: July 12, 2022
Award Number: 1816019
Award Instrument: Standard Grant
Program Manager: Dan Cosley, dcosley@nsf.gov, (703) 292-8832, CNS Division Of Computer and Network Systems, CSE Directorate for Computer and Information Science and Engineering
Start Date: September 15, 2018
End Date: January 31, 2023 (Estimated)
Total Intended Award Amount: $291,471.00
Total Awarded Amount to Date: $331,471.00
Funds Obligated to Date: FY 2019 = $8,000.00; FY 2020 = $16,000.00; FY 2021 = $8,000.00; FY 2022 = $8,000.00
History of Investigator:
Recipient Sponsored Research Office: 1109 GEDDES AVE STE 3300, ANN ARBOR, MI, US 48109-1015, (734) 763-6438
Sponsor Congressional District:
Primary Place of Performance: 4901 Evergreen Rd., Dearborn, MI, US 48128-2406
Primary Place of Performance Congressional District:
Unique Entity Identifier (UEI):
Parent UEI:
NSF Program(s): Special Projects - CNS; Secure & Trustworthy Cyberspace
Primary Program Source: 01001819DB NSF RESEARCH & RELATED ACTIVIT; 01001920DB NSF RESEARCH & RELATED ACTIVIT; 01002021DB NSF RESEARCH & RELATED ACTIVIT; 01002122DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s):
Program Element Code(s):
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070
ABSTRACT
The proliferation of powerful smart-computing devices (e.g., smartphones, surveillance systems) capable of production, editing, analysis, and sharing of multimedia files and associated technological advances have affected almost every aspect of our lives. The use of digital multimedia (images, audio, and video) as evidence is rapidly growing in multiple applications, including legal proceedings and law enforcement. However, forensic audio examiners are facing a new challenge of analyzing evidence containing audio from social networking websites, because audio editing and manipulation tools are both sophisticated and easy to use, increasing the risk of audio forgery. The goal of this project is to develop a framework and methods to support audio forensics examiners in detecting and localizing tampering in audio files, including developing novel algorithms to associate files to specific recording devices; creating methods to detect and estimate the risks of attempts to evade existing forgery detectors; evaluating speaker recognition systems in the presence of audio replay attacks; and collecting a large and diverse dataset of recordings that can be used for benchmarking of existing and future audio forensic analysis tools and techniques. The project also has a significant educational component, consisting of a set of hands-on activities involving media generation, manipulation, and analysis aimed at outreach and broadening participation in science, technology, engineering, and mathematics (STEM) disciplines including forensic science, digital signal processing, statistical data analysis, and digital forensics.
The project has four main research thrusts. The first will involve designing effective microphone fingerprint modeling and extraction algorithms tailored for audio forensic applications. The team will leverage microphone calibration methods, statistical signal processing techniques for blind microphone fingerprint estimation, and system identification methods for linking an audio recording to a specific recording device. The second thrust aims to investigate the impact of anti-forensic attacks on existing forgery detectors and of replay attacks on speaker recognition systems. The research team will design attack models to perturb the underlying forgery detection feature space and analyze the performance of existing and new algorithms under these anti-forensic attacks. The third research effort will be focused on designing new audio forensic analysis algorithms robust to these and other emerging anti-forensic attacks. The team will use manipulation methods for anti-forensic attacks and a game-theory-based framework for attack-aware tamper detection, and will design new forensic methods based on the findings of these activities. The fourth research thrust will aim at developing a first-of-its-kind research commons for audio forensics consisting of benchmarking datasets, algorithms, and tools. The team will collect audio from both controlled settings and crowdsourcing in the wild, and use known audio manipulation, editing, and anti-forensic techniques to generate tampered datasets. The team will design and deploy the benchmarking testbed, ForensicExaminer, using a microservices architecture.
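The blind microphone fingerprint estimation mentioned in the first thrust can be conveyed with a deliberately simplified sketch. The idea below treats the long-term average log-magnitude spectrum of a recording as a crude device fingerprint and compares fingerprints by cosine similarity; this is an illustrative stand-in, not the project's actual algorithm, and all function names are hypothetical.

```python
import numpy as np

def microphone_fingerprint(signal, n_fft=1024, hop=512):
    """Crude device fingerprint: the long-term average log-magnitude
    spectrum of a recording. Real blind-estimation methods are far
    more sophisticated; this only captures gross spectral coloration."""
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        windowed = signal[start:start + n_fft] * np.hanning(n_fft)
        frames.append(np.abs(np.fft.rfft(windowed)))
    fp = np.mean(np.log1p(np.array(frames)), axis=0)
    return (fp - fp.mean()) / (fp.std() + 1e-12)  # zero-mean, unit-variance

def fingerprint_similarity(fp_a, fp_b):
    """Cosine similarity between two fingerprints; higher values
    suggest the recordings may come from the same device."""
    return float(np.dot(fp_a, fp_b) /
                 (np.linalg.norm(fp_a) * np.linalg.norm(fp_b) + 1e-12))
```

On white noise passed through two different simulated microphone responses, recordings from the same response score noticeably higher than cross-device pairs, which is the intuition behind linking a recording to a device.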
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
Widespread adoption of high-powered smart-computing devices, like smartphones, has profoundly impacted various aspects of our lives, including law enforcement and legal proceedings. These devices enable us to create, edit, analyze, and share multimedia files, driving significant technological advancements. Nowadays, digital multimedia, such as audio, video, and images, has become the standard form of evidence in litigation and criminal justice cases. However, ensuring the authenticity and integrity of this media poses significant challenges. To be admissible as evidence, digital media must be proven authentic, meaning it provides a complete record of events since its creation. Additionally, its integrity must be verified, which involves detecting any disruptions or intentional manipulations in the recording, such as insertions or compressions. This verification process becomes even more difficult when there is a lack of supporting data, like digital watermarks, and when the media has been altered with anti-forensic intent. Furthermore, the availability of powerful, user-friendly digital editing tools, open-source generative AI algorithms, and manipulation tools has further complicated the authentication and integrity verification of digital audio files.
The goal of this project is to investigate microphone and acoustic environment signatures in the evidentiary recording, to investigate the impact of anti-forensic attacks on existing methods, to design robust attack-aware algorithms against anti-forensic attacks, and to develop a common platform for benchmarking audio forensics algorithms and tools. These objectives have been achieved through the creation and implementation of a diverse range of methodologies, tools, and collaborative research platforms:
- In this project, we have developed several novel methods to better capture microphone signatures and the signatures left by voice replay, voice cloning, and the algorithms used for their synthesis. We have also focused on understanding the dynamic speech variations in genuine signals and distinguishing them from spoofed samples.
- Additionally, our research has shed light on the vulnerabilities of automated speaker verification systems to AI-cloned voices and multi-order replay spoofing. Furthermore, we have explored the impact of deepfakes and of recorded audio injected via laser into MEMS microphones, and we have devised countermeasures to detect such attacks.
- Moreover, we have specifically crafted liveness detection techniques tailored for the purpose of speaker verification.
- Unlike other single attack-specific countermeasures, we have developed two novel unified anti-spoofing methods that can handle various attacks on automated speaker verification using a single descriptor.
- The existing countermeasures against AI-generated voices are unable to reliably detect unseen samples or audio generated by new AI algorithms. Therefore, we have developed a novel spoofing transformer network called SpotNet.
- Furthermore, we have developed a secure and lightweight automated speaker verification system that not only reliably authenticates speakers/users but also demonstrates resistance against multiple audio spoofing attacks. Moreover, this system possesses the capability to capture the signature of voice cloning algorithms.
- In addition, we have developed deep learning-based methods for detecting both audio and video deepfakes by capturing temporal, spatial, and inter-domain inconsistencies. Furthermore, we have created the first-ever neuro-symbolic deepfake detection framework that leverages the observation that deepfakes often exhibit inter- or intra-modality inconsistencies in the emotional expressions of the manipulated individuals.
- We have also developed a framework for securing voice-controlled devices/services, e.g., Amazon Alexa, Google Home, Apple Siri, etc., against emerging threat vectors including LASER injection attacks.
- We have created a publicly available voice spoofing detection corpus (VSDC) that includes bonafide samples as well as first-order and second-order replay samples.
- We have developed a research commons platform that facilitates apples-to-apples comparative analysis of state-of-the-art voice spoofing countermeasures.
- We have also investigated the impact of anti-forensic attacks on automatic speaker verification (ASV) and voice countermeasures. This investigation involved using voice samples obtained through facemasks and compressed audio samples, among other techniques.
- Additionally, we investigated the impact of adversarial attacks, including mole or other perturbation-based attacks, on audio-visual deepfake detection systems.
- We have developed a game-theory-based approach to enhance the security of audio-visual deepfake detection systems.
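The game-theoretic angle in the last bullet can be conveyed with a deliberately toy example (the scenario and numbers below are invented for illustration and are not from the project): model the interaction as a 2x2 zero-sum game in which the examiner mixes between two detectors and the forger mixes between two manipulations, then solve for the examiner's equilibrium strategy in closed form.

```python
def mixed_strategy_2x2(A):
    """Equilibrium mixed strategy for the row player in a 2x2 zero-sum
    game with payoff matrix A (row player maximizes). Assumes the game
    has no pure-strategy saddle point, so the denominator is nonzero.
    Returns (p, v): play row 0 with probability p; v is the game value."""
    a, b = A[0]
    c, d = A[1]
    denom = a - b - c + d
    p = (d - c) / denom
    v = (a * d - b * c) / denom
    return p, v

# Hypothetical detection probabilities: rows = detector tuned for
# splicing vs. tuned for cloning; columns = forger splices vs. clones.
payoffs = [[0.9, 0.2],
           [0.3, 0.8]]
p, v = mixed_strategy_2x2(payoffs)
```

With these made-up numbers the examiner should run the splicing-tuned detector about 42% of the time, guaranteeing a detection probability of 0.55 no matter which manipulation the forger chooses, which is the attack-aware flavor of robustness the project pursues at much larger scale.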
The intellectual merit of this project is its interdisciplinary approach that spans multiple domains, including cybersecurity, audio/speech processing, digital forensics, applied probability theory, and large-scale experimentation with audio data. At its core, this project involves the development of novel algorithms capable of accurately modeling and extracting the unique characteristics of acquisition devices, effectively addressing anti-forensic attacks, assessing system performance in the face of such attacks, designing forensic detectors that are resilient to these attacks, and curating comprehensive datasets for benchmarking and platform development purposes.
This project has made substantial contributions with broader impacts spanning multiple domains, including multimedia forensics, content integrity verification, civil and criminal proceedings, national security, the fintech industry, law enforcement, cyberspace, voice-activated services, and the entertainment industry.
Last Modified: 06/21/2023
Modified by: Hafiz Malik
Please report errors in award information by writing to: awardsearch@nsf.gov.