Award Abstract # 2147350
FAI: A New Paradigm for the Evaluation and Training of Inclusive Automatic Speech Recognition

NSF Org: IIS
Division of Information & Intelligent Systems
Recipient: UNIVERSITY OF ILLINOIS
Initial Amendment Date: February 11, 2022
Latest Amendment Date: February 11, 2022
Award Number: 2147350
Award Instrument: Standard Grant
Program Manager: Eleni Miltsakaki
emiltsak@nsf.gov
 (703)292-2972
IIS - Division of Information & Intelligent Systems
CSE - Directorate for Computer and Information Science and Engineering
Start Date: February 15, 2022
End Date: January 31, 2026 (Estimated)
Total Intended Award Amount: $500,000.00
Total Awarded Amount to Date: $500,000.00
Funds Obligated to Date: FY 2022 = $500,000.00
History of Investigator:
  • Mark Hasegawa-Johnson (Principal Investigator)
    jhasegaw@illinois.edu
  • Zsuzsanna Fagyal (Co-Principal Investigator)
  • Najim Dehak (Co-Principal Investigator)
  • Piotr Zelasko (Co-Principal Investigator)
  • Laureano Moro-Velazquez (Co-Principal Investigator)
Recipient Sponsored Research Office: University of Illinois at Urbana-Champaign
506 S WRIGHT ST
URBANA
IL  US  61801-3620
(217)333-2187
Sponsor Congressional District: 13
Primary Place of Performance: University of Illinois at Urbana-Champaign
506 S. Wright Street
Urbana
IL  US  61801-3620
Primary Place of Performance Congressional District: 13
Unique Entity Identifier (UEI): Y8CWNJRCNN91
Parent UEI: V2PHZ2CSCH63
NSF Program(s): Fairness in Artificial Intelligence
Primary Program Source: 01002223DB NSF RESEARCH & RELATED ACTIVITIES
Program Reference Code(s): 075Z
Program Element Code(s): 114Y00
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070, 47.075

ABSTRACT

Automatic speech recognition can improve your productivity in small ways: rather than searching for a song, a product, or an address using a graphical user interface, it is often faster to accomplish these tasks by voice. For many groups of people, however, speech recognition works less well, possibly because of regional accents, second-language accents, or disabilities. This Fairness in AI project defines a new way of thinking about speech technology: an automatic speech recognizer is not considered to work well unless it works well for all users, including users with regional accents, second-language accents, and severe disabilities. There are three sub-projects. The first sub-project will create black-box testing standards that speech technology researchers can use to measure how useful their speech recognizers are for different groups of people. For example, if researchers discover that their product works well for some people but not others, they can gather more training data and perform more development to make sure that the under-served community is better served. The second sub-project will create glass-box testing standards that researchers can use to debug inclusivity problems. For example, if a speech recognizer has trouble with a particular dialect, glass-box methods will identify the particular speech sounds in that dialect that confuse the recognizer, so that researchers can solve the problem more effectively. The third sub-project will create new methods for training a speech recognizer so that it is guaranteed to work equally well for all of the different groups represented in available data. Data will come from podcasts and the Internet. Speakers will be identified as members of a particular group if and only if they declare themselves to be members of that group. All of the developed software will be distributed open-source.
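The black-box testing standard described above amounts to comparing error rates across self-identified speaker groups without looking inside the recognizer. A minimal sketch of that comparison, assuming per-utterance error counts and group labels are available (all group names and numbers below are hypothetical, not from the project):

```python
# Hypothetical black-box inclusivity check: aggregate word errors per
# self-identified group, compute each group's WER, and report the
# largest pairwise WER gap as a single inclusivity score.
from collections import defaultdict

def group_wer(results):
    """results: iterable of (group, word_errors, ref_word_count)."""
    errs = defaultdict(int)
    words = defaultdict(int)
    for group, e, n in results:
        errs[group] += e
        words[group] += n
    return {g: errs[g] / words[g] for g in errs}

def inclusivity_gap(results):
    """Largest difference between any two groups' word error rates."""
    wers = group_wer(results)
    return max(wers.values()) - min(wers.values())

# Illustrative per-utterance scores for two hypothetical groups.
results = [
    ("L1-English", 5, 100), ("L1-English", 7, 120),
    ("L2-English", 18, 110), ("L2-English", 15, 90),
]
```

A gap near zero suggests the system serves both groups comparably; a large gap quantifies the size of the inclusivity problem, as the abstract describes.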

Automatic speech recognition has the potential to democratize the flow of information: artificially intelligent dialog agents can provide information to people who would otherwise not know where to look. The speech developer community's relentless focus on minimum error rate over the past fifty years has resulted in a productivity tool that works extremely well for those whose speech patterns match its training data: typically, college-educated first-language speakers of a standardized dialect, with little or no speech disability. For many groups of people, however, speech recognition works less well, possibly because their speech patterns differ significantly from the standard dialect (e.g., because of regional accent), because of intra-group heterogeneity (e.g., regional African American dialects), or because the speech pattern of each individual in the group exhibits variability (e.g., people with severe disabilities, or second-language learners). The aim of this proposal is to create a new paradigm for the evaluation and training of inclusive automatic speech recognizers. The proposed evaluation and training paradigm consists of three components: (1) A "black-box evaluation" measures the degree of inclusivity of a speech recognizer by observing its outputs, without access to source code or trained parameters. With appropriately balanced test data, a statistical test can determine whether or not a system provides all groups of users with the same error rates; if different groups get different error rates, the size of the difference can be read as a measurement of the size of the problem. (2) A "glass-box evaluation" identifies error patterns that consistently differentiate between groups, and searches for the causes of those errors in the acoustic signal and in the trained parameters of the network. (3) "Inclusive optimization" is a family of end-to-end neural network training criteria, and of training dataset design and augmentation criteria, that explicitly balance the need for low average error rate against the need for low inter-group and inter-speaker variance. In order to develop these new evaluation and training paradigms, the researchers propose to develop and distribute open-source data and tools. Data will be drawn from large public sources, including the 100,000-podcast corpus; the researchers will search the corpus for dialog acts in which speakers identify themselves with a particular group, then distribute the discovered group identities and manual transcriptions as open-source metadata. Tools will be implemented using open-source toolkits, including K2, and will be distributed as open-source system recipes. Speech technology developers are a competitive bunch: if there is a single number that describes the inclusivity of a speech recognizer, and if there is reason to believe that number is scientifically well-founded and desirable, then researchers all over the world will compete to make their systems more inclusive. The proposed research will develop such metrics and associated data, and will release them open-source. This research will be held up as a model of the social impact of artificial intelligence in the investigators' ongoing outreach programs.
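The inclusive-optimization idea of balancing low average error against low inter-group variance can be illustrated with a toy objective. This is a sketch under assumptions, not the project's actual training criterion: the variance weight `lam` and the group losses are hypothetical.

```python
# Hypothetical inclusive training objective: average per-group loss
# plus a weighted penalty on the variance of losses across groups.
# `lam` is an assumed trade-off hyperparameter, not from the award text.
from statistics import mean, pvariance

def inclusive_loss(per_group_losses, lam=1.0):
    """per_group_losses: mapping from group name to that group's average loss."""
    losses = list(per_group_losses.values())
    return mean(losses) + lam * pvariance(losses)

# Two systems with the same mean loss: one serves both groups equally,
# the other trades one group off against the other.
balanced = {"group_a": 0.10, "group_b": 0.10}
skewed = {"group_a": 0.02, "group_b": 0.18}
```

Under this objective the balanced system scores strictly better than the skewed one, even though their average losses are identical, which is exactly the trade-off the paragraph above describes.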

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Jahan, Maliha; Moro-Velazquez, Laureano; Thebaud, Thomas; Dehak, Najim; Villalba, Jesús. "Model-Based Fairness Metric for Speaker Verification." 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2023. https://doi.org/10.1109/ASRU57964.2023.10389804
