
NSF Org: | IIS Division of Information & Intelligent Systems |
Recipient: | |
Initial Amendment Date: | February 11, 2022 |
Latest Amendment Date: | February 11, 2022 |
Award Number: | 2147350 |
Award Instrument: | Standard Grant |
Program Manager: | Eleni Miltsakaki, emiltsak@nsf.gov, (703) 292-2972, IIS Division of Information & Intelligent Systems, CSE Directorate for Computer and Information Science and Engineering |
Start Date: | February 15, 2022 |
End Date: | January 31, 2026 (Estimated) |
Total Intended Award Amount: | $500,000.00 |
Total Awarded Amount to Date: | $500,000.00 |
Funds Obligated to Date: | |
History of Investigator: | |
Recipient Sponsored Research Office: | 506 S WRIGHT ST, URBANA, IL, US 61801-3620, (217) 333-2187 |
Sponsor Congressional District: | |
Primary Place of Performance: | 506 S. Wright Street, Urbana, IL, US 61801-3620 |
Primary Place of Performance Congressional District: | |
Unique Entity Identifier (UEI): | |
Parent UEI: | |
NSF Program(s): | Fairness in Artificial Intelligence |
Primary Program Source: | |
Program Reference Code(s): | |
Program Element Code(s): | |
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070, 47.075 |
ABSTRACT
Automatic speech recognition can improve your productivity in small ways: rather than searching for a song, a product, or an address using a graphical user interface, it is often faster to accomplish these tasks by voice. For many groups of people, however, speech recognition works less well, possibly because of regional accents, second-language accents, or disabilities. This Fairness in AI project defines a new way of thinking about speech technology: an automatic speech recognizer is not considered to work well unless it works well for all users, including users with regional accents, second-language accents, and severe disabilities. There are three sub-projects. The first sub-project will create black-box testing standards that speech technology researchers can use to measure how useful their speech recognizers will be for different groups of people. For example, if a researcher discovers that their product works well for some people but not others, then the researcher will have the opportunity to gather more training data and perform more development to make sure that the under-served community is better served. The second sub-project will create glass-box testing standards that researchers can use to debug inclusivity problems. For example, if a speech recognizer has trouble with a particular dialect, then glass-box methods will identify the particular speech sounds in that dialect that confuse the recognizer, so that researchers can solve the problem more effectively. The third sub-project will create new methods for training a speech recognizer so that it works equally well for all of the groups represented in the available data. Data will come from podcasts and the Internet. Speakers will be identified as members of a particular group if and only if they declare themselves to be members of that group. All of the developed software will be distributed open-source.
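As a concrete illustration of what a black-box inclusivity test could look like, the sketch below compares word error rates across self-identified speaker groups and uses a permutation test to check whether the observed gap is larger than chance. This is a minimal sketch under assumed inputs (group labels, reference transcripts, and recognizer hypotheses); it is not the project's released testing standard, and the data layout is an illustrative assumption.

```python
# Hypothetical sketch of a "black-box" inclusivity test: compute word error
# rate (WER) per self-identified speaker group and run a permutation test on
# the largest inter-group gap. Group labels and data layout are illustrative.
import random
from collections import defaultdict

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    r, h = reference.split(), hypothesis.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            substitution = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(substitution, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / max(len(r), 1)

def group_wer(utterances):
    """utterances: list of (group_label, reference, hypothesis) triples."""
    scores = defaultdict(list)
    for group, ref, hyp in utterances:
        scores[group].append(wer(ref, hyp))
    return {g: sum(v) / len(v) for g, v in scores.items()}

def inclusivity_gap(utterances):
    """Largest difference in mean WER between any two groups."""
    wers = group_wer(utterances)
    return max(wers.values()) - min(wers.values())

def permutation_test(utterances, n_permutations=1000, seed=0):
    """P-value for the null hypothesis that group membership does not affect WER."""
    rng = random.Random(seed)
    observed = inclusivity_gap(utterances)
    labels = [g for g, _, _ in utterances]
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(labels)
        shuffled = [(g, ref, hyp) for g, (_, ref, hyp) in zip(labels, utterances)]
        if inclusivity_gap(shuffled) >= observed:
            exceed += 1
    return observed, exceed / n_permutations
```

In this illustrative sense, a recognizer would be considered inclusive when the reported gap is small and not statistically distinguishable from sampling noise.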
Automatic speech recognition has the potential to democratize the flow of information: artificially intelligent dialog agents can provide information to people who would otherwise not know where to look. The speech developer community's relentless focus on minimum error rate over the past fifty years has resulted in a productivity tool that works extremely well for those whose speech patterns match its training data: typically, college-educated first-language speakers of a standardized dialect, with little or no speech disability. For many groups of people, however, speech recognition works less well, possibly because their speech patterns differ significantly from the standard dialect (e.g., because of regional accent), because of intra-group heterogeneity (e.g., regional African American dialects), or because the speech pattern of each individual in the group exhibits variability (e.g., people with severe disabilities, or second-language learners). The aim of this proposal is to create a new paradigm for the evaluation and training of inclusive automatic speech recognizers. The proposed paradigm consists of three components. (1) A "black-box evaluation" measures the degree of inclusivity of a speech recognizer by observing its outputs, without access to source code or trained parameters. With appropriately balanced test data, a statistical test can determine whether or not a system provides all groups of users with the same error rates, and if different groups get different error rates, then the size of the difference can be read as a measurement of the size of the problem. (2) A "glass-box evaluation" identifies error patterns that consistently differentiate between groups, and searches for the causes of those errors in the acoustic signal and in the trained parameters of the network. (3) Inclusive optimization is a family of end-to-end neural network training criteria, and training dataset design and augmentation criteria, that explicitly balance the need for low average error rate against the need for low inter-group and inter-speaker variance. To develop these new evaluation and training paradigms, the researchers propose to develop and distribute open-source data and tools. Data will be drawn from large public data sources, including the 100,000-podcast corpus; researchers will search the corpus for dialog acts in which speakers identify themselves with a particular group, then distribute the discovered group identities and manual transcriptions as open-source metadata. Tools will be implemented using open-source toolkits, including K2, and will be distributed as open-source system recipes. Speech technology developers are a competitive bunch: if there is a single number that describes the inclusivity of a speech recognizer, and if there is reason to believe that number is scientifically well-founded and desirable, then researchers all over the world will compete to make their systems more inclusive. The proposed research will develop such metrics and associated data, and will deploy them open-source. This research will be held up as a model of the social impact of artificial intelligence in the investigators' ongoing outreach programs.
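To make component (3) concrete, here is a minimal sketch, assuming a PyTorch-style training loop, of a criterion that balances the average per-utterance loss against the variance of mean losses across groups. The weighting factor `lam` and the grouping scheme are illustrative assumptions, not the project's published training method.

```python
# Illustrative sketch of an "inclusive" training criterion: penalize the spread
# of mean per-group losses in addition to the overall mean loss. Assumes a
# PyTorch-style setup; `lam` and the grouping are assumptions for illustration.
import torch

def inclusive_loss(per_utterance_loss: torch.Tensor,
                   group_ids: torch.Tensor,
                   lam: float = 1.0) -> torch.Tensor:
    """per_utterance_loss: shape (batch,); group_ids: shape (batch,) integer labels."""
    group_means = torch.stack([
        per_utterance_loss[group_ids == g].mean()
        for g in torch.unique(group_ids)
    ])
    # Low average error ...
    mean_loss = per_utterance_loss.mean()
    # ... balanced against low inter-group variance.
    group_variance = ((group_means - group_means.mean()) ** 2).mean()
    return mean_loss + lam * group_variance
```

Setting lam = 0 recovers the conventional minimum-average-error objective; larger values of lam push the optimizer toward solutions whose error rates are more uniform across groups.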
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.