
NSF Org: | IIS Division of Information & Intelligent Systems |
Recipient: |
|
Initial Amendment Date: | January 25, 2021 |
Latest Amendment Date: | December 4, 2023 |
Award Number: | 2040926 |
Award Instrument: | Standard Grant |
Program Manager: | Todd Leen, tleen@nsf.gov, (703)292-7215, IIS Division of Information & Intelligent Systems, CSE Directorate for Computer and Information Science and Engineering |
Start Date: | October 1, 2021 |
End Date: | September 30, 2023 (Estimated) |
Total Intended Award Amount: | $375,000.00 |
Total Awarded Amount to Date: | $383,000.00 |
Funds Obligated to Date: | FY 2022 = $8,000.00 |
History of Investigator: |
|
Recipient Sponsored Research Office: | 5000 FORBES AVE PITTSBURGH PA US 15213-3890 (412)268-8746 |
Sponsor Congressional District: |
|
Primary Place of Performance: | 5000 Forbes Avenue Pittsburgh PA US 15213-3815 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | Fairness in Artificial Intelli |
Primary Program Source: | 01002122DB NSF RESEARCH & RELATED ACTIVIT |
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
Advances in natural language processing (NLP) technology now make it possible to perform many tasks through natural language or over natural language data: automatic systems can answer questions, perform web search, or command our computers to perform specific tasks. However, "language" is not monolithic; people vary in the language they speak, the dialect they use, the relative ease with which they produce language, and the words they choose to express themselves. In benchmarking of NLP systems, however, this linguistic variety is generally unattested. Most commonly, tasks are formulated using canonical American English and designed with little regard for whether systems will work on any other variety of language. In this work we ask a simple question: can we measure the extent to which the diversity of language that we use affects the quality of results that we can expect from language technology systems? This will allow for the development and deployment of fair accuracy measures for a variety of language technology tasks, encouraging advances in the state of the art in these technologies to focus on all users, not just a select few.
Specifically, this work focuses on four aspects of this overall research question. First, we will develop a general-purpose methodology for quantifying how well particular language technologies work across many varieties of language. Measures over multiple speakers or demographics are combined into benchmarks that can drive progress in the development of fair metrics for language systems, tailored to the specific needs of design teams. Second, we will move beyond simple accuracy measures and directly quantify the effect that the accuracy of systems has on users in terms of the relative utility derived from using the system. These measures of utility will be incorporated in our metrics for system success. Third, we will focus on the language produced by people from varying demographic groups, predicting system accuracies from demographics. Finally, we will examine novel methods for robust learning of NLP systems across language or dialectal boundaries, and examine the effect that these methods have on increasing accuracy for all users.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
Advances in natural language processing (NLP) technology now make it possible to perform many tasks through natural language or over natural language data: automatic systems can answer questions, perform web search, or command our computers to perform specific tasks. However, "language" is not monolithic; people vary in the language they speak, the dialect they use, the relative ease with which they produce language, and the words they choose to express themselves. In benchmarking of NLP systems, however, this linguistic variety is generally unattested. Most commonly, tasks are formulated using canonical American English and designed with little regard for whether systems will work on any other variety of language. In this work we asked a simple question: can we measure the extent to which the diversity of language that we use affects the quality of results that we can expect from language technology systems? This will allow for the development and deployment of fair accuracy measures for a variety of language technology tasks, encouraging advances in the state of the art in these technologies to focus on all users, not just a select few.
Specifically, this work focused on several aspects of this overall research question:
- First, the project developed a general-purpose methodology for quantifying how well particular language technologies work across many varieties of language. It built benchmarks, including GlobalBench, that take into account the overall speaking population of each language variety.
- Second, it moved beyond simple accuracy measures, and directly quantified the effect that the accuracy of systems has on users in terms of relative utility derived from using the system. For instance, it examined the disparate effect of mistakes in voice recognition on people of various demographics.
- Third, it examined novel methods for robust learning of NLP systems across language or dialectal boundaries, and examined the effect that these methods have on increasing accuracy for all users. For instance, it created techniques that can generate text in many different language varieties.
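One way to picture a benchmark that considers the speaking population of each language variety is a population-weighted average of per-variety accuracy, sketched below. This is an illustration only and not necessarily the formula used by GlobalBench: the function names, the variety labels, and all numbers are hypothetical, made-up values chosen to show how weighting by speakers can change the overall picture.

```python
from typing import Mapping


def unweighted_mean_accuracy(accuracy: Mapping[str, float]) -> float:
    """Plain average over language varieties, ignoring how many people speak each."""
    return sum(accuracy.values()) / len(accuracy)


def population_weighted_accuracy(
    accuracy: Mapping[str, float],
    speakers: Mapping[str, float],
) -> float:
    """Average accuracy where each variety counts in proportion to its speakers.

    Hypothetical helper for illustration; not taken from the project's code.
    """
    total_speakers = sum(speakers[variety] for variety in accuracy)
    if total_speakers <= 0:
        raise ValueError("total speaker count must be positive")
    weighted_sum = sum(accuracy[variety] * speakers[variety] for variety in accuracy)
    return weighted_sum / total_speakers


# Made-up toy inputs (assumptions, not measured results):
# system accuracy per variety, and speaker counts in millions.
toy_accuracy = {"variety_a": 0.9, "variety_b": 0.6, "variety_c": 0.3}
toy_speakers = {"variety_a": 100.0, "variety_b": 300.0, "variety_c": 600.0}

plain = unweighted_mean_accuracy(toy_accuracy)
weighted = population_weighted_accuracy(toy_accuracy, toy_speakers)
print(f"unweighted mean: {plain:.2f}")
print(f"population-weighted: {weighted:.2f}")
```

In this toy setting the system does best on the variety with the fewest speakers, so the population-weighted score comes out lower than the plain average. That gap is the kind of disparity a speaker-aware benchmark is meant to surface.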
Last Modified: 02/02/2024
Modified by: Graham Neubig