Award Abstract # 2040926
FAI: Quantifying and Mitigating Disparities in Language Technologies

NSF Org: IIS - Division of Information & Intelligent Systems
Recipient: CARNEGIE MELLON UNIVERSITY
Initial Amendment Date: January 25, 2021
Latest Amendment Date: December 4, 2023
Award Number: 2040926
Award Instrument: Standard Grant
Program Manager: Todd Leen
tleen@nsf.gov
(703) 292-7215
IIS - Division of Information & Intelligent Systems
CSE - Directorate for Computer and Information Science and Engineering
Start Date: October 1, 2021
End Date: September 30, 2023 (Estimated)
Total Intended Award Amount: $375,000.00
Total Awarded Amount to Date: $383,000.00
Funds Obligated to Date: FY 2021 = $375,000.00
FY 2022 = $8,000.00
History of Investigator:
  • Graham Neubig (Principal Investigator)
    gneubig@andrew.cmu.edu
  • Jeffrey Bigham (Co-Principal Investigator)
  • Geoff Kaufman (Co-Principal Investigator)
  • Yulia Tsvetkov (Co-Principal Investigator)
  • Antonios Anastasopoulos (Co-Principal Investigator)
Recipient Sponsored Research Office: Carnegie-Mellon University
5000 FORBES AVE
PITTSBURGH
PA  US  15213-3890
(412)268-8746
Sponsor Congressional District: 12
Primary Place of Performance: Carnegie-Mellon University
5000 Forbes Avenue
Pittsburgh
PA  US  15213-3815
Primary Place of Performance Congressional District: 12
Unique Entity Identifier (UEI): U3NKNFLNQ613
Parent UEI: U3NKNFLNQ613
NSF Program(s): Fairness in Artificial Intelligence
Primary Program Source: 01002223DB NSF RESEARCH & RELATED ACTIVITIES
01002122DB NSF RESEARCH & RELATED ACTIVITIES
Program Reference Code(s): 075Z, 9251
Program Element Code(s): 114Y00
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Advances in natural language processing (NLP) technology now make it possible to perform many tasks through natural language or over natural language data -- automatic systems can answer questions, perform web search, or command our computers to perform specific tasks. However, "language" is not monolithic; people vary in the language they speak, the dialect they use, the relative ease with which they produce language, and the words they choose to express themselves. In the benchmarking of NLP systems, however, this linguistic variety is generally not represented: most commonly, tasks are formulated in canonical American English, with little regard for whether systems will work on language of any other variety. In this work we ask a simple question: can we measure the extent to which the diversity of language that we use affects the quality of results that we can expect from language technology systems? This will allow for the development and deployment of fair accuracy measures for a variety of language technology tasks, encouraging advances in the state of the art to focus on all users, not just a select few.

Specifically, this work focuses on four aspects of this overall research question. First, we will develop a general-purpose methodology for quantifying how well particular language technologies work across many varieties of language. Measures over multiple speakers or demographics are combined into benchmarks that can drive progress in the development of fair metrics for language systems, tailored to the specific needs of design teams. Second, we will move beyond simple accuracy measures and directly quantify the effect that system accuracy has on users, in terms of the relative utility they derive from using the system; these measures of utility will be incorporated into our metrics for system success. Third, we will focus on the language produced by people from varying demographic groups, predicting system accuracies from demographics. Finally, we will examine novel methods for robust learning of NLP systems across language or dialectal boundaries, and examine the effect that these methods have on increasing accuracy for all users.
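
As an illustration of the kind of aggregate measure this methodology targets, the sketch below combines per-variety scores into a single population-weighted figure and reports the gap between the best- and worst-served varieties. It is a minimal Python sketch for illustration only: the variety names, speaker counts, and scores are hypothetical and are not results or code from the project.

    # Minimal illustrative sketch: aggregate per-variety scores into a
    # population-weighted metric and report the best-worst gap.
    # All names and numbers below are hypothetical.
    from typing import Dict

    def population_weighted_score(scores: Dict[str, float],
                                  speakers: Dict[str, int]) -> float:
        """Average per-variety scores, weighting each variety by its speaker population."""
        total = sum(speakers[v] for v in scores)
        return sum(scores[v] * speakers[v] / total for v in scores)

    def performance_gap(scores: Dict[str, float]) -> float:
        """Difference between the best- and worst-served varieties."""
        return max(scores.values()) - min(scores.values())

    if __name__ == "__main__":
        # Hypothetical question-answering accuracies for three language varieties.
        scores = {"en-US": 0.91, "en-IN": 0.78, "sw": 0.55}
        speakers = {"en-US": 230_000_000, "en-IN": 125_000_000, "sw": 80_000_000}
        print(f"population-weighted score: {population_weighted_score(scores, speakers):.3f}")
        print(f"best-worst gap:            {performance_gap(scores):.3f}")

Under such a measure, improving a system only on its best-served variety moves the weighted score far less than narrowing the gap for widely spoken but under-served varieties.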

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

(Showing: 1 - 10 of 24)
Blasi, Damian and Anastasopoulos, Antonios and Neubig, Graham "Systematic Inequalities in Language Technology Performance across the World's Languages" Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022. https://doi.org/10.18653/v1/2022.acl-long.376
Debnath, Arnab and Rajabi, Navid and Alam, Fardina Fathmiul and Anastasopoulos, Antonios "Towards more equitable question answering systems: How much more data do you need?" Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), 2021. https://doi.org/10.18653/v1/2021.acl-short.79
Faisal, Fahim and Anastasopoulos, Antonios "Investigating Post-pretraining Representation Alignment for Cross-Lingual Question Answering" Proceedings of the 3rd Workshop on Machine Reading for Question Answering, 2021. https://doi.org/10.18653/v1/2021.mrqa-1.14
Faisal, Fahim and Keshava, Sharlina and Alam, Md Mahfuz and Anastasopoulos, Antonios "SD-QA: Spoken Dialectal Question Answering for the Real World" Findings of the Association for Computational Linguistics: EMNLP 2021, 2021. https://doi.org/10.18653/v1/2021.findings-emnlp.281
Faisal, Fahim and Wang, Yinkai and Anastasopoulos, Antonios "Dataset Geography: Mapping Language Data to Language Users" Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022. https://doi.org/10.18653/v1/2022.acl-long.239
Feng, Shangbin and Park, Chan Young and Liu, Yuhan and Tsvetkov, Yulia "From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Models" ACL: Annual Meeting of the Association for Computational Linguistics, 2023.
Field, Anjalie and Blodgett, Su Lin and Waseem, Zeerak and Tsvetkov, Yulia "A Survey of Race, Racism, and Anti-Racism in NLP" Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 2021. https://doi.org/10.18653/v1/2021.acl-long.149
Field, Anjalie and Coston, Amanda and Gandhi, Nupoor and Chouldechova, Alexandra and Putnam-Hornstein, Emily and Steier, David and Tsvetkov, Yulia "Examining risks of racial biases in NLP tools for child protective services" FAccT '23: Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, 2023. https://doi.org/10.1145/3593013.3594094
He, Tianxing and Zhang, Jingyu and Wang, Tianle and Kumar, Sachin and Cho, Kyunghyun and Glass, James and Tsvetkov, Yulia "On the Blind Spots of Model-Based Evaluation Metrics for Text Generation" ACL: Annual Meeting of the Association for Computational Linguistics, 2023.
Jegadeesan, Monisha and Kumar, Sachin and Wieting, John and Tsvetkov, Yulia "Improving the Diversity of Unsupervised Paraphrasing with Embedding Outputs" Proceedings of the 1st Workshop on Multilingual Representation Learning, 2021. https://doi.org/10.18653/v1/2021.mrl-1.15
Kumar, Sachin and Anastasopoulos, Antonios and Wintner, Shuly and Tsvetkov, Yulia "Machine Translation into Low-resource Language Varieties" Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), 2021. https://doi.org/10.18653/v1/2021.acl-short.16

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

Advances in natural language processing (NLP) technology now make it possible to perform many tasks through natural language or over natural language data -- automatic systems can answer questions, perform web search, or command our computers to perform specific tasks. However, "language" is not monolithic; people vary in the language they speak, the dialect they use, the relative ease with which they produce language, and the words they choose to express themselves. In the benchmarking of NLP systems, however, this linguistic variety is generally not represented: most commonly, tasks are formulated in canonical American English, with little regard for whether systems will work on language of any other variety. In this work we asked a simple question: can we measure the extent to which the diversity of language that we use affects the quality of results that we can expect from language technology systems? Answering it allows for the development and deployment of fair accuracy measures for a variety of language technology tasks, encouraging advances in the state of the art to focus on all users, not just a select few.


Specifically, this work focused on several aspects of this overall research question:

  • First, the project developed a general-purpose methodology for quantifying how well particular language technologies work across many varieties of language. It constructed benchmarks that account for the overall speaking population of each language variety, culminating in the GlobalBench benchmark.
  • Second, it moved beyond simple accuracy measures and directly quantified the effect that system accuracy has on users, in terms of the relative utility they derive from using the system. For instance, it examined the disparate effect of mistakes in voice recognition on people of various demographics (a sketch of this kind of per-group measurement appears after this list).
  • Third, it examined novel methods for robust learning of NLP systems across language or dialectal boundaries, and examined the effect that these methods have on increasing accuracy for all users. For instance, it created techniques that can generate text in many different language varieties.
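
The sketch below illustrates the kind of per-group disparity measurement referred to in the second item above. It is a minimal Python example with hypothetical group labels and outcomes (not project data): it computes accuracy separately for each demographic group from item-level results and reports the gap between groups.

    # Minimal illustrative sketch: per-group accuracy and disparity from
    # item-level results. Group labels and outcomes are hypothetical.
    from collections import defaultdict
    from typing import Dict, Iterable, Tuple

    def per_group_accuracy(results: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
        """results: (demographic_group, prediction_was_correct) pairs."""
        correct, total = defaultdict(int), defaultdict(int)
        for group, ok in results:
            total[group] += 1
            correct[group] += int(ok)
        return {g: correct[g] / total[g] for g in total}

    if __name__ == "__main__":
        # Hypothetical speech-recognition outcomes tagged with speaker dialect.
        results = [("dialect_A", True), ("dialect_A", True), ("dialect_A", False),
                   ("dialect_B", True), ("dialect_B", False), ("dialect_B", False)]
        acc = per_group_accuracy(results)
        print("per-group accuracy:", acc)
        print("disparity (max - min):", max(acc.values()) - min(acc.values()))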

Last Modified: 02/02/2024
Modified by: Graham Neubig

