Award Abstract # 1409287
III: Medium: Collaborative Research: Closing the User-Model Loop for Understanding Topics in Large Document Collections

NSF Org: IIS
Division of Information & Intelligent Systems
Recipient: UNIVERSITY OF MARYLAND, COLLEGE PARK
Initial Amendment Date: July 30, 2014
Latest Amendment Date: February 27, 2018
Award Number: 1409287
Award Instrument: Continuing Grant
Program Manager: Hector Munoz-Avila
hmunoz@nsf.gov
(703)292-4481
IIS, Division of Information & Intelligent Systems
CSE, Directorate for Computer and Information Science and Engineering
Start Date: August 1, 2014
End Date: July 31, 2020 (Estimated)
Total Intended Award Amount: $650,000.00
Total Awarded Amount to Date: $650,000.00
Funds Obligated to Date: FY 2014 = $168,398.00
FY 2015 = $156,300.00
FY 2016 = $160,492.00
FY 2017 = $164,810.00
History of Investigator:
  • Jordan Boyd-Graber (Principal Investigator)
    jbg@umiacs.umd.edu
  • Leah Findlater (Former Principal Investigator)
  • Leah Findlater (Former Co-Principal Investigator)
Recipient Sponsored Research Office: University of Maryland, College Park
3112 LEE BUILDING
COLLEGE PARK
MD  US  20742-5100
(301)405-6269
Sponsor Congressional District: 04
Primary Place of Performance: University of Maryland College Park
MD  US  20742-5141
Primary Place of Performance Congressional District: 04
Unique Entity Identifier (UEI): NPU8ULVAAS23
Parent UEI: NPU8ULVAAS23
NSF Program(s): Info Integration & Informatics
Primary Program Source: 01001415DB NSF RESEARCH & RELATED ACTIVIT
01001516DB NSF RESEARCH & RELATED ACTIVIT
01001617DB NSF RESEARCH & RELATED ACTIVIT
01001718DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 7364, 7924
Program Element Code(s): 736400
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Individuals and organizations must cope with massive amounts of unstructured text information: individuals sifting through a lifetime of e-mail and documents, journalists understanding the activities of government organizations, companies reacting to what people say about them online, or scholars making sense of digitized documents from the ancient world. This project's research goal is to bring together two previously disconnected components of how users understand this deluge of data: algorithms to sift through the data and interfaces to communicate the results of the algorithms. This project will allow users to provide feedback to algorithms that are typically employed on a "take it or leave it" basis: if the algorithm makes a mistake or misunderstands the data, users can correct the problem using an intuitive user interface and improve the underlying analysis. This project will jointly improve both the algorithms and the interfaces, leading to deeper understanding of, faster introduction to, and greater trust in the algorithms we rely on to understand massive textual datasets. The resulting source code and functional demos will be broadly disseminated, and tutorials will be shared online and in person in educational efforts and to aid the adoption of the methodologies.

This project enables computer algorithms and humans to apply their respective strengths and collaborate in managing and making sense of large volumes of textual data. It "closes the loop" in novel ways to connect users with a class of big data analysis algorithms called topic models. This connection is made through interfaces that empower the user to change the underlying models by refining the number and granularity of topics, adding or removing words considered by the model, and adding constraints on what words appear together in topics. The underlying model also enables new visualizations in the form of a Metadata Map that uses active learning to focus users' limited attention on the most important documents in a collection. Users annotate documents with useful meta-data and thereby further improve the quality of the discovered topics. The project includes evaluations of these methods through careful user studies and in-depth case studies to demonstrate that topics are more coherent, users can more quickly provide annotations, users trust the underlying algorithms more, and users can more effectively build an understanding of their textual data. The project web site (http://nlp.cs.byu.edu/closing-the-loop) will include pointers to the project Git repositories for source code, project demos, tutorials, and publications communicating experimental results.
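As a concrete illustration of the loop described above, the sketch below fits a small topic model, prints its topics, and then refits after a user flags a word as uninformative. It is a minimal, hypothetical example using scikit-learn's CountVectorizer and LatentDirichletAllocation, not the project's actual system; the toy documents and the flagged word are assumptions.

# Minimal sketch of one user-feedback loop (illustrative only; not the project's system).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "the senate passed the budget bill after a long debate",
    "the team won the championship game in overtime",
    "voters went to the polls for the primary election",
    "the coach praised the players after the victory",
]

def fit_and_show(docs, n_topics=2, removed_words=None):
    """Fit a topic model and print the top words of each topic."""
    vec = CountVectorizer(stop_words=removed_words)
    counts = vec.fit_transform(docs)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    lda.fit(counts)
    vocab = vec.get_feature_names_out()
    for k, weights in enumerate(lda.components_):
        top = [vocab[i] for i in weights.argsort()[::-1][:5]]
        print(f"topic {k}: {', '.join(top)}")

fit_and_show(docs)                          # initial "take it or leave it" topics
fit_and_show(docs, removed_words=["the"])   # user feedback: drop an uninformative word and refit

A real system supports richer refinements (merging or splitting topics, word-level constraints, metadata annotations), but the pattern is the same: the user's correction changes the model's input or constraints, and the model is refit.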

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

(Showing: 1 - 10 of 13)
Alison Smith and Tak Yeon Lee and Forough Poursabzi-Sangdeh and Jordan Boyd-Graber and Kevin Seppi and Niklas Elmqvist and Leah Findlater "Evaluating Visual Representations for Topic Understanding and Their Effects on Manually Generated Labels" Transactions of the Association for Computational Linguistics , v.5 , 2017 , p.1--15
Eric Wallace and Pedro Rodriguez and Shi Feng and Ikuya Yamada and Jordan Boyd-Graber "Trick Me If You Can: Human-in-the-loop Generation of Adversarial Question Answering Examples" Transactions of the Association for Computational Linguistics , v.7 , 2019
Jordan Boyd-Graber "Humans and Computers Working Together to Measure Machine Learning Interpretability" The Bridge , v.47 , 2017 , p.6--10
Jordan Boyd-Graber and Fenfei Guo "Which Evaluations Uncover Sense Representations that Actually Make Sense?" Proceedings of the 12th Language Resources and Evaluation Conference , 2020
Kumar, Varun and Smith-Renner, Alison and Findlater, Leah and Seppi, Kevin and Boyd-Graber, Jordan "Why Didn't You Listen to Me? Comparing User Control of Human-in-the-Loop Topic Models" Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics , 2019 https://doi.org/10.18653/v1/P19-1637
Lund, Jeffrey and Armstrong, Piper and Fearn, Wilson and Cowley, Stephen and Byun, Courtni and Boyd-Graber, Jordan and Seppi, Kevin "Automatic Evaluation of Local Topic Quality" Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics , 2019 https://doi.org/10.18653/v1/P19-1076
Smith-Renner, Alison and Fan, Ron and Birchfield, Melissa and Wu, Tongshuang and Boyd-Graber, Jordan and Weld, Daniel S. and Findlater, Leah "No Explainability without Accountability: An Empirical Study of Explanations and Feedback in Interactive ML" CHI '20: Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems , 2020 https://doi.org/10.1145/3313831.3376624
Smith-Renner, Alison and Kumar, Varun and Boyd-Graber, Jordan and Seppi, Kevin and Findlater, Leah "Digging into user control: perceptions of adherence and instability in transparent models" IUI '20: Proceedings of the 25th International Conference on Intelligent User Interfaces , 2020 https://doi.org/10.1145/3377325.3377491
Tak Yeon Lee and Alison Smith and Kevin Seppi and Niklas Elmqvist and Jordan Boyd-Graber and Leah Findlater "The Human Touch: How Non-expert Users Perceive, Interpret, and Fix Topic Models" International Journal of Human-Computer Studies , 2017

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

Machine learning is revolutionizing relationships, businesses, and academia.  But the advanced techniques researchers develop are useless if people cannot use them.  This project investigated how to “close the loop”: creating algorithms that meet users’ needs and systems that bring users and algorithms together to understand and productively analyze large text datasets.


This project formalized ways for users to correct automatic clusterings of documents called “topic models”: given a large collection of text, these algorithms create an automatic summary of the primary themes in the collection.  Over the course of the project, we developed a new understanding of interactive topic models, using spectral methods to make them faster and reduce latency, and applied these insights to other forms of user information such as crowdsourced labels.
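The “spectral methods” mentioned above recover topics from word co-occurrence statistics rather than by slow iterative sampling, which is what makes refitting after each user correction fast enough for interactive use.  The sketch below is a heavily simplified, hypothetical illustration of the anchor-word selection step in that family of algorithms; the toy corpus, vocabulary, and helper functions are assumptions for illustration only, not the project's code.

import numpy as np

def cooccurrence(doc_word_counts):
    """Row-normalized word-word co-occurrence matrix from a document-word count matrix."""
    Q = doc_word_counts.T @ doc_word_counts
    np.fill_diagonal(Q, 0)                      # ignore a word co-occurring with itself
    row_sums = Q.sum(axis=1, keepdims=True)
    return Q / np.maximum(row_sums, 1)          # row w approximates P(other word | word w)

def select_anchors(Q_bar, k):
    """Greedily pick k anchor words whose rows lie far from the span of earlier anchors."""
    residual = Q_bar.copy()
    anchors = []
    for _ in range(k):
        norms = np.linalg.norm(residual, axis=1)
        idx = int(np.argmax(norms))             # farthest remaining word becomes the next anchor
        anchors.append(idx)
        b = residual[idx] / norms[idx]          # new orthonormal direction
        residual -= np.outer(residual @ b, b)   # project that direction out of every row
    return anchors

# Toy example: four documents over a six-word vocabulary (counts are made up).
vocab = ["senate", "vote", "bill", "game", "team", "score"]
X = np.array([[2, 1, 1, 0, 0, 0],
              [1, 2, 1, 0, 0, 0],
              [0, 0, 0, 2, 1, 1],
              [0, 0, 0, 1, 2, 1]])
anchor_ids = select_anchors(cooccurrence(X), k=2)
print([vocab[i] for i in anchor_ids])           # one anchor per topic: a politics word and a sports word

The appeal for interactivity is that a user's correction only changes these co-occurrence statistics or the chosen anchors, so the model can be rebuilt with fast linear algebra instead of waiting for a sampler to converge.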


But these algorithms aren’t the end of the story: how do people actually use them?  To address that question, the project conducted user studies that examined which automatically created clusters of documents were most useful for users, how to evaluate that utility, and what users want from machine learning tools.  Users want explanations from imperfect machine learning algorithms, and they want algorithms to surprise them by surfacing unexpected information, but not too often.


Research papers from this grant received best paper awards or nominations at CoNLL 2015 and IUI 2018.

Last Modified: 01/31/2021
Modified by: Jordan L Boyd-Graber
