
NSF Org: | IIS Division of Information & Intelligent Systems |
Recipient: | |
Initial Amendment Date: | September 4, 2020 |
Latest Amendment Date: | October 19, 2020 |
Award Number: | 2007941 |
Award Instrument: | Standard Grant |
Program Manager: | Sylvia Spengler, sspengle@nsf.gov, (703) 292-7347, IIS Division of Information & Intelligent Systems, CSE Directorate for Computer and Information Science and Engineering |
Start Date: | October 1, 2020 |
End Date: | September 30, 2024 (Estimated) |
Total Intended Award Amount: | $249,998.00 |
Total Awarded Amount to Date: | $249,998.00 |
Funds Obligated to Date: | |
History of Investigator: | |
Recipient Sponsored Research Office: | 1350 Beardshear Hall, Ames, IA, US 50011-2103, (515) 294-5225 |
Sponsor Congressional District: | |
Primary Place of Performance: | 1138 Pearson, Ames, IA, US 50011-2207 |
Primary Place of Performance Congressional District: | |
Unique Entity Identifier (UEI): | |
Parent UEI: | |
NSF Program(s): | Info Integration & Informatics |
Primary Program Source: | |
Program Reference Code(s): | |
Program Element Code(s): | |
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
Data that encode knowledge are abundantly available in many domains, such as biomedical research, online commerce, open government, education, and public health. Machine learning is a powerful tool for discovering novel knowledge from data and for helping individuals and organizations make informed decisions. However, machine learning must be bootstrapped with human-annotated knowledge, which can be expensive to obtain and may contain human errors. The team of researchers discovers and exploits dependencies in the data through novel methodologies that significantly reduce the cost and noise of providing critical knowledge for machine learning. The research outputs, including algorithms, systems, and theories, are sufficiently generic to benefit many domains where machine learning is applicable. By conducting this fundamental research, the team will also train undergraduate and graduate students for the nation's STEM workforce.
The researchers will collaborate to develop algorithms, systems, and theories for reducing cost and noise when annotating dependent data, termed “structured annotations,” to provide supervision knowledge for machine learning. While the dependencies can make data annotation costly and error-prone, the researchers view them as a useful inductive bias for selective and accurate annotation. In particular, the research team proposes a human-in-the-loop system to aid the construction of proper probabilistic graphical models that encode the dependencies. The project team combines contextual and multi-armed bandits with scalable graph inference algorithms to reduce labeling costs. Building on these graphical bandits, the team addresses budget allocation when querying labels of the same data point repeatedly for robustness. Given noisy human annotations, the team formulates optimization problems and algorithms to jointly infer annotator competences and the ground-truth labels of the data. From the theoretical perspective, the project will advance active learning in crowdsourcing settings with more realistic noise distributions and will analyze regret in structured annotation. The project will result in datasets, algorithms, and a testbed system that benefit not only the core machine learning research community but also the many domains that use machine learning.
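To give a flavor of how a bandit can allocate a fixed labeling budget, the Python sketch below is a minimal, hypothetical illustration rather than the project's graphical-bandit system: each "arm" stands for a group of data items, and the reward simulates how informative one more label from that group turns out to be. The function name ucb1_label_allocation, the grouping, and the reward model are all illustrative assumptions.

    # Minimal sketch (not the project's actual system): UCB1 allocation of a fixed
    # labeling budget across hypothetical groups of data items.
    import math
    import random

    def ucb1_label_allocation(n_groups, budget, informativeness, seed=0):
        """Allocate `budget` label queries across `n_groups` arms with UCB1.

        `informativeness[g](rng)` simulates the payoff (e.g., uncertainty
        reduction) of labeling one more item from group g.
        """
        rng = random.Random(seed)
        counts = [0] * n_groups      # queries issued per group
        totals = [0.0] * n_groups    # cumulative observed reward per group

        for t in range(1, budget + 1):
            if t <= n_groups:        # pull each arm once to initialize
                g = t - 1
            else:                    # UCB1 index: empirical mean + exploration bonus
                g = max(
                    range(n_groups),
                    key=lambda i: totals[i] / counts[i]
                    + math.sqrt(2 * math.log(t) / counts[i]),
                )
            reward = informativeness[g](rng)
            counts[g] += 1
            totals[g] += reward
        return counts

    if __name__ == "__main__":
        # Hypothetical groups whose labels are more or less informative on average.
        arms = [lambda r, p=p: 1.0 if r.random() < p else 0.0 for p in (0.2, 0.5, 0.8)]
        print(ucb1_label_allocation(n_groups=3, budget=300, informativeness=arms))

Under this toy reward model, most of the budget drifts toward the most informative group while the others are still sampled occasionally; the project's approach additionally propagates information over the dependency graph, which this sketch omits.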
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
Supervised learning models, such as deep networks, are trained on large amounts of high-quality labeled data, and obtaining such training data is a bottleneck to their accuracy. Structured prediction with complex (multi-layered, multi-typed, cross-modal, hierarchical, etc.) dependencies is frequently found helpful for natural language processing, computer vision, fraud detection, and graph data mining, while current crowdsourcing research lags behind because it handles only much simpler dependencies. It remains unknown to what extent the dependencies hinder or help data labeling, how to design systems, methodologies, and algorithms that crowdsource dependent labels and infer the true labels from the crowdsourced ones, and what the fundamental theoretical limits and guarantees on crowdsourcing efficacy are.
In this project, the PI investigated the “structured annotation” problem, which addresses the complex dependencies in crowdsourced labeling tasks, and developed new methodologies and algorithms with theoretical guarantees for (1) selectively finding and querying data items under a fixed budget to improve annotation accuracy and (2) robust crowdsourcing ground-truth inference with limited annotations, leveraging the complex data dependencies. The project led to a set of efficient and effective methods, technologies, and software systems for building reliable and sufficient training data for machine learning tasks in various domains. The project included training four PhD students and has resulted in over 10 publications at top conferences. Research results have been integrated into the PI's courses and presented at the Midwest Big Data Summer School, the Data Science Workshop, and other outreach activities.
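To make the second thread, ground-truth inference from redundant noisy annotations, concrete, here is a simplified one-coin Dawid-Skene style EM that jointly estimates annotator accuracies and posterior label probabilities. This is an assumption-laden toy, not the project's algorithm: the project's methods additionally exploit dependencies between items, which this version ignores, and the function and variable names are illustrative.

    # Minimal sketch (assumed, simplified model): one-coin Dawid-Skene EM that
    # jointly infers annotator accuracies and binary ground-truth posteriors.
    import math

    def em_truth_inference(labels, n_iters=50):
        """labels: dict {(item, annotator): 0/1}; returns (posteriors, accuracies)."""
        items = sorted({i for i, _ in labels})
        annotators = sorted({a for _, a in labels})
        acc = {a: 0.7 for a in annotators}    # initial guess of annotator accuracy
        post = {i: 0.5 for i in items}        # P(true label of item = 1)

        for _ in range(n_iters):
            # E-step: posterior over each item's true label given current accuracies.
            for i in items:
                log1 = log0 = 0.0
                for (it, a), y in labels.items():
                    if it != i:
                        continue
                    p = acc[a]
                    log1 += math.log(p if y == 1 else 1 - p)
                    log0 += math.log(p if y == 0 else 1 - p)
                m = max(log0, log1)
                post[i] = math.exp(log1 - m) / (math.exp(log0 - m) + math.exp(log1 - m))
            # M-step: annotator accuracy = expected fraction of correct annotations.
            for a in annotators:
                num = den = 0.0
                for (it, an), y in labels.items():
                    if an != a:
                        continue
                    num += post[it] if y == 1 else 1 - post[it]
                    den += 1
                acc[a] = min(max(num / den, 1e-3), 1 - 1e-3)  # keep in (0, 1)
        return post, acc

    if __name__ == "__main__":
        # Three annotators label four items; annotator "a3" is unreliable.
        votes = {(0, "a1"): 1, (0, "a2"): 1, (0, "a3"): 0,
                 (1, "a1"): 0, (1, "a2"): 0, (1, "a3"): 1,
                 (2, "a1"): 1, (2, "a2"): 1, (2, "a3"): 1,
                 (3, "a1"): 0, (3, "a2"): 0, (3, "a3"): 0}
        posteriors, accuracies = em_truth_inference(votes)
        print({i: round(p, 2) for i, p in posteriors.items()})
        print({a: round(p, 2) for a, p in accuracies.items()})

On this toy input, the EM downweights the unreliable annotator and recovers labels consistent with the reliable majority, which is the basic effect that the project's dependency-aware inference builds upon.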
Last Modified: 01/29/2025
Modified by: Qi Li