
NSF Org: | CCF Division of Computing and Communication Foundations |
Recipient: |
|
Initial Amendment Date: | June 26, 2017 |
Latest Amendment Date: | June 26, 2017 |
Award Number: | 1718549 |
Award Instrument: | Standard Grant |
Program Manager: | A. Funda Ergun, CCF Division of Computing and Communication Foundations, CSE Directorate for Computer and Information Science and Engineering |
Start Date: | September 1, 2017 |
End Date: | August 31, 2021 (Estimated) |
Total Intended Award Amount: | $450,000.00 |
Total Awarded Amount to Date: | $450,000.00 |
Funds Obligated to Date: |
|
History of Investigator: |
|
Recipient Sponsored Research Office: |
1033 MASSACHUSETTS AVE STE 3 CAMBRIDGE MA US 02138-5366 (617)495-5501 |
Sponsor Congressional District: |
|
Primary Place of Performance: |
33 Oxford St. Cambridge MA US 02138-5366 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | Algorithmic Foundations |
Primary Program Source: |
|
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
The goal of this research project is to develop new results in machine learning and optimization when training data for machine learning or information about optimization problems is acquired from strategic sources. We are blessed with unprecedented abilities to connect with people all over the world: buying and selling products, sharing information and experiences, asking and answering questions, collaborating on projects, borrowing and lending money, and exchanging excess resources. These activities result in rich data that scientists can use to understand human social behavior, generate accurate predictions, find cures for diseases, and make policy recommendations. Machine learning and optimization traditionally take such data as given, for example treating them as independent samples drawn from some unknown probability distribution. However, such data are possessed or generated by people in the context of specific rules of interaction. Hence, what data become available and the quality of available data are results of strategic decisions. For example, people with sensitive medical conditions may be less willing to reveal their medical data in a survey and freelance workers may not put in a good-faith effort in completing a task. This strategic aspect of data challenges fundamental assumptions in machine learning and optimization. The research project takes a holistic view that jointly considers data acquisition with learning and optimization. It will bring improved benefits in business, government, and societal decision-making processes where machine learning and optimization are widely applicable. The research project also involves the mentoring of PhD students, innovation in graduate teaching, and engagement of members of underrepresented groups in research.
The PI will pursue a broad research agenda developing a fundamental understanding of how acquiring data from strategic sources affects the objectives of machine learning and optimization. The first set of goals aims to develop a theory for machine learning when a learning algorithm needs to purchase data from data holders who cannot fabricate their data but each have a private cost associated with revealing their data. A notion of economic efficiency for machine learning will be established. The second set of goals will further advance the frontier of machine learning by designing joint elicitation and learning mechanisms when data are acquired from strategic agents but the quality of the contributed data cannot be directly verified. The third set of goals will develop optimization algorithms with good theoretical guarantees when parameters of an optimization problem may be unknown initially but the algorithm designer can gather information about the parameters from strategic agents.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
This project has advanced our understanding of machine learning and optimization when data are possessed by people who may act strategically. The project has developed the following sets of results.
A first set of results focuses on joint elicitation and computation. For a computational task, for example statistical estimation or linear optimization, when data are possessed by strategic agents who have different levels of willingness to provide their data and there is a total budget for acquiring data, the data acquisition and the associated data pricing need to be considered in the context of the computational goal. Biases in data due to the acquisition process need to be corrected in the subsequent computational analysis. This project develops data acquisition schemes that are optimal for the objective of the computational task. It views the elicitation and computation as an integrated system and optimizes for the system goal.
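The bias-correction step mentioned above can be illustrated with a small, self-contained sketch. It is not the project's acquisition scheme: it assumes agents report their costs truthfully, uses a hypothetical purchase rule (buy from an agent with probability `min(1, 1/cost)`), and applies a standard inverse-probability (Horvitz-Thompson style) reweighting. All names and numbers are illustrative. The sketch enumerates every possible purchase outcome for three agents and compares the expected value of the reweighted estimator with that of the naive average of purchased data.

```python
from itertools import product


def ht_mean(values, probs, included):
    """Inverse-probability-weighted estimate of the population mean.

    Each purchased value is divided by the probability with which it was
    purchased, so cheap (frequently purchased) data do not dominate.
    """
    n = len(values)
    return sum(v / p for v, p, inc in zip(values, probs, included) if inc) / n


def naive_mean(values, included):
    """Plain average of whatever was purchased (0.0 if nothing was)."""
    bought = [v for v, inc in zip(values, included) if inc]
    return sum(bought) / len(bought) if bought else 0.0


def expected(estimator, probs):
    """Exact expectation of an estimator over all purchase outcomes."""
    total = 0.0
    for included in product([False, True], repeat=len(probs)):
        weight = 1.0
        for p, inc in zip(probs, included):
            weight *= p if inc else 1.0 - p
        total += weight * estimator(included)
    return total


# Hypothetical population: higher values come with higher costs.
values = [1.0, 2.0, 6.0]
costs = [1.0, 2.0, 4.0]

# Assumed purchase rule (illustrative only): buy with probability min(1, 1/cost).
probs = [min(1.0, 1.0 / c) for c in costs]

true_mean = sum(values) / len(values)
ht_expect = expected(lambda inc: ht_mean(values, probs, inc), probs)
naive_expect = expected(lambda inc: naive_mean(values, inc), probs)

print("true mean:", true_mean)
print("expected reweighted estimate:", ht_expect)
print("expected naive estimate:", naive_expect)
```

In this toy population the naive average is pulled toward the cheap, low-valued data, while the reweighted estimate matches the true mean in expectation. Choosing the purchase probabilities and payments to meet a budget and minimize estimation error is the design problem the report describes; this sketch does not attempt it.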
A second set of results characterizes and develops learning guarantees when data come from strategic sources. The project characterizes linear regression algorithms that are strategyproof. That is, no one has incentives to lie about their data when these algorithms are used. Moreover, the project develops online learning algorithms that have good performance guarantees when in each round agents strategically respond to the algorithmic decision in that round.
A third set of results concerns elicitation. In many settings, when eliciting information and data from people, there is no direct way to verify the quality of their contributions. This project develops mechanisms that guarantee truthful elicitation. More importantly, the mechanisms also provide a way to estimate the quality of the elicited information. The quality estimates are used to develop a method for aggregating individual predictions into a single prediction. The aggregation method has robust accuracy across 14 real-world datasets.
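As a rough illustration of how quality estimates could feed into aggregation, the sketch below computes a quality-weighted average of individual predictions. This is a generic weighting rule chosen for illustration, not the aggregation method developed in the project; the function name `aggregate` and the example scores are hypothetical.

```python
def aggregate(predictions, quality_scores):
    """Combine predictions using non-negative quality scores as weights.

    Falls back to the unweighted mean when every score is zero, so the
    result is always defined for a non-empty list of predictions.
    """
    if len(predictions) != len(quality_scores):
        raise ValueError("need one quality score per prediction")
    if not predictions:
        raise ValueError("need at least one prediction")
    if any(q < 0 for q in quality_scores):
        raise ValueError("quality scores must be non-negative")

    total = sum(quality_scores)
    if total == 0:
        return sum(predictions) / len(predictions)
    return sum(p * q for p, q in zip(predictions, quality_scores)) / total


# Hypothetical probability forecasts from three people and assumed quality scores.
predictions = [0.9, 0.6, 0.2]
quality_scores = [3.0, 1.0, 0.0]

weighted = aggregate(predictions, quality_scores)
unweighted = aggregate(predictions, [1.0, 1.0, 1.0])
print("quality-weighted:", weighted)
print("unweighted:", unweighted)
```

Here the forecaster with the highest assumed quality score dominates the result, and a forecaster with a score of zero is ignored entirely.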
In addition, this project touches on the economic side of pricing information. It develops mechanisms for selling private information for profit.
This project has also supported the training of a number of Ph.D. students and provided opportunities for undergraduate and postdoctoral research.
Last Modified: 12/30/2021
Modified by: Yiling Chen