Award Abstract # 1718549
AF: Small: Learning and Optimization with Strategic Data Sources

NSF Org: CCF
Division of Computing and Communication Foundations
Recipient: PRESIDENT AND FELLOWS OF HARVARD COLLEGE
Initial Amendment Date: June 26, 2017
Latest Amendment Date: June 26, 2017
Award Number: 1718549
Award Instrument: Standard Grant
Program Manager: A. Funda Ergun
CCF  Division of Computing and Communication Foundations
CSE  Directorate for Computer and Information Science and Engineering
Start Date: September 1, 2017
End Date: August 31, 2021 (Estimated)
Total Intended Award Amount: $450,000.00
Total Awarded Amount to Date: $450,000.00
Funds Obligated to Date: FY 2017 = $450,000.00
History of Investigator:
  • Yiling Chen (Principal Investigator)
    yiling@seas.harvard.edu
Recipient Sponsored Research Office: Harvard University
1033 MASSACHUSETTS AVE STE 3
CAMBRIDGE
MA  US  02138-5366
(617)495-5501
Sponsor Congressional District: 05
Primary Place of Performance: President and Fellows of Harvard College
33 Oxford St.
Cambridge
MA  US  02138-5366
Primary Place of Performance Congressional District: 05
Unique Entity Identifier (UEI): LN53LCFJFL45
Parent UEI:
NSF Program(s): Algorithmic Foundations
Primary Program Source: 01001718DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 7923, 7926, 7932
Program Element Code(s): 779600
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

The goal of this research project is to develop new results in machine learning and optimization when training data for machine learning or information about optimization problems is acquired from strategic sources. We are blessed with unprecedented abilities to connect with people all over the world: buying and selling products, sharing information and experiences, asking and answering questions, collaborating on projects, borrowing and lending money, and exchanging excess resources. These activities result in rich data that scientists can use to understand human social behavior, generate accurate predictions, find cures for diseases, and make policy recommendations. Machine learning and optimization traditionally take such data as given, for example by treating them as independent samples drawn from some unknown probability distribution. However, such data are possessed or generated by people in the context of specific rules of interaction. Hence, which data become available, and the quality of those data, are the results of strategic decisions. For example, people with sensitive medical conditions may be less willing to reveal their medical data in a survey, and freelance workers may not make a good-faith effort to complete a task. This strategic aspect of data challenges fundamental assumptions in machine learning and optimization. The research project takes a holistic view that jointly considers data acquisition with learning and optimization. Its results are expected to benefit decision-making in business, government, and society, where machine learning and optimization are widely applied. The research project also involves the mentoring of PhD students, innovation in graduate teaching, and engagement of members of underrepresented groups in research.

The PI will pursue a broad research agenda developing a fundamental understanding of how acquiring data from strategic sources affects the objectives of machine learning and optimization. The first set of goals aims to develop a theory for machine learning when a learning algorithm needs to purchase data from data holders who cannot fabricate their data but each have a private cost associated with revealing their data. A notion of economic efficiency for machine learning will be established. The second set of goals will further advance the frontier of machine learning by designing joint elicitation and learning mechanisms when data are acquired from strategic agents but the quality of the contributed data cannot be directly verified. The third set of goals will develop optimization algorithms with good theoretical guarantees when parameters of an optimization problem may be unknown initially but the algorithm designer can gather information about the parameters from strategic agents.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH


Chen, Yiling and Immorlica, Nicole and Lucier, Brendan and Syrgkanis, Vasilis and Ziani, Juba "Optimal Data Acquisition for Statistical Estimation" ACM Conference on Economics and Computation, 2018 https://doi.org/10.1145/3219166.3219195
Chen, Yiling and Liu, Yang and Podimata, Chara "Learning Strategy-Aware Linear Classifiers" Proc. of the Thirty-fourth Conference on Neural Information Processing Systems (NeurIPS 2020), 2020
Chen, Yiling and Podimata, Chara and Procaccia, Ariel D. and Shah, Nisarg "Strategyproof Linear Regression in High Dimensions" Proceedings of the 2018 ACM Conference on Economics and Computation, 2018
Chen, Yiling and Shen, Yiheng and Zheng, Shuran "Truthful Data Acquisition via Peer Prediction" Proc. of the Thirty-fourth Conference on Neural Information Processing Systems (NeurIPS 2020), 2020
Chen, Yiling and Zheng, Shuran "Prior-free Data Acquisition for Accurate Statistical Estimation" Proceedings of the 2019 ACM Conference on Economics and Computation, 2019 https://doi.org/10.1145/3328526.3329564
Hu, Lily and Chen, Yiling "A Short-term Intervention for Long-term Fairness in the Labor Market" Proceedings of the 2018 World Wide Web Conference, 2018 https://doi.org/10.1145/3178876.3186044
Liu, Yang and Wang, Juntao and Chen, Yiling "Surrogate Scoring Rules" ACM Conference on Economics and Computation, 2020 https://doi.org/10.1145/3391403.3399488
Wang, Juntao and Liu, Yang and Chen, Yiling "Forecast Aggregation via Peer Prediction" Proceedings of the Ninth AAAI Conference on Human Computation and Crowdsourcing (HCOMP 2021), 2021
Chen, Yiling and Xu, Haifeng "Selling Information Through Consulting" Proc. of ACM-SIAM Symposium on Discrete Algorithms (SODA), 2020
Zheng, Shuran and Chen, Yiling "Optimal Advertising for Information Products" Proceedings of the 22nd ACM Conference on Economics and Computation (EC 2021), 2021 https://doi.org/10.1145/3465456.3467649
Zheng, Shuran and Waggoner, Bo and Liu, Yang and Chen, Yiling "Active Information Acquisition for Linear Optimization" Uncertainty in Artificial Intelligence, 2018

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

This project has advanced our understanding of machine learning and optimization when data are possessed by people who may act strategically. The project has developed the following sets of results. 

A first set of results focuses on joint elicitation and computation. For a computational task, for example statistical estimation or linear optimization, when data are possessed by strategic agents who have different levels of willingness to provide their data and there is a total budget for acquiring data, the data acquisition and the associated data pricing need to be considered in the context of the computational goal. Biases in data due to the acquisition process need to be corrected in the subsequent computational analysis. This project develops data acquisition schemes that are optimal for the objective of the computational task. It views the elicitation and computation as an integrated system and optimizes for the system goal. 
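The need to correct acquisition bias can be illustrated with a small simulation. The sketch below is not the project's mechanism: it assumes a hypothetical setting in which each agent's data is purchased with a known probability that decreases with the agent's cost, and it applies the standard Horvitz-Thompson (inverse-probability) correction. The population, the purchase rule, and all function names are illustrative assumptions; pricing and the budget constraint are left out.

```python
import random


def horvitz_thompson_mean(values, selection_probs, selected):
    """Estimate the population mean from a non-uniformly purchased sample.

    values[i] is agent i's data point, selection_probs[i] is the known,
    strictly positive probability that agent i's data was purchased, and
    selected[i] says whether it actually was purchased.
    """
    n = len(values)
    weighted = (v / p for v, p, s in zip(values, selection_probs, selected) if s)
    return sum(weighted) / n


def naive_mean(values, selected):
    """Plain average of the purchased data, ignoring how it was acquired."""
    bought = [v for v, s in zip(values, selected) if s]
    return sum(bought) / len(bought)


def simulate(n=20000, seed=0):
    """Build a hypothetical population and one random purchase outcome."""
    rng = random.Random(seed)
    values = [rng.random() for _ in range(n)]
    # Assumption: the cost of revealing data equals the data value itself,
    # so cost and data are perfectly correlated.
    costs = list(values)
    # Assumption: cheaper data is bought more often; probabilities stay in [0.2, 1].
    probs = [1.0 - 0.8 * c for c in costs]
    selected = [rng.random() < p for p in probs]
    return values, probs, selected


if __name__ == "__main__":
    values, probs, selected = simulate()
    print("true mean:       ", sum(values) / len(values))
    print("naive estimate:  ", naive_mean(values, selected))
    print("corrected (HT):  ", horvitz_thompson_mean(values, probs, selected))
```

Because high-cost agents are under-sampled and cost is correlated with the data, the naive average is pulled toward low values, while reweighting each purchased point by the inverse of its purchase probability removes that bias in expectation.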

A second set of results characterizes and develops learning guarantees when data come from strategic sources. The project characterizes linear regression algorithms that are strategyproof. That is, no one has incentives to lie about their data when these algorithms are used. Moreover, the project develops online learning algorithms that have good performance guarantees when in each round agents strategically respond to the algorithmic decision in that round. 
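The notion of strategyproofness can be made concrete in the simplest possible case, fitting a single constant to reported values. This is only an illustration under stated assumptions, not the project's characterization, which concerns regression in higher dimensions: each agent is assumed to want the fitted value as close as possible to its own true value, and the function names are hypothetical. The least-squares fit (the mean) can be manipulated by exaggerating, whereas the least-absolute-deviation fit (the median) cannot.

```python
from statistics import mean, median


def fit_constant_least_squares(reports):
    """The constant minimizing squared error is the mean of the reports."""
    return mean(reports)


def fit_constant_least_absolute(reports):
    """A constant minimizing absolute error is the median of the reports."""
    return median(reports)


def gain_from_misreport(fit, true_values, agent, misreport):
    """How much closer the fit moves to the agent's true value when it lies.

    A positive result means the misreport helps the agent, so the fitting
    rule is not strategyproof.
    """
    honest_fit = fit(true_values)
    reports = list(true_values)
    reports[agent] = misreport
    manipulated_fit = fit(reports)
    target = true_values[agent]
    return abs(honest_fit - target) - abs(manipulated_fit - target)


if __name__ == "__main__":
    true_values = [1.0, 2.0, 3.0, 4.0, 10.0]
    # The agent holding 10.0 exaggerates its report to 40.0.
    print("gain under mean:  ",
          gain_from_misreport(fit_constant_least_squares, true_values, 4, 40.0))
    print("gain under median:",
          gain_from_misreport(fit_constant_least_absolute, true_values, 4, 40.0))
```

With an odd number of agents, no single misreport can move the median closer to the liar's true value, which is why median-style rules are a common starting point for strategyproof estimation.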

A third set of results concerns elicitation. In many settings, when eliciting information and data from people, there is no direct way to verify the quality of their contributions. This project develops mechanisms that guarantee truthful elicitation. More importantly, the mechanisms also provide a way to estimate the quality of the elicited information. The quality estimation is used to develop a method for aggregating individual predictions into a single prediction. The aggregation method has robust accuracy across 14 real-world datasets. 

In addition, this project touches on the economic side of pricing information. It develops mechanisms for selling private information for profit. 

This project has also supported the training of a number of Ph.D. students and provided opportunities for undergraduate and postdoctoral research.

 


Last Modified: 12/30/2021
Modified by: Yiling Chen

