Award Abstract # 1452961
CAREER: Algorithmic Foundations for Social Data

NSF Org: CCF
Division of Computing and Communication Foundations
Recipient: PRESIDENT AND FELLOWS OF HARVARD COLLEGE
Initial Amendment Date: March 30, 2015
Latest Amendment Date: September 3, 2019
Award Number: 1452961
Award Instrument: Continuing Grant
Program Manager: Tracy Kimbrel
CCF
 Division of Computing and Communication Foundations
CSE
 Directorate for Computer and Information Science and Engineering
Start Date: May 1, 2015
End Date: April 30, 2021 (Estimated)
Total Intended Award Amount: $514,962.00
Total Awarded Amount to Date: $514,962.00
Funds Obligated to Date: FY 2015 = $98,117.00
FY 2016 = $105,768.00

FY 2017 = $105,358.00

FY 2018 = $103,105.00

FY 2019 = $102,614.00
History of Investigator:
  • Yaron Singer (Principal Investigator)
    yaron@seas.harvard.edu
Recipient Sponsored Research Office: Harvard University
1033 MASSACHUSETTS AVE STE 3
CAMBRIDGE
MA  US  02138-5366
(617)495-5501
Sponsor Congressional District: 05
Primary Place of Performance: Harvard SEAS
33 Oxford Street, 114 MD
Cambridge
MA  US  02138-2933
Primary Place of Performance
Congressional District:
05
Unique Entity Identifier (UEI): LN53LCFJFL45
Parent UEI:
NSF Program(s): Algorithmic Foundations
Primary Program Source: 01001516DB NSF RESEARCH & RELATED ACTIVIT
01001617DB NSF RESEARCH & RELATED ACTIVIT

01001718DB NSF RESEARCH & RELATED ACTIVIT

01001819DB NSF RESEARCH & RELATED ACTIVIT

01001920DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 1045, 7926, 7932
Program Element Code(s): 779600
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

In the past several years, there has been a great deal of exposure to the opportunities and promise that lie in large-scale data. Although massive data sets have been collected and analyzed in well over a decade, the excitement is largely due to the relatively recent availability of social data: massive digital records of human interactions. This provides a unique system-wide perspective of collective human behavior which poses fundamental challenges and opportunities. Despite the tremendous progress made in recent years, very few algorithmic frameworks to-date have been purposefully developed for analyzing social data sets.  The goal of this project is to develop frameworks that enable analysis of large-scale social data.  This project seeks novel models that are rich in problems, raise deep questions about computation, and can lead to long-lasting impact on sociology and data science.
  
From a technical perspective, the goal of the project is to develop appropriate algorithmic machinery with strong theoretical guarantees that translate to results in practice.  The project consists of three main lines of research.  The first line of research seeks to develop a theory to optimize events in the future given a distribution on the consequences of actions we take in the present.  The second line of research considers learnability and scalability of social data, and its interpretation for optimization.  The third line of research considers design of robust optimization algorithms for noisy data.  The methodology includes experimentation on real data sets to develop appropriate algorithmic machinery with strong theoretical guarantees that translate to results in practice. Both undergraduate and graduate curriculum will benefit from the development of courses in this interdisciplinary area.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Note:  When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

(Showing: 1 - 10 of 35)
Thibaut Horel and Yaron Singer "Scalable Methods for Adaptively Seeding a Social Network" The International World Wide Web Conference (WWW) 2015 , 2015
Thibaut Horel, Yaron Singer "? Maximizing Approximately Submodular Functions" NIPS , 2016
Yaron Singer and Jan Vondrak "Information-theoretic Lower Bounds for Convex Optimization with Erroneous Oracles" NIPS 2015 , 2015
Ashwinkumar Badanidiyuru, Christos Papadimitriou, Aviad Rubinstein, Lior Seeman, Yaron Singer "? Locally Adaptive Optimization: Adaptive Seeding for Monotone Submodular Functions" SODA , 2016
Ashwinkumar Badanidiyuru, Christos Papadimitriou, Aviad Rubinstein, Lior Seeman, Yaron Singer "Locally Adaptive Optimization: Adaptive Seeding for Monotone Submodular Functions" SODA 2016 , 2016
Avinatan Hassidim and Yaron Singer "Optimization for Approximate Submodularity" Annual Conference on Neural Information Processing Systems (NIPS) 2018 , 2018
Avinatan Hassidim and Yaron Singer "Robust Guarantees of Stochastic Greedy Algorithms" ICML , 2017
Avinatan Hassidim and Yaron Singer "Submodular Optimization under Noise" COLT , 2017
Avinatan Hassidim, Ron Kupfer, Yaron Singer "An Optimal Elimination Algorithm for Learning a Best Arm" Annual Conference on Neural Information Processing Systems (NeurIPS) , 2020
Brendan Lucier, Joel Oren, Yaron Singer "Influence at Scale: Distributed Computation of Contagion in Networks" The ACM Conference on Knowledge Discovery and Data Mining (KDD) 2015 , 2015
Dimitris Kalimeris, Gal Kaplun, and Yaron Singer "Robust Influence Maximization in Hyperparametric Domains" International Conference of Machine Learning (ICML) 2019 , 2019
(Showing: 1 - 10 of 35)

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

The goal of this project was to develop algorithmic and statistical methods to analyze large-scale complex data. In particular, the project considers data that is formed from social networks.

 

The project focuses on two main thrusts: algorithmic optimization of information spread in social networks and statistical methods for predicting information spread in social networks.

For the first problem of algorithmic optimization of information spread, the main goal is to select a set of individuals who are optimal for spreading information in a social network. This problem has been heavily studied in literature but in practice it has been in effective. 

This project establishes a novel framework for information spread. This framework is one of a novel multistage optimization method developed in this project. The motivation for multistage optimization stems from a phenomenon studied in sociology known as “The Friendship Paradox”. This phenomenon is colloquially referred to as “your Friends have more friends than you” and in essence suggests that people are likely to be connected to influential individuals and therefore can be used to accelerate dissemination of information. The project provides strong mathematical foundations for this phenomenon, and develops an entire framework for optimization of influence in networks in a multistage manner that can leverage the friendship paradox phenomenon.

The other line of research in this project focuses on developing statistical methods for predicting spread of information in social networks. The goal here is to develop machine learning techniques that can take observations of social interactions and provide predictions on the likelihood of individuals to spread information. This project establishes theoretical bounds and proposes novel techniques for making such predictions.

The main challenge in developing techniques for predicting spread of information is that they are information theoretically infeasible. In particular, the amounts of data required to make reasonable predictions are too large for the existing methods to provide reasonable results. This project develops new methods using principles in machine learning theory of restricted hypothesis classes. In doing so, the method makes an assumption about the structure of existing techniques and validates this assumption on real data collected from the Facebook social network. It then develops algorithms for learning under such assumptions and provides better predictions of networked interactions as well as robust optimization of allocation in networked systems. 

 


Last Modified: 09/24/2021
Modified by: Yaron Singer

Please report errors in award information by writing to: awardsearch@nsf.gov.

Print this page

Back to Top of page