
NSF Org: |
CCF Division of Computing and Communication Foundations |
Recipient: |
|
Initial Amendment Date: | March 30, 2015 |
Latest Amendment Date: | September 3, 2019 |
Award Number: | 1452961 |
Award Instrument: | Continuing Grant |
Program Manager: |
Tracy Kimbrel
CCF Division of Computing and Communication Foundations CSE Directorate for Computer and Information Science and Engineering |
Start Date: | May 1, 2015 |
End Date: | April 30, 2021 (Estimated) |
Total Intended Award Amount: | $514,962.00 |
Total Awarded Amount to Date: | $514,962.00 |
Funds Obligated to Date: |
FY 2016 = $105,768.00 FY 2017 = $105,358.00 FY 2018 = $103,105.00 FY 2019 = $102,614.00 |
History of Investigator: |
|
Recipient Sponsored Research Office: |
1033 MASSACHUSETTS AVE STE 3 CAMBRIDGE MA US 02138-5366 (617)495-5501 |
Sponsor Congressional District: |
|
Primary Place of Performance: |
33 Oxford Street, 114 MD Cambridge MA US 02138-2933 |
Primary Place of
Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | Algorithmic Foundations |
Primary Program Source: |
01001617DB NSF RESEARCH & RELATED ACTIVIT 01001718DB NSF RESEARCH & RELATED ACTIVIT 01001819DB NSF RESEARCH & RELATED ACTIVIT 01001920DB NSF RESEARCH & RELATED ACTIVIT |
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
In the past several years, there has been a great deal of exposure to the opportunities and promise that lie in large-scale data. Although massive data sets have been collected and analyzed in well over a decade, the excitement is largely due to the relatively recent availability of social data: massive digital records of human interactions. This provides a unique system-wide perspective of collective human behavior which poses fundamental challenges and opportunities. Despite the tremendous progress made in recent years, very few algorithmic frameworks to-date have been purposefully developed for analyzing social data sets. The goal of this project is to develop frameworks that enable analysis of large-scale social data. This project seeks novel models that are rich in problems, raise deep questions about computation, and can lead to long-lasting impact on sociology and data science.
From a technical perspective, the goal of the project is to develop appropriate algorithmic machinery with strong theoretical guarantees that translate to results in practice. The project consists of three main lines of research. The first line of research seeks to develop a theory to optimize events in the future given a distribution on the consequences of actions we take in the present. The second line of research considers learnability and scalability of social data, and its interpretation for optimization. The third line of research considers design of robust optimization algorithms for noisy data. The methodology includes experimentation on real data sets to develop appropriate algorithmic machinery with strong theoretical guarantees that translate to results in practice. Both undergraduate and graduate curriculum will benefit from the development of courses in this interdisciplinary area.
PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH
Note:
When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external
site maintained by the publisher. Some full text articles may not yet be available without a
charge during the embargo (administrative interval).
Some links on this page may take you to non-federal websites. Their policies may differ from
this site.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
The goal of this project was to develop algorithmic and statistical methods to analyze large-scale complex data. In particular, the project considers data that is formed from social networks.
The project focuses on two main thrusts: algorithmic optimization of information spread in social networks and statistical methods for predicting information spread in social networks.
For the first problem of algorithmic optimization of information spread, the main goal is to select a set of individuals who are optimal for spreading information in a social network. This problem has been heavily studied in literature but in practice it has been in effective.
This project establishes a novel framework for information spread. This framework is one of a novel multistage optimization method developed in this project. The motivation for multistage optimization stems from a phenomenon studied in sociology known as “The Friendship Paradox”. This phenomenon is colloquially referred to as “your Friends have more friends than you” and in essence suggests that people are likely to be connected to influential individuals and therefore can be used to accelerate dissemination of information. The project provides strong mathematical foundations for this phenomenon, and develops an entire framework for optimization of influence in networks in a multistage manner that can leverage the friendship paradox phenomenon.
The other line of research in this project focuses on developing statistical methods for predicting spread of information in social networks. The goal here is to develop machine learning techniques that can take observations of social interactions and provide predictions on the likelihood of individuals to spread information. This project establishes theoretical bounds and proposes novel techniques for making such predictions.
The main challenge in developing techniques for predicting spread of information is that they are information theoretically infeasible. In particular, the amounts of data required to make reasonable predictions are too large for the existing methods to provide reasonable results. This project develops new methods using principles in machine learning theory of restricted hypothesis classes. In doing so, the method makes an assumption about the structure of existing techniques and validates this assumption on real data collected from the Facebook social network. It then develops algorithms for learning under such assumptions and provides better predictions of networked interactions as well as robust optimization of allocation in networked systems.
Last Modified: 09/24/2021
Modified by: Yaron Singer
Please report errors in award information by writing to: awardsearch@nsf.gov.