NSF Award Search: Award # 1619458

Award Abstract # 1619458

III: Small: Robustness in Social Network Analysis: Models, Inference, and Algorithms

NSF Org:	IIS Division of Information & Intelligent Systems
Recipient:	UNIVERSITY OF SOUTHERN CALIFORNIA
Initial Amendment Date:	June 28, 2016
Latest Amendment Date:	June 28, 2016
Award Number:	1619458
Award Instrument:	Standard Grant
Program Manager:	Sylvia Spengler sspengle@nsf.gov (703)292-7347 IIS Division of Information & Intelligent Systems CSE Directorate for Computer and Information Science and Engineering
Start Date:	September 1, 2016
End Date:	August 31, 2020 (Estimated)
Total Intended Award Amount:	$507,996.00
Total Awarded Amount to Date:	$507,996.00
Funds Obligated to Date:	FY 2016 = $507,996.00
History of Investigator:	David Kempe (Principal Investigator) David.M.Kempe@gmail.com Yan Liu (Co-Principal Investigator)
Recipient Sponsored Research Office:	University of Southern California 3720 S FLOWER ST FL 3 LOS ANGELES CA US 90033 (213)740-7762
Sponsor Congressional District:	34
Primary Place of Performance:	University of Southern California 3720 S. Flower Street Los Angeles CA US 90089-0001
Primary Place of Performance Congressional District:	37
Unique Entity Identifier (UEI):	G88KLJR3KYT5
Parent UEI:
NSF Program(s):	Info Integration & Informatics
Primary Program Source:	01001617DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s):	7364, 7923
Program Element Code(s):	736400
Award Agency Code:	4900
Fund Agency Code:	4900
Assistance Listing Number(s):	47.070

ABSTRACT

The burgeoning field of "Social Network Analysis" focuses on extracting useful insights from social network data. Implemented or envisioned applications range from learning about the nature and driving forces behind human interactions, to targeted product or activity recommendations, and even homeland security. Unlike other networks, such as transportation or computer networks, social network data are practically always associated with massive uncertainty and noise: data pertaining to individuals are often not observable, or are observed incorrectly. The primary goal of this project is to understand the risks and implications of such noisy data, and to design network analysis algorithms that are significantly more robust to noise and missing data. Given the important role that mathematical models play in social network analysis, a closely related thread of the project is to analyze the fit between typical social network models and real-world data, in particular regarding high-level connectivity properties. The project website will be used to disseminate research prototypes and data collected as part of the project.

Specifically, three connected research thrusts that integrate the PIs' expertise in machine learning and theoretical computer science will be explored: (1) How well do standard random graph models fit real-world social network data, in particular with regard to expansion and spectral properties? Since the answer is likely "poorly," how well do modifications based on requiring local or global structure remedy this problem? (2) What is the impact of missing observations of diffusion or activation processes on the social networks inferred from contagious behavior? How can this impact be mitigated by algorithms that take the possibility of missing data into account? (3) If social network data are observed with significant (and possibly non-random) noise, under what conditions can the stability of an algorithm's output be ensured? How "obvious" does the right answer have to be to avoid being obscured by noise in the data? Can "obvious" answers be found more efficiently? The proposed research has the potential to change the way in which social network inference and optimization are addressed. For broader impacts, the PIs are committed to a suite of activities, including involving undergraduate students in the proposed research and outreach to local high school students.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Feng Qian, Chengyue Gong, Karishma Sharma, and Yan Liu. "Neural User Response Generator: Fake News Detection with Collective User Intelligence." IJCAI, 2018.

Karishma Sharma, Feng Qian, He Jiang, Natali Ruchansky, Ming Zhang, and Yan Liu. "Combating Fake News: A Survey on Identification and Mitigation Techniques." ACM Transactions on Intelligent Systems and Technology (TIST), v.10, 2019, p.21:1.

Palash Goyal, Nitin Kamra, Xinran He, and Yan Liu. "DynGEM: Deep Embedding Method for Dynamic Graphs." 3rd Representation Learning for Graphs Workshop (ReLiG 2017) with IJCAI'17, 2017.

Xinran He and David Kempe. "Stability and Robustness in Influence Maximization." ACM Transactions on Knowledge Discovery from Data (TKDD), v.12, 2018, p.66:1.

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

Outcomes Report:

The high-level goal of the grant was to understand both the theoretical foundations and applications of information dissemination processes on networks. Particular interest was devoted to dealing with missing or uncertain information. More specifically, the project team made the following contributions.

1. At a fundamental level, the team proposed and investigated in depth network generative models with a focus on global (as opposed to local) network properties. Many network generative models focus on matching local features (such as node degrees or small motifs); yet most network processes of interest, such as the spread of diseases or information, are much more closely characterized by global properties such as the partition of a network into sparsely connected communities. The team proposed several models that explicitly aim to match the high-level connectivity structure of a given input graph. These models are based on techniques such as non-convex optimization, linear programming, deep generative neural networks, random walks, and local search. Comprehensive experimental evaluation shows that these models indeed produce networks that match an input network not only in its spectrum (the explicit optimization goal) but also in various other related quantities.

2. The team carried out a comprehensive study of algorithmic techniques for inferring and optimizing network influence when input data are missing or uncertain. These techniques ranged from more theoretical algorithms (such as PAC learning algorithms for network influence parameters) to more practical heuristics. To model such processes accurately, the team formulated several novel influence processes, including mutually influencing point processes and mixtures of cascade models. These models were shown to often fit real-world data better than prior models.

3. Building on this study of network influence processes, the team investigated the dissemination of misinformation from a network point of view. In particular, the team exhibited significant differences in the patterns and timing of retweets and posts between real news and coordinated disinformation campaigns. These analyses both drew on and informed the models based on mixed influence processes. Experimental evaluation confirmed that with such a network-based approach, it is possible to identify coordinated misinformation campaigns using labeled data or other human annotation. This line of work may have significant applications beyond its immediate scientific interest.

4. In particular, the team leveraged the techniques described above to carry out the first large-scale analysis of coordinated disinformation campaigns about the Covid-19 pandemic. Building on the network models, the team developed a dashboard that allows the public and decision-makers to visualize coordinated campaigns. This dashboard has the potential to reduce the impact of disinformation campaigns and thereby lead to significantly better-informed decision making.

5. Finally, the team studied more fundamentally how to deal with missing information in learning settings. The team focused on a model of interactive learning, in which an algorithm must repeatedly choose a combinatorial structure and receives feedback about its mistakes. This setting naturally models learning of classifiers, permutations, or stable matchings, but also the inference of latent network structures by trial and error. The power of the general framework is that it accommodates incorrect responses as well as dynamic changes in the ground truth while learning takes place. Rather than treating each application separately, the framework provides a universal algorithmic approach that can be easily customized.
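Item 1 contrasts local features with global properties such as "the partition of a network into sparsely connected communities." One standard way to quantify how sparsely a node set S is connected to the rest of the graph is its conductance: the number of edges leaving S divided by the smaller of the two sides' volumes (sums of degrees). Conductance is linked to the spectrum by Cheeger's inequality, which bounds a graph's minimum conductance above and below in terms of the second-smallest eigenvalue of its normalized Laplacian. The sketch below is illustrative only: the adjacency-set representation and the helper names `add_edge` and `conductance` are assumptions, and it is not the team's model-fitting procedure.

```python
from typing import Dict, Hashable, Iterable, Set

# Undirected graph as adjacency sets (illustrative representation).
Graph = Dict[Hashable, Set[Hashable]]


def add_edge(graph: Graph, u: Hashable, v: Hashable) -> None:
    """Insert the undirected edge {u, v}."""
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)


def conductance(graph: Graph, subset: Iterable[Hashable]) -> float:
    """Return cut(S, V \\ S) / min(vol(S), vol(V \\ S))."""
    s = set(subset)
    cut = 0
    vol_s = 0
    vol_rest = 0
    for u, nbrs in graph.items():
        deg = len(nbrs)
        if u in s:
            vol_s += deg
            # Count each edge leaving S exactly once, from its endpoint in S.
            cut += sum(1 for v in nbrs if v not in s)
        else:
            vol_rest += deg
    denom = min(vol_s, vol_rest)
    if denom == 0:
        raise ValueError("conductance is undefined when one side has zero volume")
    return cut / denom
```

For two triangles {0, 1, 2} and {3, 4, 5} joined by the single bridge edge 2-3, the community split S = {0, 1, 2} has conductance 1/7, whereas an arbitrary split such as S = {0, 3} has conductance 1.0. A generative model that preserves community structure should reproduce such low-conductance cuts, not just the degree sequence.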
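Item 2 refers to cascade models of network influence. As a concrete baseline, the following sketch simulates the classic Independent Cascade model and estimates the expected spread from a seed set by Monte Carlo sampling. The project's own models (mutually influencing point processes, mixtures of cascade models) are richer than this single-model baseline. The `observe` function, which hides each activation independently with probability `miss_prob`, is a hypothetical missing-at-random assumption included only to illustrate what missing observations of a diffusion can look like; all function names are illustrative.

```python
import random
from typing import Dict, Hashable, Iterable, List, Set, Tuple

# Directed graph: node -> list of (out-neighbor, activation probability).
WeightedGraph = Dict[Hashable, List[Tuple[Hashable, float]]]


def simulate_cascade(graph: WeightedGraph, seeds: Iterable[Hashable],
                     rng: random.Random) -> Set[Hashable]:
    """Run one Independent Cascade from `seeds`; return all activated nodes."""
    active = set(seeds)
    frontier = list(active)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in graph.get(u, []):
                # A newly activated node gets one chance to activate each
                # still-inactive out-neighbor, succeeding with probability p.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


def estimate_spread(graph: WeightedGraph, seeds: Iterable[Hashable],
                    runs: int = 10_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the expected number of activated nodes."""
    rng = random.Random(seed)
    seed_list = list(seeds)
    total = sum(len(simulate_cascade(graph, seed_list, rng)) for _ in range(runs))
    return total / runs


def observe(activated: Set[Hashable], miss_prob: float,
            rng: random.Random) -> Set[Hashable]:
    """Hypothetical observation model: each activation is independently
    missed with probability `miss_prob` (missing at random)."""
    return {v for v in activated if rng.random() >= miss_prob}
```

For example, on the chain a -> b -> c with activation probabilities 1.0 and 0.0, `estimate_spread(graph, ["a"])` returns exactly 2.0, since b is always activated and c never is. Feeding the output of `observe` rather than the full cascade into an inference procedure is one simple way to study how missing data biases learned influence parameters.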
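The interactive-learning setting of item 5 can be illustrated in its simplest form: searching for a hidden element on a line (equivalently, learning a one-dimensional threshold), where each wrong guess is answered with a direction toward the target and that direction is flipped with some probability. The sketch below uses a multiplicative-weights rule that always queries the weighted median and down-weights every candidate inconsistent with the answer. The oracle model, the factor `beta = 0.5`, and all names are assumptions made for illustration; this is not claimed to be the team's framework, which covers general combinatorial structures and a changing ground truth.

```python
import random
from typing import Callable, List, Optional


def weighted_median(weights: List[float]) -> int:
    """Smallest index whose prefix sum reaches half of the total weight."""
    total = sum(weights)
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if acc >= total / 2:
            return i
    return len(weights) - 1  # guard against floating-point round-off


def noisy_oracle(target: int, error_prob: float,
                 rng: random.Random) -> Callable[[int], str]:
    """Hypothetical feedback model: "found" is always truthful; otherwise
    the reported direction toward `target` is flipped with `error_prob`."""
    def answer(query: int) -> str:
        if query == target:
            return "found"
        truth = "left" if target < query else "right"
        if rng.random() < error_prob:
            return "right" if truth == "left" else "left"
        return truth
    return answer


def robust_search(n: int, ask: Callable[[int], str], beta: float = 0.5,
                  max_rounds: int = 1000) -> Optional[int]:
    """Multiplicative-weights search over candidates 0..n-1."""
    weights = [1.0] * n
    for _ in range(max_rounds):
        q = weighted_median(weights)
        reply = ask(q)
        if reply == "found":
            return q
        for i in range(n):
            # Candidates on the side the reply points to stay untouched;
            # the query itself and everything on the other side are penalized.
            consistent = i < q if reply == "left" else i > q
            if not consistent:
                weights[i] *= beta
        # Rescale so the largest weight is 1, avoiding floating-point underflow.
        top = max(weights)
        weights = [w / top for w in weights]
    return None
```

Querying the weighted median guarantees that every directional answer penalizes at least half of the current total weight, so the total shrinks by a factor of at least 3/4 per round, while the target's weight is halved only when an answer is wrong. With a small error rate the target therefore comes to dominate and is queried after a number of rounds that grows roughly logarithmically in n. Because "found" is truthful in this model, any non-`None` result is the target.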

Last Modified: 11/07/2020
Modified by: David M Kempe

