Award Abstract # 1842183
EAGER: Assessing Influence of News Articles on Emerging Events

NSF Org: IIS
Division of Information & Intelligent Systems
Recipient: TEMPLE UNIVERSITY-OF THE COMMONWEALTH SYSTEM OF HIGHER EDUCATION
Initial Amendment Date: August 9, 2018
Latest Amendment Date: August 9, 2018
Award Number: 1842183
Award Instrument: Standard Grant
Program Manager: Sylvia Spengler
sspengle@nsf.gov
 (703)292-7347
IIS Division of Information & Intelligent Systems
CSE Directorate for Computer and Information Science and Engineering
Start Date: September 1, 2018
End Date: August 31, 2021 (Estimated)
Total Intended Award Amount: $300,000.00
Total Awarded Amount to Date: $300,000.00
Funds Obligated to Date: FY 2018 = $300,000.00
History of Investigator:
  • Zoran Obradovic (Principal Investigator)
    zoran.obradovic@temple.edu
  • Eduard Dragut (Co-Principal Investigator)
Recipient Sponsored Research Office: Temple University
1805 N BROAD ST
PHILADELPHIA
PA  US  19122-6104
(215)707-7547
Sponsor Congressional District: 02
Primary Place of Performance: Temple University
Temple University. 1925 N. 12th
Philadelphia
PA  US  19122-1801
Primary Place of Performance Congressional District: 02
Unique Entity Identifier (UEI): QD4MGHFDJKU1
Parent UEI: QD4MGHFDJKU1
NSF Program(s): Info Integration & Informatics
Primary Program Source: 01001819DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 7364, 7916
Program Element Code(s): 736400
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

There is growing interest in mining social media streams for early detection of important events, with applications such as crisis detection and response and the prediction of social unrest. Social media and news articles play an important role in documenting daily societal events. News outlets host social media platforms where users debate daily news topics. For example, the social networks at NY Times, The Guardian, and Washington Post have more than 130,000 users each. Together, these users represent a considerable segment of the varied opinions of society at large. The objective of this project is to assess the feasibility of using trends in past social responses to news articles, observed over a few hundred social media streams, to detect the emergence of new important social, economic, and political events. The project benefits multiple segments of society, such as social scientists and policy makers, because its results provide tools to predict important real-life events using indicators observed on social media. The educational component of the project includes training graduate and undergraduate students through research and incorporating research projects and results into appropriate courses.

The difficult and high-risk problem addressed in this project is that of transforming the streams of social media chatter at hundreds of news outlets into data signals, mining from them the signals that foretell an imminent important event, and developing sound predictive analytics on top of those signals.
This project seeks to create a proof of concept that works with a few hundred social communities from news outlets. Specific aims consist of (i) developing methods for automatic data collection and (ii) building efficient predictive models at that scale. The results (e.g., software tools) are made available to benefit researchers in academia and industry. Free, open-source software for implementing the developed techniques will be distributed to enhance existing research infrastructure.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

(Showing: 1 - 10 of 15)
Alshehri, J. "Stay on Topic, Please: Aligning User Comments to the Content of a News Article" Annual BCS-IRSG European Conference on Information Retrieval, 2021 https://doi.org/10.1007/978-3-030-72113-8_1
Cao, Xi Hang and Han, Chao M. and Glass, Lucas and Kindman, Allen and Obradovic, Zoran "Time-to-event estimation by re-defining time" Journal of Biomedical Informatics, v.100, 2019 https://doi.org/10.1016/j.jbi.2019.103326
Gligorijevic, Jelena and Gligorijevic, Djordje and Stojkovic, Ivan and Bai, Xiao and Goyal, Amit and Obradovic, Zoran "Deeply supervised model for click-through rate prediction in sponsored search" Data Mining and Knowledge Discovery, v.33, 2019 https://doi.org/10.1007/s10618-019-00625-3
Han, C. "A Distributable Convex Approach for Graph Structure Discovery" 15th International Workshop on Mining and Learning with Graphs (MLG), 2019
Han, C. "Temporal Graph Regression via Structure-Aware Intrinsic Representation Learning" Proc. 19th SIAM Int'l Conf. Data Mining, 2019
He, Lihong and Han, Chao and Mukherjee, Arjun and Obradovic, Zoran and Dragut, Eduard "On the dynamics of user engagement in news comment media" WIREs Data Mining and Knowledge Discovery, v.10, 2019 https://doi.org/10.1002/widm.1342
Li, Xiaoyang and Pavlovski, Martin and Zhou, Fang and Dong, Qiwen and Qian, Weining and Obradovic, Zoran "Supervised Multi-view Latent Space Learning by Jointly Preserving Similarities across Views and Samples" International Conference on Database Systems for Advanced Applications, 2022 https://doi.org/10.1007/978-3-031-00126-0_53
Pavlovski, M. "Time-Aware User Embeddings as a Service" KDD, 2020 https://doi.org/10.1145/3394486.3403371
Pavlovski, Martin and Gligorijevic, Jelena and Stojkovic, Ivan and Agrawal, Shubham and Komirishetty, Shabhareesh and Gligorijevic, Djordje and Bhamidipati, Narayan and Obradovic, Zoran "Time-Aware User Embeddings as a Service" ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2020
Polychronopoulou, Athanasia and Alshehri, Jumanah and Obradovic, Zoran "Distinguishability of graphs: a case for quantum-inspired measures" ASONAM '21: Proceedings of the 2021 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, 2021 https://doi.org/10.1145/3487351.3488330
Polychronopoulou, Athanasia and Zhou, Fang and Obradovic, Zoran "Cosine similarity for multiplex network summarization" ASONAM '21: Proceedings of the 2021 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, 2021 https://doi.org/10.1145/3487351.3488331

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

The objective of this study is to assess the feasibility of using trends in past social responses to news articles, observed over multiple social media streams, to detect the emergence of social, economic, and political events. This project seeks to create a proof of concept that works with multiple social communities from news outlets. Specific aims consist of (i) developing methods for automatic data collection and (ii) building efficient predictive models at that scale.

Ten Ph.D. students, one M.S. student, and one undergraduate student were trained on this project. Five of the students were female. The project has produced 15 peer-reviewed publications and one book chapter. The main outcomes reported in those publications are given below:

A method is developed that jointly learns article-comment embeddings and infers the relevance class of comments. This method introduces an ordinal classification loss that penalizes the difference between the predicted and true labels. A thorough study characterized the influence of the proposed loss on the learning process.
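The report does not give the formula for the ordinal classification loss. As a rough illustration of the idea only, the sketch below adds an expected label-distance penalty to ordinary cross-entropy, so that predicting a relevance class far from the true one costs more than predicting an adjacent one. The function names and the exact form of the penalty are assumptions made for this example, not the published method.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ordinal_loss(logits, true_label):
    """Cross-entropy plus the expected absolute distance to the true class.

    Hypothetical form: the distance term grows with how far the predicted
    probability mass sits from the true ordinal label.
    """
    probs = softmax(logits)
    cross_entropy = -math.log(probs[true_label])
    expected_distance = sum(p * abs(k - true_label) for k, p in enumerate(probs))
    return cross_entropy + expected_distance


# Three ordered relevance classes (0, 1, 2); the true class is 2.
near_miss = ordinal_loss([0.0, 3.0, 0.0], true_label=2)  # mass on adjacent class 1
far_miss = ordinal_loss([3.0, 0.0, 0.0], true_label=2)   # mass on distant class 0
correct = ordinal_loss([0.0, 0.0, 3.0], true_label=2)    # mass on the true class
```

In this toy case both wrong predictions assign the same probability to the true class, so plain cross-entropy cannot tell them apart; only the distance term separates the near miss from the far miss.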

A prediction algorithm is developed to estimate the volume of user comments on a news article. It addresses a limitation of previous algorithms, which did not consider user-to-user commenting activity.

An effective meta-framework, called DAPS (DynAmic self-Paced sampling enSemble), is established for highly imbalanced classification with overlapping classes. The framework uses sampling to maximize the use of informative instances and avoid serious information loss, and it assigns instance weights to address noisy data. Its main benefit is that most existing canonical classifiers (e.g., Decision Tree, Random Forest) can be integrated into DAPS.

A novel approach called MELTS (Multi-viEw LatenT space learning with Similarity preservation) is introduced for multi-view classification. MELTS first uses distance correlation to explore hidden between-view relationships. It then leverages both the similarity information of different view pairs and the label information of distinct sample pairs to learn a latent representation shared across views. Experimental results on both synthetic and real-world datasets demonstrate that MELTS considerably improves classification accuracy compared to alternative methods.
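Distance correlation, the statistic MELTS is said to use for between-view relationships, can be computed from pairwise distances alone. The sketch below implements the standard sample distance correlation (double-centered distance matrices) for two views of the same samples. How MELTS uses the value afterwards is not described here, so the sketch covers only the statistic; the function names are chosen for this example.

```python
import math


def _centered_distances(samples):
    """Pairwise Euclidean distances, double-centered by row, column, and grand mean."""
    n = len(samples)
    dist = [[math.dist(a, b) for b in samples] for a in samples]
    row_mean = [sum(row) / n for row in dist]  # equals column means (symmetric matrix)
    grand_mean = sum(row_mean) / n
    return [
        [dist[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def distance_correlation(view_x, view_y):
    """Sample distance correlation between two views of the same n samples."""
    n = len(view_x)
    if n != len(view_y):
        raise ValueError("both views must describe the same samples")
    a = _centered_distances(view_x)
    b = _centered_distances(view_y)

    def mean_product(p, q):
        return sum(p[i][j] * q[i][j] for i in range(n) for j in range(n)) / (n * n)

    dcov_sq = mean_product(a, b)
    dvar_product = mean_product(a, a) * mean_product(b, b)
    if dvar_product <= 0.0:
        return 0.0  # at least one view is constant
    return math.sqrt(max(dcov_sq, 0.0) / math.sqrt(dvar_product))


# Two views of four samples; the second view is a scaled copy of the first.
view_a = [(0.0,), (1.0,), (2.0,), (3.0,)]
view_b = [(0.0, 0.0), (2.0, 2.0), (4.0, 4.0), (6.0, 6.0)]
linear_dcor = distance_correlation(view_a, view_b)

# A nonlinear (quadratic) relationship between views.
view_c = [(-2.0,), (-1.0,), (0.0,), (1.0,), (2.0,)]
view_d = [(4.0,), (1.0,), (0.0,), (1.0,), (4.0,)]
quadratic_dcor = distance_correlation(view_c, view_d)
```

Because the statistic depends only on distances within each view, the two views may have different dimensionality, which is what makes it convenient for multi-view data.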

Effective measures of graph similarity, inspired by distances on a set of quantum states, were introduced. These measures can effectively distinguish graphs and can be used with both weighted and unweighted networks, while identifying changes in graph structure such as the introduction of disconnected components. The proposed measures intuitively capture several structural characteristics that are often used to describe and compare networks, providing a holistic approach. Two additional important features distinguish these methods from previously published approaches: they are well-established mathematical methods that incorporate the intrinsic structure of the entire network, and they have high interpretability.
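The report does not name the specific quantum-state distances used. As one hedged illustration of the general idea, a graph can be mapped to a density-matrix-like object by normalizing its Laplacian to unit trace, and two graphs on the same node set can then be compared with a distance defined on such matrices. The sketch below uses the Hilbert-Schmidt distance, chosen here only because it needs no eigendecomposition; both that choice and the function names are assumptions for this example.

```python
import math


def laplacian_density(weights):
    """Graph Laplacian L = D - W, scaled to unit trace (a density-matrix analogue)."""
    n = len(weights)
    lap = [[-weights[i][j] if i != j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        lap[i][i] = sum(weights[i][j] for j in range(n) if j != i)
    trace = sum(lap[i][i] for i in range(n))
    if trace == 0:
        raise ValueError("graph has no edges, so the Laplacian cannot be normalized")
    return [[value / trace for value in row] for row in lap]


def hilbert_schmidt_distance(weights_a, weights_b):
    """Hilbert-Schmidt (Frobenius) distance between two Laplacian density matrices."""
    rho = laplacian_density(weights_a)
    sigma = laplacian_density(weights_b)
    return math.sqrt(
        sum((x - y) ** 2 for r, s in zip(rho, sigma) for x, y in zip(r, s))
    )


# Symmetric weighted adjacency matrices on the same three nodes.
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]          # 0 - 1 - 2
split = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]         # edge 0 - 1, node 2 disconnected
heavy_path = [[0, 2, 0], [2, 0, 2], [0, 2, 0]]    # same path, all weights doubled

same_graph = hilbert_schmidt_distance(path, path)
disconnected = hilbert_schmidt_distance(path, split)
rescaled = hilbert_schmidt_distance(path, heavy_path)
```

Under this normalization a uniform rescaling of all edge weights leaves the distance at zero, while disconnecting a node changes it, so the sketch works on weighted and unweighted graphs alike.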

A network summarization approach is proposed for weighted multiplex networks. This method focuses on removing structural redundancy while maintaining the information carried by the intrinsic structure of the graph. Using real-world data from different domains, the new method is shown to preserve the properties of the original graph more accurately, and at larger summarization percentages. It is also shown to reduce the number of edges in the network faster than the baselines, resulting in a more efficient summarization technique.

A novel, easily explainable time series classification model is developed. It extracts both informative shapelets and shapelet orders, and it combines the shapelet-transformed space with the shapelet-order space for classification. The temporal dependencies among local discriminative patterns discovered by the method were found to significantly increase prediction confidence and to further improve classification performance. Extensive experiments on 75 univariate and 6 multivariate real-world datasets provide evidence that the model can significantly improve average accuracy over the state-of-the-art alternatives considered.
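To make the two feature spaces concrete, the following minimal sketch computes a standard shapelet transform (the smallest Euclidean distance between a shapelet and any window of the series) and adds a simple pairwise order feature recording which shapelet's best match occurs first. The order encoding and the function names are assumptions for illustration; the model's actual shapelet-order representation is not specified in this report.

```python
import math


def best_match(series, shapelet):
    """Return (distance, start index) of the window closest to the shapelet."""
    width = len(shapelet)
    if width > len(series):
        raise ValueError("shapelet is longer than the series")
    best_distance, best_start = math.inf, -1
    for start in range(len(series) - width + 1):
        distance = math.sqrt(
            sum((series[start + k] - shapelet[k]) ** 2 for k in range(width))
        )
        if distance < best_distance:
            best_distance, best_start = distance, start
    return best_distance, best_start


def shapelet_features(series, shapelets):
    """Shapelet distances followed by pairwise order indicators.

    Order indicator (hypothetical encoding): 1 if shapelet i's best match
    starts before shapelet j's best match, else 0, for every pair i < j.
    """
    matches = [best_match(series, s) for s in shapelets]
    distances = [distance for distance, _ in matches]
    orders = [
        1 if matches[i][1] < matches[j][1] else 0
        for i in range(len(shapelets))
        for j in range(i + 1, len(shapelets))
    ]
    return distances + orders


bump = [1.0, 2.0, 1.0]
plateau = [3.0, 3.0]

bump_first = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 3.0, 3.0, 0.0]
plateau_first = list(reversed(bump_first))

features_a = shapelet_features(bump_first, [bump, plateau])
features_b = shapelet_features(plateau_first, [bump, plateau])
```

The two toy series contain the same local patterns, so their shapelet distances are identical; only the order feature distinguishes them, which is the kind of temporal dependency the paragraph above refers to.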

A book chapter is written on state-of-the-art natural language processing problems, algorithms, models, and libraries. The chapter provides practical examples of applications and is also used in the PI's data mining and information retrieval courses.


Last Modified: 04/27/2022
Modified by: Eduard Dragut

Please report errors in award information by writing to: awardsearch@nsf.gov.
