
NSF Org: | IIS Division of Information & Intelligent Systems |
Initial Amendment Date: | August 9, 2018 |
Latest Amendment Date: | August 9, 2018 |
Award Number: | 1842183 |
Award Instrument: | Standard Grant |
Program Manager: | Sylvia Spengler, sspengle@nsf.gov, (703)292-7347, IIS Division of Information & Intelligent Systems, CSE Directorate for Computer and Information Science and Engineering |
Start Date: | September 1, 2018 |
End Date: | August 31, 2021 (Estimated) |
Total Intended Award Amount: | $300,000.00 |
Total Awarded Amount to Date: | $300,000.00 |
Recipient Sponsored Research Office: | 1805 N BROAD ST, PHILADELPHIA, PA, US 19122-6104, (215)707-7547 |
Primary Place of Performance: | Temple University, 1925 N. 12th, Philadelphia, PA, US 19122-1801 |
NSF Program(s): | Info Integration & Informatics |
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
There is growing interest in mining social media streams for early detection of (important) events, such as crisis detection (and response) and prediction of social unrest. Social media and news articles play an important role in documenting daily societal events. News outlets host social media platforms that enable users to debate daily news topics. For example, the social networks at The New York Times, The Guardian, and The Washington Post have more than 130,000 users each. Together, they represent a considerable segment of the varied opinions of society at large. The objective of this project is to assess the feasibility of leveraging the trend of past social response to news articles, observed over a few hundred social media streams, to detect the emergence of new important social, economic, and political events. The project benefits multiple segments of society, such as social scientists and policy makers, because its results provide tools to predict important real-life events from indicators observed on social media. The educational component of the project involves graduate and undergraduate students in training and research and incorporates research projects and results into appropriate courses.
The difficult, high-risk problem addressed in this project is transforming the streams of social media chatter at hundreds of news outlets into data signals, mining those signals for the ones that foretell the imminence of an (important) event, and developing sound predictive analytics on top of them.
This project seeks to create a proof of concept that works with a few hundred social communities from news outlets. The specific aims are (i) developing methods for automatic data collection and (ii) efficient predictive modeling at that scale. The results (e.g., software tools) are made available to benefit researchers in academia and industry, and free, open-source software implementing the developed techniques will be distributed to enhance existing research infrastructure.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
The objective of this study was to assess the feasibility of leveraging the trend of past social response to news articles, observed over multiple social media streams, to detect the emergence of social, economic, and political events. The project sought to create a proof of concept that works with multiple social communities from news outlets. The specific aims were (i) developing methods for automatic data collection and (ii) efficient predictive modeling at that scale.
Ten Ph.D. students, one M.S. student, and one undergraduate student were trained on this project; five of them were female. The project produced 15 peer-reviewed publications and one book chapter. The main outcomes reported in those publications are summarized below:
A method was developed that jointly learns article-comment embeddings and infers the relevance class of each comment. It introduces an ordinal classification loss that penalizes the difference between the predicted and true labels, so that distant misclassifications cost more than near misses. A thorough study characterized the influence of the proposed loss on the learning process.
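The report does not give the exact form of the ordinal loss. As a minimal sketch, assuming a softmax classifier over ordered relevance classes 0..K-1, one common construction adds to the cross-entropy an expected-distance term: the sum over classes of predicted probability times |k - y|. The function names and the weight `alpha` are hypothetical and are not taken from the project's code.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ordinal_loss(logits: Sequence[float], true_label: int, alpha: float = 1.0) -> float:
    """Cross-entropy plus alpha * expected |predicted class - true class|.

    Hypothetical sketch: classes are assumed ordered 0 < 1 < ... < K-1
    (e.g., comment relevance levels); this is not the project's published loss.
    """
    probs = softmax(logits)
    cross_entropy = -math.log(probs[true_label])
    # Expected ordinal distance: probability mass on far-away classes costs more.
    expected_distance = sum(p * abs(k - true_label) for k, p in enumerate(probs))
    return cross_entropy + alpha * expected_distance
```

With true class 0, logits `[0, 0, 0, 5]` (mass on class 3) incur a larger loss than `[0, 5, 0, 0]` (mass on class 1), even though plain cross-entropy (`alpha=0`) scores them identically.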
A prediction algorithm was developed to estimate the volume of user comments on a news article. It addresses a limitation of previous algorithms, which did not consider user-to-user commenting activity.
An effective meta-framework, called DAPS (DynAmic self-Paced sampling enSemble), was established for highly imbalanced classification with overlapping classes. The framework uses sampling to make maximal use of informative instances while avoiding serious information loss, and it assigns instance weights to mitigate noisy data. The main benefit of the framework is that most existing canonical classifiers (e.g., Decision Tree, Random Forest) can be integrated into DAPS.
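DAPS's sampling rule is not specified here. As an illustration only, the sketch below implements the hardness-binned, self-paced under-sampling step of the related Self-paced Ensemble (SPE) method: majority-class instances are grouped into bins by "hardness" (assumed to be the current ensemble's error on each instance), and each bin receives a share of the sample proportional to 1 / (mean hardness + alpha). The function name, its parameters, and the use of this scheme as a stand-in for DAPS are all assumptions.

```python
import random
from typing import List, Sequence


def self_paced_undersample(hardness: Sequence[float], n_samples: int, n_bins: int = 5,
                           alpha: float = 0.0, seed: int = 0) -> List[int]:
    """Pick majority-class indices from hardness bins (SPE-style sketch, not DAPS itself).

    hardness[i] is assumed to be the current ensemble's error on majority instance i.
    """
    rng = random.Random(seed)
    lo, hi = min(hardness), max(hardness)
    width = ((hi - lo) / n_bins) or 1.0  # avoid zero width when all values are equal
    bins: List[List[int]] = [[] for _ in range(n_bins)]
    for i, h in enumerate(hardness):
        bins[min(int((h - lo) / width), n_bins - 1)].append(i)

    # Unnormalized weight per non-empty bin: 1 / (mean hardness + alpha).
    # alpha = 0 favors populous easy bins in inverse proportion to hardness;
    # a large alpha flattens the weights toward equal counts per bin.
    weights = []
    for members in bins:
        if members:
            mean_h = sum(hardness[i] for i in members) / len(members)
            weights.append(1.0 / (mean_h + alpha + 1e-6))
        else:
            weights.append(0.0)
    total = sum(weights)

    chosen: List[int] = []
    for members, w in zip(bins, weights):
        if members:
            k = min(len(members), round(n_samples * w / total))
            chosen.extend(rng.sample(members, k))
    return chosen
```

Per-bin counts are rounded, so the total can differ slightly from `n_samples`. In SPE, `alpha` grows over the ensemble's iterations, shifting attention from easy toward harder bins; the selected majority instances would then be combined with all minority instances to train the next base classifier.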
A novel approach called MELTS (Multi-viEw LatenT space learning with Similarity preservation) was introduced for multi-view classification. MELTS first uses distance correlation to uncover hidden relationships between views. It then leverages both the similarity information of different view pairs and the label information of distinct sample pairs to learn a latent representation across the multiple views. Experiments on both synthetic and real-world datasets show that MELTS considerably improves classification accuracy over alternative methods.
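Distance correlation is a standard dependence statistic, and it can be computed between two views of the same samples even when the views have different dimensions. Unlike Pearson correlation, it also detects non-linear dependence. The minimal pure-Python sketch below computes the (V-statistic) sample distance correlation, where each view is a list of feature vectors for the same n samples. How MELTS uses the statistic beyond this point is not described in the report, and the function names are illustrative.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _double_centered_distances(view: Sequence[Vector]) -> List[List[float]]:
    """Pairwise Euclidean distances with row, column, and grand means removed."""
    n = len(view)
    d = [[math.dist(u, v) for v in view] for u in view]
    row_means = [sum(row) / n for row in d]  # equal to column means since d is symmetric
    grand_mean = sum(row_means) / n
    return [[d[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
            for i in range(n)]


def distance_correlation(view_x: Sequence[Vector], view_y: Sequence[Vector]) -> float:
    """Sample distance correlation in [0, 1] between two views of the same samples."""
    if len(view_x) != len(view_y):
        raise ValueError("views must describe the same samples")
    n = len(view_x)
    a = _double_centered_distances(view_x)
    b = _double_centered_distances(view_y)

    def mean_product(p: List[List[float]], q: List[List[float]]) -> float:
        return sum(p[i][j] * q[i][j] for i in range(n) for j in range(n)) / (n * n)

    dcov2 = mean_product(a, b)
    dvar2_x, dvar2_y = mean_product(a, a), mean_product(b, b)
    if dvar2_x == 0.0 or dvar2_y == 0.0:
        return 0.0  # a constant view carries no dependence information
    return math.sqrt(max(dcov2, 0.0) / math.sqrt(dvar2_x * dvar2_y))
```

For example, with x = -2, -1, 0, 1, 2 and a second view y = x squared, the Pearson correlation is exactly 0, yet the distance correlation is clearly positive (about 0.52).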
Effective graph similarity measures inspired by distances between quantum states were introduced. These measures distinguish graphs effectively, apply to both weighted and unweighted networks, and identify structural changes such as the introduction of disconnected components. The proposed measures intuitively capture several structural characteristics that are often used to describe and compare networks, providing a holistic approach. Two additional important features distinguish these methods from previously published approaches: they are grounded in well-established mathematics that incorporates the intrinsic structure of the entire network, and they are highly interpretable.
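The report does not say which quantum-state distance the measures use. As one illustration, assuming a graph is mapped to a density matrix rho = L / tr(L), where L = D - W is its (weighted) Laplacian, two graphs on the same node set can be compared with the quantum Jensen-Shannon divergence, S((rho + sigma) / 2) - (S(rho) + S(sigma)) / 2, where S is the von Neumann entropy; its square root is commonly used as a distance. This is a sketch under those assumptions (hypothetical function names, pure-Python Jacobi eigenvalues), not necessarily one of the project's measures.

```python
import math
from typing import List

Matrix = List[List[float]]


def symmetric_eigenvalues(m: Matrix, tol: float = 1e-12, max_sweeps: int = 100) -> List[float]:
    """Eigenvalues of a real symmetric matrix via cyclic Jacobi rotations."""
    a = [list(map(float, row)) for row in m]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (update columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (update rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(n)]


def density_matrix(adj: Matrix) -> Matrix:
    """rho = L / tr(L), with L = D - W the Laplacian of a symmetric, non-negative adjacency matrix."""
    n = len(adj)
    lap = [[(sum(adj[i]) if i == j else 0.0) - adj[i][j] for j in range(n)] for i in range(n)]
    trace = sum(lap[i][i] for i in range(n))
    if trace <= 0:
        raise ValueError("graph must have at least one edge")
    return [[v / trace for v in row] for row in lap]


def von_neumann_entropy(rho: Matrix) -> float:
    """S(rho) = -sum(lambda * log2(lambda)) over the positive eigenvalues of rho."""
    return -sum(lam * math.log2(lam) for lam in symmetric_eigenvalues(rho) if lam > 1e-12)


def quantum_js_distance(adj_a: Matrix, adj_b: Matrix) -> float:
    """Square root of the quantum Jensen-Shannon divergence between two graphs' density matrices."""
    rho, sigma = density_matrix(adj_a), density_matrix(adj_b)
    n = len(rho)
    mix = [[(rho[i][j] + sigma[i][j]) / 2.0 for j in range(n)] for i in range(n)]
    qjsd = von_neumann_entropy(mix) - (von_neumann_entropy(rho) + von_neumann_entropy(sigma)) / 2.0
    return math.sqrt(max(qjsd, 0.0))
```

Because the Laplacian handles edge weights directly, the same code covers weighted and unweighted graphs. Both adjacency matrices must use the same node ordering.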
A network summarization approach was proposed for weighted multiplex networks. It removes structural redundancy while preserving the information carried by the graph's intrinsic structure. On real-world data from different domains, the method preserves the properties of the original graph more accurately than the baselines, including at larger summarization percentages. It also reduces the number of edges in the network faster than the baselines, making it a more efficient summarization technique.
A novel, easily explainable time series classification model was developed. It extracts both informative shapelets and shapelet orders, and it combines the shapelet-transformed space with the shapelet-order space for time series classification. The temporal dependencies among local discriminative patterns discovered by the method were found to significantly increase prediction confidence and further improve classification performance. Extensive experiments on 75 univariate and 6 multivariate real-world datasets provide evidence that the model can significantly improve average accuracy over the state-of-the-art alternatives considered.
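The report does not define how shapelet orders are encoded. As an illustrative sketch only, assuming the standard shapelet transform (each feature is the minimum Euclidean distance between a shapelet and any same-length window of the series) and a hypothetical order encoding (a pairwise indicator of which shapelet's best match occurs first), a series can be mapped to a combined feature vector as follows. In practice the shapelets would be learned or selected from training data, and windows are often z-normalized, which this sketch omits.

```python
import math
from typing import List, Sequence, Tuple


def best_match(series: Sequence[float], shapelet: Sequence[float]) -> Tuple[float, int]:
    """Smallest Euclidean distance between the shapelet and any equal-length window, and its start."""
    m = len(shapelet)
    if m == 0 or m > len(series):
        raise ValueError("shapelet must be non-empty and no longer than the series")
    best_dist, best_start = math.inf, -1
    for start in range(len(series) - m + 1):
        dist = math.sqrt(sum((series[start + k] - shapelet[k]) ** 2 for k in range(m)))
        if dist < best_dist:
            best_dist, best_start = dist, start
    return best_dist, best_start


def shapelet_features(series: Sequence[float], shapelets: Sequence[Sequence[float]]) -> List[float]:
    """Shapelet-transform distances followed by hypothetical pairwise order indicators."""
    matches = [best_match(series, s) for s in shapelets]
    distances = [dist for dist, _ in matches]
    # For each pair i < j: 1.0 if shapelet i's best match starts before shapelet j's, else 0.0.
    order = [1.0 if matches[i][1] < matches[j][1] else 0.0
             for i in range(len(shapelets)) for j in range(i + 1, len(shapelets))]
    return distances + order
```

For the series `[0, 0, 1, 2, 1, 0, 0, -1, -2, -1, 0]` with a "peak" shapelet `[1, 2, 1]` and a "trough" shapelet `[-1, -2, -1]`, both distances are 0, and the order indicator is 1.0 because the peak precedes the trough. Any standard classifier could then be trained on these vectors.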
A book chapter was written on state-of-the-art natural language processing problems, algorithms, models, and libraries. The chapter provides practical examples of applications and is also used in the PI's data mining and information retrieval courses.
Last Modified: 04/27/2022
Modified by: Eduard Dragut