NSF Award Search: Award # 1528409

Award Abstract # 1528409

Collaborative Research: Updating the Militarized Dispute Data Through Crowdsourcing: MID5

NSF Org:	SES Division of Social and Economic Sciences
Recipient:	THE PENNSYLVANIA STATE UNIVERSITY
Initial Amendment Date:	September 17, 2015
Latest Amendment Date:	August 24, 2017
Award Number:	1528409
Award Instrument:	Continuing Grant
Program Manager:	Brian Humes SES Division of Social and Economic Sciences SBE Directorate for Social, Behavioral and Economic Sciences
Start Date:	September 15, 2015
End Date:	August 31, 2019 (Estimated)
Total Intended Award Amount:	$690,353.00
Total Awarded Amount to Date:	$690,353.00
Funds Obligated to Date:	FY 2015 = $225,755.00; FY 2016 = $229,563.00; FY 2017 = $235,035.00
History of Investigator:	Glenn Palmer (Principal Investigator); David Reitter (Co-Principal Investigator)
Recipient Sponsored Research Office:	Pennsylvania State Univ University Park 201 OLD MAIN UNIVERSITY PARK PA US 16802-1503 (814)865-1372
Sponsor Congressional District:	15
Primary Place of Performance:	Pennsylvania State Univ University Park 110 Technology Center Building University Park PA US 16802-7000
Primary Place of Performance Congressional District:
Unique Entity Identifier (UEI):	NPM2J7MSCF61
Parent UEI:
NSF Program(s):	Political Science
Primary Program Source:	01001516DB NSF RESEARCH & RELATED ACTIVIT 01001617DB NSF RESEARCH & RELATED ACTIVIT 01001718DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s):	9179
Program Element Code(s):	137100
Award Agency Code:	4900
Fund Agency Code:	4900
Assistance Listing Number(s):	47.075

ABSTRACT

General Summary

The Correlates of War Project's Militarized Interstate Dispute (MID) Data is the most prominent and heavily used data collection in the study of international conflict. The most recent version (MID4) was released in 2014 and brings the period covered to 1816-2010. The MID4 project used automated text classification to make the identification of relevant news stories more efficient. Over the course of that project, the PIs determined that the primary bottleneck in the workflow was the coding of those news documents. To address this inefficiency, the PIs completed a pilot project to determine whether crowdsourcing techniques could be used to code these documents. In the pilot, non-expert workers were paid small sums to read documents and answer sets of questions; the answers were used to identify features of possible militarized incidents (the events that comprise MIDs). A systematic comparison of the crowdsourced responses with those of the MID4 Project's trained coders revealed that the crowdsourced codings were completely accurate for 68 percent of the news reports coded; more importantly, high agreement among crowd responses on specific reports was strongly associated with correct coding. This enables the PIs to detect which documents require further expert involvement. As a result, the PIs can produce a majority of the MID data in near real time and at limited financial cost. These procedures are applied in the MID5 Project, which will update the MID data for the period 2011-2017.
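The agreement-based triage described above can be illustrated with a minimal sketch. This is not the PIs' procedure: the function name, the answer labels, the sample documents, and the 0.8 threshold are all hypothetical, and plain majority voting stands in for whatever agreement measure the project used.

```python
from collections import Counter


def triage_by_agreement(responses, threshold=0.8):
    """Return (majority_answer, agreement, needs_expert) for one document.

    `responses` holds the crowd answers to a single question about one
    news report. `threshold` is an assumed cut-off, not a project value.
    """
    if not responses:
        raise ValueError("at least one crowd response is required")
    counts = Counter(responses)
    majority_answer, votes = counts.most_common(1)[0]
    # Share of workers who gave the most common answer.
    agreement = votes / len(responses)
    # Low agreement flags the document for review by a trained coder.
    return majority_answer, agreement, agreement < threshold


# Hypothetical crowd answers for two news reports.
docs = {
    "doc_a": ["threat", "threat", "threat", "threat", "threat"],
    "doc_b": ["threat", "none", "display", "threat", "none"],
}
results = {doc_id: triage_by_agreement(r) for doc_id, r in docs.items()}
for doc_id, (answer, agreement, needs_expert) in results.items():
    print(doc_id, answer, agreement, needs_expert)
```

In this sketch, a report on which all workers agree is accepted as coded by the crowd, while a report with split answers is routed to an expert.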

Technical Summary

The MID5 project workflow begins with document retrieval from LexisNexis and document classification using the software and methods implemented in MID4. We discard the negatively classified documents and extract metadata from the positively classified ones, including the document title, the news agency that published the report, the date, and any actors mentioned in the text. Crowd workers are recruited through Amazon's Mechanical Turk and paid a wage to read one of these documents and answer a series of simple, objective questions about it. The questionnaire is predefined, but some of the extracted metadata is automatically inserted into it to improve the quality of responses. Several workers complete a questionnaire for each document, leaving the PIs with the problem of aggregation: how to combine multiple worker responses, possibly to multiple related questions, into the data needed to code the militarized incident. In the pilot study, the PIs show that Bayesian networks are the most effective way to achieve this aggregation. Recently, the PIs have made advances in semi-supervised text classification with hybrid deep restricted Boltzmann machines, which outperform previous methods on this task.
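To make the aggregation problem concrete, the sketch below computes a posterior over the true answer to one question from several worker responses. It is a deliberately simplified stand-in, not the Bayesian network used in the pilot study: it assumes a uniform prior, independent workers, a single shared accuracy (the 0.7 default is hypothetical), and errors spread evenly over the wrong options. The function and variable names are illustrative only.

```python
import math


def posterior_over_answers(responses, options, accuracy=0.7):
    """Posterior probability of each option being the true answer.

    Assumptions (illustrative only): uniform prior over `options`, each
    worker is independently correct with probability `accuracy`, and a
    wrong worker picks uniformly among the remaining options.
    """
    if len(options) < 2:
        raise ValueError("need at least two answer options")
    if not 0.0 < accuracy < 1.0:
        raise ValueError("accuracy must be strictly between 0 and 1")
    wrong = (1.0 - accuracy) / (len(options) - 1)
    # Log-likelihood of the observed responses under each candidate answer.
    log_post = {
        candidate: sum(
            math.log(accuracy if r == candidate else wrong) for r in responses
        )
        for candidate in options
    }
    # Normalize in a numerically stable way (subtract the maximum first).
    peak = max(log_post.values())
    unnorm = {k: math.exp(v - peak) for k, v in log_post.items()}
    total = sum(unnorm.values())
    return {k: v / total for k, v in unnorm.items()}


# Three hypothetical workers answer a yes/no question about one report.
post = posterior_over_answers(["yes", "yes", "no"], ["yes", "no"])
print(post)
```

A full Bayesian network would additionally model dependencies between related questions and differences between workers; this sketch covers only the single-question case.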

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

D'Orazio, Vito and Kenwick, Michael and Lane, Matthew and Palmer, Glenn and Reitter, David and Ebrahimi, Mansour "Crowdsourcing the Measurement of Interstate Conflict" PLOS ONE, v.11, 2016, doi:10.1371/journal.pone.0156527

D'Orazio, V., Kenwick, M., Kelly, M., Okyere, D., Palmer, G., Reitter, D., & Terechshenko, Z. "An Experimental Analysis of Crowd-Sourcing Designs for Social Science Data Collection" Annual Meeting of the Society for Political Methodology, Madison, WI, July 2017

D'Orazio, V., Kenwick, M., Lane, M., Palmer, G., & Reitter, D. "Crowdsourcing the measurement of interstate conflict." PLOS ONE, 2016, p.e0156527

Kelly, M. A. and Reitter, D. "Holographic Declarative Memory: Using distributional semantics within ACT-R" 2017 AAAI Fall Symposium Series: Technical Reports, 2017

Kelly, M. A., & Reitter, D. "How language processing can shape a common model of cognition" Papers on the Common Model of Cognition. Procedia Computer Science, 2018

Kelly, Matthew A. and Reitter, David "Holographic Declarative Memory: Using distributional semantics within ACT-R" Proceedings of the Association for the Advancement of Artificial Intelligence Fall Symposium on A Standard Model of the Mind, 2017

Kelly, Matthew A. and Reitter, David and West, Robert L. "Degrees of Separation in Semantic and Syntactic Relationships" Proc. 15th International Conference on Cognitive Modeling, 2017

Kelly, Matthew A. and West, Robert L. "A Framework for Computational Models of Human Memory" AAAI Fall Symposium, A Standard Model of Mind: AAAI Technical Report, 2017

Matthew A. Kelly, David Reitter, and Robert L. West "Degrees of Separation in Semantic and Syntactic Relationships." Proceedings of the 15th International Conference on Cognitive Modeling, v.15, 2017

McDowell, William and Chambers, Nathaniel and Ororbia II, Alexander G. and Reitter, David "Event Ordering with a Generalized Model for Sieve Prediction Ranking" Proceedings of the 8th International Joint Conference on Natural Language Processing, 2017

Ororbia II, Alexander G. and Giles, C. Lee and Reitter, David "Learning a deep hybrid model for semi-supervised text classification." Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

The project – MID 5 – had two related goals. The ultimate purpose was to update the Militarized Interstate Dispute (MID) data to as close to the present as possible. This goal is a continuation of past NSF-supported MID projects. The other purpose was to experiment with crowd-sourcing as a method of coding disputes. We approached this task after having experimented with a variety of ways to give coders news reports and with different methods of asking questions. Most of the time spent on the project was given to finding the best way to utilize the crowd. After a range of alternative methods were attempted – which are documented in our reports, publications, and several conference presentations – we determined that coding the MIDs was, essentially, too demanding for untrained individuals. We were unable to develop an aggregation technique across a small number of coders – generally 3-7 per news story – that “correctly” captured the events reported; the MID coding rules are demanding and, in some ways, peculiar enough that intelligent but untrained readers cannot generally code the events consistently with the coding rules. To provide just one example, our meaning of “threat” is significantly different from common English usage. We concluded, unhappily, that the crowd could not be used to facilitate coding MIDs. In the last third of the project, we reverted to having trained graduate students code news reports, the process used in previous MID Projects. We are currently (January 2020) coding MIDs through 2014, using funds provided by Penn State University. We hope to have this completed by the end of the Spring semester.

Last Modified: 01/29/2020
Modified by: Glenn H Palmer

Please report errors in award information by writing to: awardsearch@nsf.gov.
