Award Abstract # 1850605
RR: Establishing and Boosting Confidence Levels for Empirical Research Using Twitter Data

NSF Org: SMA (SBE Office of Multidisciplinary Activities)
Recipient: AMERICAN UNIVERSITY
Initial Amendment Date: September 5, 2018
Latest Amendment Date: March 14, 2023
Award Number: 1850605
Award Instrument: Standard Grant
Program Manager: Mary Feeney
SMA (SBE Office of Multidisciplinary Activities)
SBE (Directorate for Social, Behavioral and Economic Sciences)
Start Date: July 1, 2018
End Date: August 31, 2023 (Estimated)
Total Intended Award Amount: $361,201.00
Total Awarded Amount to Date: $361,201.00
Funds Obligated to Date: FY 2018 = $361,201.00
History of Investigator:
  • Heng Xu (Principal Investigator)
    heng.xu@ufl.edu
  • Nan Zhang (Co-Principal Investigator)
Recipient Sponsored Research Office: American University
4400 MASSACHUSETTS AVE NW
WASHINGTON
DC  US  20016-8003
(202)885-3440
Sponsor Congressional District: 00
Primary Place of Performance: American University
DC  US  20016-8002
Primary Place of Performance Congressional District: 00
Unique Entity Identifier (UEI): H4VNDUN2VWU5
Parent UEI:
NSF Program(s): SciSIP-Sci of Sci Innov Policy
Primary Program Source: 01001819DB NSF RESEARCH & RELATED ACTIVITIES
Program Reference Code(s): 7626
Program Element Code(s): 762600
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.075

ABSTRACT

Concerns about a reproducibility crisis in scientific research have become increasingly prevalent within the academic community and among the public at large. The field of meta-science, which applies scientific methodology to the study of science itself, is thriving and has examined the existence and prevalence of threats to reproducible and robust research. Most existing replication efforts in the social sciences, however, have focused on studies using data from rigorously designed surveys or experiments. Largely missing are replication efforts devoted to studies using organic data: data generated organically by ubiquitous sensors, mobile applications, Twitter feeds, click streams, and the like. This project examines the inconsistent practices for handling organic data among scholarly publications in the social sciences, in order to establish the confidence (or the lack thereof) warranted by conclusions drawn from such data analysis. Since findings in the social and behavioral sciences inform policy makers on a wide variety of issues, from homeland security to the national economy, establishing confidence in these findings is critical for their proper use, and therefore has broader impacts on all of these application areas of national priority.

More specifically, this project starts by determining the extent of, causes of, and remedies for empirical research using organic data that is neither reproducible nor generalizable. The findings from this step raise awareness about standards and tools for collecting, cleaning, and processing organic data sets across many fields of the social sciences. In addition, the project develops new analytical frameworks and methodologies for evaluating the replicability and robustness of empirical studies that use organic data. The vision is for such frameworks to be used broadly across application domains, thereby fostering cultural change across fields in the social sciences and bringing the values of reproducibility and robustness to the forefront of data-intensive research.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Note:  When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from those of this site.

Xu, Heng and Zhang, Nan and Zhou, Le. "Validity Concerns in Research Using Organic Data." Journal of Management, v.46, 2019. https://doi.org/10.1177/0149206319862027
Zhang, Nan and Wang, Mo and Xu, Heng. "Disentangling Effect Size Heterogeneity in Meta-Analysis: A Latent Mixture Approach." Psychological Methods, v.27, 2022. https://doi.org/10.1037/met0000368
Zhang, Nan and Xu, Heng. "Reconciling the Paradoxical Findings of Choice Overload Through an Analytical Lens." MIS Quarterly, v.45, 2021. https://doi.org/10.25300/MISQ/2021/16954

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

The field of metascience -- the use of scientific methodology to study science itself -- has examined many aspects of research robustness for studies that use conventional designed instruments (e.g., surveys, laboratory experiments) to collect "designed data." Largely missing, however, are efforts to examine the robustness of empirical research using "organic data," namely, data that are generated without any explicit research-design elements and are continuously documented by digital devices (e.g., video captured by ubiquitous sensing devices; content and social interactions extracted from social networking sites, Twitter feeds, and click streams). With the increasing use of organic data in social science research, it is crucial to understand the challenges of handling and processing such data, because those challenges can undermine the validity of research outcomes.

This project first presents a comprehensive overview of the common problems that can undermine the validity of conclusions drawn from empirical studies using designed data versus organic data. It then conducts an in-depth analysis of two areas: (1) the characteristics that distinguish organic data from traditional designed data, and (2) the typical workflows of studies using organic data versus those using designed data. From this analysis, the project identifies validity threats specific to studies using organic versus designed data and proposes corresponding remedies.

For organic data, the project identifies two broad categories of validity threats: those arising from the lack of transparency in how organic data are generated, and those arising from the necessity of using automated algorithms for data extraction (including errors in algorithmic output and the difficulty of managing the vast parameter/procedure space of an information-extraction algorithm). For each identified threat, the project proposes potential remedies.
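As a hypothetical illustration of the parameter/procedure-space threat (this sketch is not code from the project; all data, variable names, and thresholds in it are synthetic), the following Python snippet shows how a conclusion built on automatically extracted labels can change across equally defensible settings of a single extraction parameter:

```python
# Hypothetical multiverse-style sensitivity check: how does a downstream
# conclusion depend on one parameter of an information-extraction algorithm?
# All data and thresholds here are synthetic, for illustration only.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "organic data": a continuous sentiment score per post, produced
# by some automated classifier, plus a binary group label (e.g., two cohorts).
scores = rng.normal(loc=0.05, scale=1.0, size=2000)
group = rng.integers(0, 2, size=2000)
scores[group == 1] += 0.08  # small true group difference

# The extraction step binarizes scores into "positive" posts. The cutoff is
# an analyst-chosen parameter; published studies often fix it silently.
for cutoff in [0.0, 0.25, 0.5, 0.75]:
    positive = scores > cutoff
    rate_diff = positive[group == 1].mean() - positive[group == 0].mean()
    print(f"cutoff={cutoff:.2f}: difference in positive-post rate = {rate_diff:+.3f}")

# If the sign or size of rate_diff varies materially across cutoffs, the
# finding reflects the extraction parameter rather than a robust effect.
```

When the reported difference shifts materially, or changes sign, across such cutoffs, the finding is an artifact of the extraction choice rather than a robust property of the underlying population, which is one reason this parameter space needs to be documented and explored.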

For designed data, the project develops a novel analytical approach to meta-analysis that disentangles the heterogeneity of effect sizes observed across studies. Using only the summary data reported in primary studies, the approach models effect sizes as a latent mixture, revealing the nature and potential causes of inconsistent findings in the literature and yielding a more detailed account of why studies of the same phenomenon disagree.
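The published estimator (see the Psychological Methods article listed above) is considerably more elaborate; purely as a minimal sketch of the latent-mixture idea, and assuming scikit-learn is available, the following Python code simulates a literature whose studies estimate one of two distinct true effects and then recovers the two subpopulations from the reported effect sizes:

```python
# Minimal sketch of the latent-mixture idea behind effect-size heterogeneity:
# simulate a literature in which each study estimates one of two distinct true
# effects, then recover the two subpopulations from reported effect sizes.
# Synthetic data and an off-the-shelf Gaussian mixture; the project's
# published method is more sophisticated.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(42)

n_studies = 120
# Latent subpopulation per study: 60% estimate a null-ish effect (d = 0.05),
# 40% estimate a substantial effect (d = 0.45).
component = rng.random(n_studies) < 0.6
true_d = np.where(component, 0.05, 0.45)

# Observed effect sizes = true effect + sampling error (std. error ~ 0.08).
observed = true_d + rng.normal(scale=0.08, size=n_studies)

# A single pooled mean would blur the two subpopulations together;
# a two-component mixture separates them instead.
gmm = GaussianMixture(n_components=2, random_state=0).fit(observed.reshape(-1, 1))

for mean, weight in sorted(zip(gmm.means_.ravel(), gmm.weights_)):
    print(f"recovered subpopulation: mean d = {mean:.2f}, weight = {weight:.2f}")
```

In this toy setting, the pooled average (about 0.21) describes neither subpopulation of studies; the mixture view instead surfaces the two distinct effects and their relative prevalence.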

Overall, this project begins by determining the extent of, causes of, and remedies for empirical research, based on designed or organic data, that is neither reproducible nor generalizable. The findings from this step raise awareness about standards and tools for collecting and processing designed and organic data sets across many fields of the social sciences. In addition, the project develops new analytical frameworks and methodologies for evaluating the replicability and robustness of social science studies.


Last Modified: 12/21/2023
Modified by: Heng Xu

Please report errors in award information by writing to: awardsearch@nsf.gov.
