
NSF Org: | SMA SBE Office of Multidisciplinary Activities |
Recipient: |
|
Initial Amendment Date: | September 5, 2018 |
Latest Amendment Date: | March 14, 2023 |
Award Number: | 1850605 |
Award Instrument: | Standard Grant |
Program Manager: | Mary Feeney, SMA SBE Office of Multidisciplinary Activities, SBE Directorate for Social, Behavioral and Economic Sciences |
Start Date: | July 1, 2018 |
End Date: | August 31, 2023 (Estimated) |
Total Intended Award Amount: | $361,201.00 |
Total Awarded Amount to Date: | $361,201.00 |
Funds Obligated to Date: |
|
History of Investigator: |
|
Recipient Sponsored Research Office: |
4400 MASSACHUSETTS AVE NW WASHINGTON DC US 20016-8003 (202)885-3440 |
Sponsor Congressional District: |
|
Primary Place of Performance: |
DC US 20016-8002 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | SciSIP-Sci of Sci Innov Policy |
Primary Program Source: |
|
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.075 |
ABSTRACT
Concerns about a reproducibility crisis in scientific research have become increasingly prevalent within the academic community and among the public at large. The field of meta-science, the scientific study of science itself, is thriving and has examined the existence and prevalence of threats to reproducible and robust research. Most existing replication efforts in the social sciences, however, have focused on studies using data from statistically rigorous, designed surveys or experiments. Largely missing are replication efforts devoted to studies that use organic data, such as data generated by ubiquitous sensors or mobile applications, Twitter feeds, and click streams. This project examines inconsistent practices for handling organic data across scholarly publications in the social sciences, in order to establish the confidence (or the lack thereof) in the conclusions drawn from such data analysis. Since findings of the social and behavioral sciences inform policy makers on a wide variety of issues, from homeland security to the national economy, establishing confidence in these findings is critical for their proper use, and therefore has broader impacts on all these application areas of national priority.
More specifically, this project starts by determining the extent of, causes of, and remedies for empirical research using organic data that is neither reproducible nor generalizable. The findings from this step raise awareness about the standards and tools for collecting, cleaning, and processing organic data sets across many fields of the social sciences. In addition, this project develops new analytical frameworks and methodologies for evaluating the replicability and robustness of empirical studies with organic data. The vision is for such frameworks to be broadly used in many application domains, thereby fostering cultural change across different fields in the social sciences and bringing the value of reproducibility and robustness to the forefront of data-intensive research.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
The field of metascience -- the use of scientific methodology to study science itself -- has examined various aspects of robustness in research that uses conventional designed studies (e.g., surveys, laboratory experiments) to collect "designed data." Largely missing, however, are efforts to examine the robustness of empirical research using "organic data," namely, data that are generated without any explicit research design elements and are continuously documented by digital devices (e.g., video captured by ubiquitous sensing devices; content and social interactions extracted from social networking sites, Twitter feeds, and click streams). With the increasing use of organic data in social science research, it is crucial to understand the challenges of handling and processing organic data, as these may affect the robustness of research outcomes.
This project first presents a comprehensive overview of common problems that could undermine the validity of conclusions derived from empirical studies using designed data versus organic data. Subsequently, it embarks on an in-depth analysis focusing on two major areas: (1) the distinguishing characteristics of organic data compared to traditionally designed data, and (2) the typical workflows observed in studies utilizing organic data versus those using designed data. Within this analysis, potential validity threats specific to studies involving organic data versus designed data are identified, and corresponding solutions to these issues are also proposed.
For organic data, the project identifies two broad categories of validity threats: one arising from the lack of transparency in how organic data are generated, and the other from the necessity of using automated algorithms for data extraction (including errors in algorithmic outputs and the complexity involved in managing the vast parameter/procedure space of an information-extraction algorithm). For each of these identified threats, the project also proposes potential remedies.
For designed data, this project develops a novel meta-analysis approach that focuses on disentangling the observed heterogeneity across different studies. Using the data reported in primary studies, this approach reveals the nature and potential causes of inconsistent findings in the literature. It thereby supports a more thorough and detailed examination of the data and a clearer understanding of the complexities and nuances in research findings.
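As background only, heterogeneity across primary studies is conventionally summarized with Cochran's Q, the I² statistic, and a between-study variance estimate (DerSimonian-Laird). The sketch below shows that standard textbook calculation; it is not the project's novel approach, which this report does not specify. The function name `heterogeneity` and the example effect sizes are hypothetical.

```python
from typing import NamedTuple, Sequence


class Heterogeneity(NamedTuple):
    pooled: float  # fixed-effect (inverse-variance weighted) pooled estimate
    q: float       # Cochran's Q statistic
    i2: float      # I^2: share of total variation attributed to heterogeneity
    tau2: float    # DerSimonian-Laird between-study variance estimate


def heterogeneity(effects: Sequence[float], variances: Sequence[float]) -> Heterogeneity:
    """Standard heterogeneity summary from per-study effects and sampling variances."""
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching effects and variances")
    if any(v <= 0 for v in variances):
        raise ValueError("sampling variances must be positive")

    # Inverse-variance weights: more precise studies count more.
    weights = [1.0 / v for v in variances]
    total_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total_w

    # Q: weighted squared deviations of study effects from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1

    # I^2 and tau^2 are truncated at zero when Q falls below its degrees of freedom.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    c = total_w - sum(w * w for w in weights) / total_w
    tau2 = max(0.0, (q - df) / c)
    return Heterogeneity(pooled, q, i2, tau2)


if __name__ == "__main__":
    # Hypothetical effect sizes and variances from two primary studies.
    print(heterogeneity([0.0, 2.0], [0.5, 0.5]))
```

A large I² indicates that the studies disagree by more than sampling error alone would explain, which is the kind of inconsistency a method for disentangling heterogeneity sets out to account for.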
Overall, this project starts by determining the extent of, causes of, and remedies for empirical research using designed versus organic data that is neither reproducible nor generalizable. The findings from this step raise awareness about the standards and tools for collecting and processing designed and organic data sets across many fields of the social sciences. In addition, this project develops new analytical frameworks and methodologies for evaluating the replicability and robustness of social science studies.
Last Modified: 12/21/2023
Modified by: Heng Xu