
NSF Org: |
OAC Office of Advanced Cyberinfrastructure (OAC) |
Recipient: |
|
Initial Amendment Date: | July 31, 2018 |
Latest Amendment Date: | July 31, 2018 |
Award Number: | 1760052 |
Award Instrument: | Standard Grant |
Program Manager: |
Cheryl Eavey
OAC Office of Advanced Cyberinfrastructure (OAC) CSE Directorate for Computer and Information Science and Engineering |
Start Date: | September 1, 2018 |
End Date: | August 31, 2021 (Estimated) |
Total Intended Award Amount: | $250,000.00 |
Total Awarded Amount to Date: | $250,000.00 |
Funds Obligated to Date: |
|
History of Investigator: |
|
Recipient Sponsored Research Office: |
1 NASSAU HALL PRINCETON NJ US 08544-2001 (609)258-3090 |
Sponsor Congressional District: |
|
Primary Place of Performance: |
Wallace Hall Princeton NJ US 08544-1005 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | BD Spokes -Big Data Regional I |
Primary Program Source: |
|
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
This research project will develop a collaborative data science platform for computational social science called the Data Science Foundry. The collection and management of large-scale data is currently a relatively unstructured process, with data-processing decisions made in an ad hoc fashion. Yet society has started to rely on data-driven science to address policy-related questions. A collaborative platform that provides structure will allow social scientists to collaborate on and validate each other's studies. This project has the potential to transform how studies are designed and how data are processed. Through the collaborative curation of study design, procedures, and validation, the platform will foster greater trust in the studies conducted with it. The platform also will increase the number of studies that can be completed in a short span of time. It will be developed as open-source software, facilitating interaction with the community and enabling different institutions to install it.
This project will develop a collaborative platform that social scientists can use to collaborate on and validate each other's studies. The investigative team will attempt to identify the best possible collaborative model for data-driven social science, determine how automation can best enhance the studies, and develop explicit and implicit mechanisms to establish trust in end-to-end data-processing pipelines and the results they generate. To guide the platform's development, the research team will focus on predicting outcomes from surveys, a specific yet widely applicable type of problem within computational social science. This class of problems involves substantial subjective assessment during the feature engineering stage as well as extensive interpretation during the data transformation stage. These challenges will benefit both from a collaborative workflow and from mechanisms that establish trust in the eventual results. The project will bring together three distinct teams to develop the platform: computer scientists to develop abstractions, APIs, and systems; statisticians to help with methods and study design; and social scientists to help define the problems and workflow and to provide user feedback.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
In this research project, we developed approaches to collaborative computational social science and used those approaches to study the predictability of life outcomes. As part of this project, we conducted the Fragile Families Challenge, a scientific mass collaboration involving more than 450 researchers from around the world (Salganik et al., 2020). These researchers attempted to predict six life outcomes, such as a child’s grade point average and whether a family would be evicted from their home. Researchers used machine learning methods optimized for prediction and drew on all the data collected during the Fragile Families and Child Wellbeing Study. However, none of the researchers were able to make very accurate predictions. For policymakers considering the use of predictive models in settings such as criminal justice and child-protective services, these results raise a number of concerns. Additionally, researchers must reconcile the idea that they understand life trajectories with the fact that none of the predictions were very accurate.
While conducting the mass collaboration, we developed approaches to address a number of methodological challenges that we encountered related to privacy and ethics of data access (Lundberg et al., 2019), survey metadata (Kindel et al., 2019), and computational reproducibility (Liu and Salganik, 2019). We also contributed to reporting guidelines for future multi-analyst studies (Aczel et al., 2021). Collectively, these methodological contributions should make future mass collaborations more scientifically valuable and easier to conduct.
Finally, we shared our approach and results with the broader data science community in both written form (Salganik, Maffeo, and Rudin, 2020) and through presentations at universities, companies, and government agencies. We hope that our approach and results will lead to more scientific research and improved use of predictive models in high-stakes social settings.
Last Modified: 04/21/2022
Modified by: Matthew J Salganik