
NSF Org: | SES Division of Social and Economic Sciences |
Recipient: |
|
Initial Amendment Date: | August 31, 2018 |
Latest Amendment Date: | August 31, 2018 |
Award Number: | 1835075 |
Award Instrument: | Standard Grant |
Program Manager: | Wenda K. Bauchspies, wbauchsp@nsf.gov, (703)292-5034, SES Division of Social and Economic Sciences, SBE Directorate for Social, Behavioral and Economic Sciences |
Start Date: | January 1, 2019 |
End Date: | December 31, 2022 (Estimated) |
Total Intended Award Amount: | $172,486.00 |
Total Awarded Amount to Date: | $172,486.00 |
Funds Obligated to Date: |
|
History of Investigator: |
|
Recipient Sponsored Research Office: |
341 PINE TREE RD ITHACA NY US 14850-2820 (607)255-5014 |
Sponsor Congressional District: |
|
Primary Place of Performance: |
107 Hoy Road Ithaca NY US 14853-7501 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | Cultivating Cultures of Ethica |
Primary Program Source: |
|
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.075 |
ABSTRACT
The 'big data' era has created new scientific roles in the form of 'data scientists' - specialized information knowledge workers who apply technical and research skills to derive knowledge from large-scale databases. Concomitant with the creation of these new roles, however, has been the rise of high-profile controversies around the potentially abusive use of big data and of the algorithms whose creation they enable. This project thus focuses on 'data science ethics' and works to understand how best to cultivate a culture of ethics in data science. It will assess the state, structure, and substance of data ethics in both educational and industrial contexts. Within the academic sector, the project will conduct interviews with faculty and undergraduates and analyze relevant syllabi to develop an account of what constitutes data ethics in the academy. Comparing these data across faculty and disciplines will show major areas of agreement and disagreement over what data ethics education should entail. Within the industrial sector, the project will research how data ethics are beginning to figure into corporate practice and study how companies define data ethics and attempt to integrate ethical considerations into everyday work. Data will come from in-depth interviews with those in industry and analyses of corporate documents from companies that have made public commitments to addressing data ethics. The PIs will then compare data and findings from across these two contexts to identify commonalities, differences, and useful strategies for advancing data ethics across social contexts and professional sectors.
This project will draw together a diverse group of researchers who, in collaboration with educators and data science practitioners, will: (1) document and assess barriers and opportunities for integrating ethics into data science practice; (2) assess the continuities and discontinuities emerging between industrial and academic contexts; and (3) develop a foundation for cohesive, comprehensive, integrative data ethics education. The investigators will use a range of complementary methods to do so, including qualitative interviews, expert judgement, and quantitative computational analyses of the latent thematic and topical structure of documents from academia and industry. Project outcomes and findings will be disseminated through conferences, journals, and a synthetic workshop held after the completion of data collection and analysis.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
This project set out to develop a better understanding of the current state of ethics education in computer, data, and information science and related fields across colleges and universities in the United States. To do so, we sought to document the degree to which ethics-related classes are offered in these departments and the particular topics covered and readings assigned.
Our data collection process occurred in four stages. We first developed a purposive sample of 100 colleges and universities in the United States, designed to include a mix of R1, R2, and R3 universities as well as baccalaureate, associate’s, religious, and historically Black colleges. We then identified relevant ethics classes offered in computer, data, and information science departments and related departments between Fall 2018 and Spring 2020. Via course catalogs, course websites, online searches, and direct outreach, we then tried to obtain syllabi for each of these classes. Finally, from these syllabi, we extracted the topics that each class covered and the readings assigned for each of these topics.
To our knowledge, our study is the first to generate an empirical account of the data ethics classes offered in a systematically collected sample of colleges and universities in the United States (prior work has relied on convenience samples or case studies). The resulting dataset offers a number of interesting and important findings:
- Data ethics classes are being taught in all manner of colleges and universities in the United States, ranging from large research universities to small liberal arts and community colleges, including those that are religiously affiliated and historically Black.
- The most commonly covered topics in these classes include privacy and security, as one might expect, but there is a long tail that covers a much larger set of topics. This suggests that while there is some agreement in the field about what constitutes the core set of topics for data ethics classes, there is still a good deal of disagreement about the additional topics that should be included in such classes.
- Likewise, while some readings are relatively popular (e.g., Langdon Winner’s “Do Artifacts Have Politics?”), only ~10% of readings are assigned more than once across the entire set of classes. Even when teaching the same topics, instructors currently rely on diverse readings to introduce students to these topics. These results suggest that there is not a well-established canon of scholarship in data ethics that is commonly taught across the country.
- In keeping with the above findings, while some authors are relatively popular (e.g., Zeynep Tufekci), only 20% of authors in the corpus are assigned more than once across the entire set of classes. This highlights the diverse approaches that instructors currently take in curating the readings for their syllabi.
Taken together, these findings suggest that data ethics has become a common feature of computer, data, and information science education, but that the substance of these classes is quite varied. Our findings reflect a field in flux, as instructors experiment with new topics and readings, even as they continue to teach some of the more well-established topics, such as privacy and security.
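The repeat-assignment shares reported above (roughly 10% of readings and 20% of authors assigned more than once) are the kind of figure that can be computed from a table of class-to-item assignments. The following is a minimal sketch only, not the project's actual analysis code: the function name, the pair-based data layout, and the toy records are all hypothetical, and it assumes that "assigned more than once" means "assigned in more than one class."

```python
from collections import Counter


def share_assigned_more_than_once(assignments):
    """Return the fraction of distinct items that appear in more than one class.

    `assignments` is an iterable of (class_id, item) pairs, where an item is a
    reading or an author. An item listed twice within the same class is
    counted only once for that class (an assumption of this sketch).
    """
    unique_pairs = set(assignments)  # collapse within-class duplicates
    classes_per_item = Counter(item for _, item in unique_pairs)
    if not classes_per_item:
        return 0.0
    repeated = sum(1 for n in classes_per_item.values() if n > 1)
    return repeated / len(classes_per_item)


# Hypothetical toy records, not drawn from the project's dataset.
toy_assignments = [
    ("class_a", "reading_1"),
    ("class_b", "reading_1"),  # reading_1 appears in two classes
    ("class_a", "reading_2"),
    ("class_b", "reading_3"),
    ("class_c", "reading_4"),
    ("class_c", "reading_4"),  # duplicate within one class, counted once
]

print(share_assigned_more_than_once(toy_assignments))
```

The same function applies to authors by passing (class, author) pairs instead of (class, reading) pairs.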
Outcomes
We are currently writing up the results of our analysis as part of a paper that we will submit for peer-reviewed publication. This will be the primary mechanism by which we report our findings.
We are also preparing the underlying data for public release, which we will make available online.
Finally, we intend to give presentations of our work at appropriate academic venues, including conferences and invited lectures.
Broader impact
While college and university instructors often consult each other’s syllabi, our findings will provide the first systematic overview of what instructors are doing when they teach data ethics in the United States. This should prove helpful to instructors as they consider what to teach and which materials to use when doing so.
Additionally, because we plan to make our data publicly available online, instructors will have access not just to our analysis, but to the underlying details of each class as well, including the topics and readings covered. This dataset includes nearly 3,000 entries and should serve as an especially useful resource.
We expect that other researchers will also subject our dataset to further study, perhaps uncovering additional insights that we had not considered ourselves.
Finally, our findings also provide insights into the kind of training that students are receiving in data ethics before entering the labor market. As such, they should be of interest to industry, civil society, and government, each of which is increasingly concerned with the degree to which computing professionals possess the necessary skills to effectively navigate difficult ethical challenges. In particular, our findings might highlight possible misalignments between what is being taught in data ethics classes and the kinds of challenges that computing professionals are likely to face in practice.
Last Modified: 07/11/2023
Modified by: Solon Barocas