Award Abstract # 1910225
CHS: Small: A Genealogical Framework to Understand the Emergence of Online Groups

NSF Org: IIS
Division of Information & Intelligent Systems
Recipient: THE REGENTS OF THE UNIVERSITY OF COLORADO
Initial Amendment Date: July 26, 2019
Latest Amendment Date: July 26, 2019
Award Number: 1910225
Award Instrument: Standard Grant
Program Manager: Wendy Nilsen
wnilsen@nsf.gov
(703)292-2568
IIS Division of Information & Intelligent Systems
CSE Directorate for Computer and Information Science and Engineering
Start Date: October 1, 2019
End Date: December 31, 2023 (Estimated)
Total Intended Award Amount: $499,724.00
Total Awarded Amount to Date: $499,724.00
Funds Obligated to Date: FY 2019 = $499,724.00
History of Investigator:
  • Brian Keegan (Principal Investigator)
    brian.keegan@colorado.edu
  • Chenhao Tan (Co-Principal Investigator)
Recipient Sponsored Research Office: University of Colorado at Boulder
3100 MARINE ST
Boulder
CO  US  80309-0001
(303)492-6221
Sponsor Congressional District: 02
Primary Place of Performance: University of Colorado at Boulder
3100 Marine Street, Room 481
Boulder
CO  US  80303-1058
Primary Place of Performance Congressional District: 02
Unique Entity Identifier (UEI): SPVKK1RC2MZ3
Parent UEI:
NSF Program(s): HCC-Human-Centered Computing
Primary Program Source: 01001920DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 7367, 7923
Program Element Code(s): 736700
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

The project will study the emergence of online groups by examining their overlooked but important connections to existing groups. New online groups do not appear spontaneously, but are created by users who participated in existing online groups. A new group's early members carry their own activity history, which reveals their past group memberships as well as the relations between existing groups and this new group. The ability to trace the behavior of users before they start new groups presents a unique opportunity to understand how new groups emerge, how social norms arise in new groups, and what factors contribute to the success of new groups. This research will quantitatively model the genealogical relationships between online groups by tracing the sociotechnical lineage of "child" groups through their early members' previous participation in "parent" groups. Platform administrators or group moderators can use genealogical approaches for adopting existing norms in other groups, recommending new groups to users, and identifying opportunities to create new groups. A genealogical perspective can also explain the variability in group success as well as how norms spread throughout online groups. This work will enable transformative changes in online communities, either via redesign of their platforms or through implementation of new community policies.

The project draws on theories about the formation of online communities, organizational ecology, and kinship to explore how a group's position within a genealogical graph influences the group's identity, norms, and success. This research will analyze log data about user behavior over time from Reddit and Wikipedia. Aggregating the sociotechnical lineages of multiple groups generates a genealogical graph documenting how new groups emerge from the old. These genealogies will be evaluated across platforms, users, and time. The project will advance human-centered data science methods by employing qualitative methods, such as interviews and focus groups, to validate the proposed genealogical graphs, which will inform quantitative methods for building genealogical graphs from large-scale log data sets and analyzing their structure and dynamics.
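
As an illustration only, the idea of linking a "child" group to "parent" groups through its early members' prior participation might be sketched as follows. This is a minimal sketch under stated assumptions, not the project's actual method: the function name `build_genealogy`, the parameters `founders_k` and `min_shared`, the `(timestamp, user, group)` event format, and the example group names are all hypothetical.

```python
from collections import defaultdict


def build_genealogy(events, founders_k=3, min_shared=2):
    """Hypothetical sketch: infer parent -> child links between groups.

    events: iterable of (timestamp, user, group) activity records.
    founders_k: how many of a group's earliest distinct users count as
        "early members" (an assumed cutoff).
    min_shared: minimum number of early members who must have been active
        in another group beforehand for it to count as a parent (assumed).
    Returns {child_group: {parent_group: shared_early_member_count}}.
    """
    first_seen = {}                    # (user, group) -> first activity time
    early_members = defaultdict(list)  # group -> earliest distinct users
    for ts, user, group in sorted(events):
        if (user, group) not in first_seen:
            first_seen[(user, group)] = ts
            if len(early_members[group]) < founders_k:
                early_members[group].append(user)

    # Each user's history: the groups they joined and when they first did.
    history = defaultdict(list)
    for (user, group), ts in first_seen.items():
        history[user].append((ts, group))

    edges = {}
    for child, members in early_members.items():
        counts = defaultdict(int)
        for user in members:
            joined_child = first_seen[(user, child)]
            for ts, parent in history[user]:
                # Only participation that predates joining the child counts.
                if parent != child and ts < joined_child:
                    counts[parent] += 1
        parents = {p: n for p, n in counts.items() if n >= min_shared}
        if parents:
            edges[child] = parents
    return edges


# Toy data with made-up users and group names.
events = [
    (1, "ann", "r/cooking"), (2, "bob", "r/cooking"), (3, "cat", "r/baking"),
    (4, "ann", "r/sourdough"), (5, "bob", "r/sourdough"), (6, "cat", "r/sourdough"),
]
genealogy = build_genealogy(events)
print(genealogy)
```

In this toy example, two of the three early members of the hypothetical "r/sourdough" group were previously active in "r/cooking", so only that group clears the assumed `min_shared` threshold and is recorded as a parent. Real log data would call for more careful choices about what counts as an early member and as meaningful prior participation.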

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH


Smith, C. Estelle and Alam, Irfanul and Tan, Chenhao and Keegan, Brian C. and Blanchard, Anita L. "The Impact of Governance Bots on Sense of Virtual Community: Development and Validation of the GOV-BOTs Scale" Proceedings of the ACM on Human-Computer Interaction, v.6, 2022 https://doi.org/10.1145/3555563
Zhang, J.S. and Keegan, B. and Lv, Q. and Tan, C. "Understanding the Diverging User Trajectories in Highly-related Online Communities during the COVID-19 Pandemic" Proceedings of the International AAAI Conference on Weblogs and Social Media, v.15, 2021

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

The goal of this project was to use the metaphor of "genealogy" as an alternative way to describe relationships between online communities. Funding for this project produced four publications and supported a research team of undergraduate, graduate, and post-doctoral research assistants over the four-year period of the award. The project advanced human-centered data science methods by combining research methods like qualitative content analysis, quantitative surveys, and computational information retrieval. It also validated new survey instruments, synthesized new community archetypes, and engaged with marginalized groups to understand their challenges governing online communities.

We introduced a general framework for identifying and comparing online community archetypes found within Reddit based on patterns of content and behavior within a community. These archetypes include question-answering, learning, social support, content generation, and affiliation. We identified methodological implications for researching each of these community archetypes that are often overlooked when all online communities are treated alike.

We developed and validated a new survey instrument for assessing the role of automated governance "bots" within online communities. These bots play important roles in creating more equitable and accessible communities while reducing the challenges for human moderators, but their impacts on end users were poorly understood. Collaborating with an established organizational psychologist, we iteratively developed and validated a new survey instrument to measure the effects of bots on users' sense of virtual community across a diverse set of sub-reddits. This instrument pre-dates the deployment of bots leveraging large language models (LLMs) but is already having an impact by allowing researchers to measure the effects of these LLM-based bots on online community dynamics.

We extended the findings about diverse senses of virtual community across sub-reddits to focus on a large and influential online community for marginalized groups. Using a combination of surveys and interviews, we illustrated the challenges and opportunities of governing large online communities for minority groups through a combination of verification and unique moderation strategies. These findings have important implications for improving trust and safety mechanisms for marginalized people as new social media platforms like Mastodon, Bluesky, and Threads emerge.

This project also spanned two major disruptions. The first was the COVID-19 pandemic. We analyzed how similar online communities managed different motivations and identities in responding to the earliest phases of the pandemic. We also examined how Wikipedia engaged in high-tempo collaborations to produce and govern content about the virus, disease, and pandemic through the early months of the pandemic. The second was Reddit's closure of the Pushshift API in spring 2023, which severely disrupted our work and required us to improvise and rebuild alternative data infrastructures to continue our research. Our experience managing this disruption to our external data source led us to submit a follow-up NSF grant proposal to responsibly build and govern research data infrastructure for social media data outside of social media companies.


Last Modified: 05/15/2024
Modified by: Brian C Keegan

Please report errors in award information by writing to: awardsearch@nsf.gov.
