Award Abstract # 1835877
Collaborative Research: CSSI: Framework: Data: Clowder Open Source Customizable Research Data Management, Plus-Plus

NSF Org: OAC
Office of Advanced Cyberinfrastructure (OAC)
Recipient: SOUTHERN METHODIST UNIVERSITY
Initial Amendment Date: August 9, 2018
Latest Amendment Date: May 19, 2023
Award Number: 1835877
Award Instrument: Standard Grant
Program Manager: Alejandro Suarez
alsuarez@nsf.gov
 (703)292-7092
OAC
 Office of Advanced Cyberinfrastructure (OAC)
CSE
 Directorate for Computer and Information Science and Engineering
Start Date: September 1, 2018
End Date: August 31, 2024 (Estimated)
Total Intended Award Amount: $584,151.00
Total Awarded Amount to Date: $638,151.00
Funds Obligated to Date: FY 2018 = $584,151.00
FY 2022 = $36,000.00
FY 2023 = $18,000.00
History of Investigator:
  • Barbara Minsker (Principal Investigator)
    minsker@smu.edu
  • Jessie Zarazaga (Co-Principal Investigator)
  • Kenneth Berry (Former Co-Principal Investigator)
Recipient Sponsored Research Office: Southern Methodist University
6425 BOAZ ST RM 130
DALLAS
TX  US  75205-1902
(214)768-4708
Sponsor Congressional District: 24
Primary Place of Performance: Southern Methodist University
6425 Boaz Lane
Dallas
TX  US  75275-0302
Primary Place of Performance Congressional District: 32
Unique Entity Identifier (UEI): D33QGS3Q3DJ3
Parent UEI: S88YPE3BLV66
NSF Program(s): Data Cyberinfrastructure
Primary Program Source: 01002223DB NSF RESEARCH & RELATED ACTIVIT
01002324DB NSF RESEARCH & RELATED ACTIVIT
01001819DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 062Z, 077Z, 7218, 7925, 8048, 9251
Program Element Code(s): 772600
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Preserving, sharing, navigating, and reusing large and diverse collections of data is now essential to scientific discoveries in areas such as phenomics, materials science, geoscience, and urban science. These data navigation needs are also important when addressing the growing number of research areas where data and tools must span multiple domains. To support these needs effectively, new methods are required that simplify and reduce the amount of effort needed by researchers to find and utilize data, support community accepted data practices, and bring together the breadth of standards, tools, and resources utilized by a community. Clowder, an active curation based data management system, addresses these needs and challenges by distributing much of the data curation overhead throughout the lifecycle of the data, augmenting this with social curation and automated analysis tools, and providing extensible community-dependent means of viewing and navigating data. As an open source framework, built to be extensible at every level, Clowder is capable of interacting with and utilizing a variety of community tools while also supporting different data governance and ownership requirements.

The project enhances Clowder's core systems for the benefit of a larger group of users. It increases the level of interoperability with community resources, hardens the core software, and distributes core software development, while continuing to expand usage. Governance mechanisms and a business model are established to make Clowder sustainable, creating an appropriate governance structure to ensure that the software continues to be available, supportable, and usable. The effort engages a number of stakeholders, taking data from diverse but converging scientific domains already using the Clowder framework, to address broad interoperability and cross domain data sharing. The overall effort will transition the grassroots Clowder user community and Clowder's other stakeholders (such as current and potential developers) into a larger organized community, with a sustainable software resource supporting convergent research data needs.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH


Hua, Cindy and Zarazaga, Jessie "Developing a Community-Based, Environmental Justice-Oriented Curriculum for STEM Learning" ASEE annual conference exposition, 2024 https://doi.org/10.18260/1-2--47148
Li, Zheng and Wang, Xinlei and Zarazaga, Jessie and Smith-Colin, Janille and Minsker, Barbara "Do infrastructure deserts exist? Measuring and mapping infrastructure equity: A case study in Dallas, Texas, USA" Cities, v.130, 2022 https://doi.org/10.1016/j.cities.2022.103927
Safaei-Moghadam, Arefeh and Hosseinzadeh, Azadeh and Minsker, Barbara "Predicting real-time roadway pluvial flood risk: A hybrid machine learning approach coupling a graph-based flood spreading model, historical vulnerabilities, and Waze data" Journal of Hydrology, v.637, 2024 https://doi.org/10.1016/j.jhydrol.2024.131406

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

The Clowder CSSI project developed a new version of the Clowder research data management platform to support convergent research across multiple research disciplines while expanding the open source community and user base. Much of the data needed for today's science is highly diverse and increasingly large in size. Managing, sharing, curating, and analyzing that data requires software, particularly because reproducibility benefits from programmability. Clowder provides an end-user framework that is customizable to any discipline and scalable to modern-day big data requirements. Clowder continues to bring together data and metadata management, information extraction, data visualization, social curation, and data sharing under one open source framework. Its ability to let users define their own metadata fields, bring their own algorithms and pipelines, and develop web-based visualizations in one environment, while scaling to very large datasets, makes it a unique offering in research data management.

As part of this project, we developed a new version of the software stack, building on more than a decade of use and development of Clowder to support communities such as biology, geoscience, materials science, crop science, civil engineering, social science, and the humanities, with support from NSF, ONR, NARA, and other federal and state agencies. This version 2 (v2) was developed from scratch using modern technologies such as Python, TypeScript, and React.js to provide a much-improved user experience and make it easier for the community to contribute to the codebase.

Clowder v2 provides a brand new modern look based on Google’s Material Design system and a large number of improvements. For example, users can now version files and datasets, and metadata can be associated with specific versions. This reduces clutter within the system and makes it easier for researchers to update data and metadata. We have added new ways to share data with collaborators by introducing the concept of user groups and letting users control access at the dataset level. Machine metadata created by information extractors is now clearly separated from user-defined metadata, with improvements in how the two are visualized and defined. Automated triggers for information extractors can now be defined using a generic query language rather than the original MIME type implementation. This means that we can define not only which extractor is automatically executed when a file of a particular type is uploaded, but also more refined rules based on, for example, the upload time, a pattern in the file name, or the file size.
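
To illustrate the difference between MIME-type-only triggers and the more refined rules described above, here is a minimal Python sketch. It is not Clowder's query language or API: the names UploadedFile, TriggerRule, and extractors_to_run, as well as the extractor names and rule fields, are hypothetical and chosen only to show how a rule might combine file type, file name pattern, file size, and upload time.

```python
from dataclasses import dataclass
from datetime import datetime, time
from fnmatch import fnmatchcase
from typing import List, Optional


@dataclass
class UploadedFile:
    """Hypothetical description of an uploaded file (not a Clowder class)."""
    name: str
    mime_type: str
    size_bytes: int
    uploaded_at: datetime


@dataclass
class TriggerRule:
    """Hypothetical trigger rule; a condition left as None is not checked."""
    extractor: str
    mime_type: Optional[str] = None        # exact MIME type, e.g. "image/png"
    name_pattern: Optional[str] = None     # shell-style pattern, e.g. "plot_*.png"
    max_size_bytes: Optional[int] = None   # largest file size the rule accepts
    uploaded_after: Optional[time] = None  # earliest time of day for the upload

    def matches(self, f: UploadedFile) -> bool:
        # Every condition that is set must hold for the rule to fire.
        if self.mime_type is not None and f.mime_type != self.mime_type:
            return False
        if self.name_pattern is not None and not fnmatchcase(f.name, self.name_pattern):
            return False
        if self.max_size_bytes is not None and f.size_bytes > self.max_size_bytes:
            return False
        if self.uploaded_after is not None and f.uploaded_at.time() < self.uploaded_after:
            return False
        return True


def extractors_to_run(rules: List[TriggerRule], f: UploadedFile) -> List[str]:
    """Return the extractors whose rules match the file, in rule order."""
    return [rule.extractor for rule in rules if rule.matches(f)]


# Example rules: the first mimics a MIME-type-only trigger, the others are refined.
rules = [
    TriggerRule("thumbnailer", mime_type="image/png"),
    TriggerRule("plot-scan", name_pattern="plot_*.png", max_size_bytes=1_000_000),
    TriggerRule("nightly-index", uploaded_after=time(18, 0)),
]
small_plot = UploadedFile("plot_001.png", "image/png", 500_000, datetime(2024, 1, 1, 9, 0))
selected = extractors_to_run(rules, small_plot)
```

In this sketch, a rule that sets only mime_type behaves like the original MIME-type trigger, while the additional optional fields narrow when an extractor runs.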

 

To broaden the community, we ran an online webinar series, an in-person community workshop, an online workshop, a hybrid hackathon, and regular online developer meetings, and we maintain an active Slack workspace. The webinar series included ten one-hour live presentations. The in-person All Paws community workshop was a day-long event co-located with PEARC 2019 and had 45 participants. The online All Paws workshop in 2021 ran for three days and had 72 participants. The hybrid hackathon was attended by 22 people.

Throughout the life of the project, the team has engaged with the materials science community through the 4CeeD effort, the geoscience community through the Critical Interface Network (CINet) Critical Zone Observatory, the urban science community through the SMU partnership, the plant phenomics community through the TERRA-REF effort, and the permafrost science community through the Permafrost Discovery Gateway project. These use cases have also generated positive outcomes. For example, the urban science use case discovered infrastructure deserts, low-income areas with significantly worse neighborhood infrastructure than other areas, in Chicago and Dallas. Its findings were cited in the City of Dallas’ Economic Development Policy and Economic Development Incentive Policy, the Dallas Housing Policy 2033, and 15 news stories. New areas of Clowder adoption and projects include NLP for literature mining of medical manuscripts, cultural heritage, analyzing data from sensing devices for monitoring infants, managing gigapixel microscope images from archeological sites, 3D reconstruction of digital artifacts using photogrammetry, Arab American studies, cyberinfrastructure for deploying AI pipelines to the hybrid cloud, particle detection data management, HPC integration for particle imaging, and fossil pollen detection. Many of these efforts are still ongoing. Clowder version 2, a true open source data management platform that anyone is free to use and contribute to, will make it easier to adopt new use cases and extend the system as new requirements emerge.

Last Modified: 01/12/2025
Modified by: Barbara Minsker

Please report errors in award information by writing to: awardsearch@nsf.gov.
