
NSF Org: | OAC Office of Advanced Cyberinfrastructure (OAC) |
Recipient: |
|
Initial Amendment Date: | August 9, 2018 |
Latest Amendment Date: | May 19, 2023 |
Award Number: | 1835877 |
Award Instrument: | Standard Grant |
Program Manager: | Alejandro Suarez, alsuarez@nsf.gov, (703)292-7092, OAC Office of Advanced Cyberinfrastructure (OAC), CSE Directorate for Computer and Information Science and Engineering |
Start Date: | September 1, 2018 |
End Date: | August 31, 2024 (Estimated) |
Total Intended Award Amount: | $584,151.00 |
Total Awarded Amount to Date: | $638,151.00 |
Funds Obligated to Date: | FY 2022 = $36,000.00; FY 2023 = $18,000.00 |
History of Investigator: |
|
Recipient Sponsored Research Office: | 6425 BOAZ ST RM 130 DALLAS TX US 75205-1902 (214)768-4708 |
Sponsor Congressional District: |
|
Primary Place of Performance: | 6425 Boaz Lane Dallas TX US 75275-0302 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | Data Cyberinfrastructure |
Primary Program Source: | 01002324DB NSF RESEARCH & RELATED ACTIVIT; 01001819DB NSF RESEARCH & RELATED ACTIVIT |
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
Preserving, sharing, navigating, and reusing large and diverse collections of data is now essential to scientific discoveries in areas such as phenomics, materials science, geoscience, and urban science. These data navigation needs are also important in the growing number of research areas where data and tools must span multiple domains. To support these needs effectively, new methods are required that reduce the effort researchers need to find and use data, support community-accepted data practices, and bring together the breadth of standards, tools, and resources utilized by a community. Clowder, a data management system based on active curation, addresses these needs and challenges by distributing much of the data curation overhead throughout the lifecycle of the data, augmenting this with social curation and automated analysis tools, and providing extensible community-dependent means of viewing and navigating data. As an open source framework, built to be extensible at every level, Clowder is capable of interacting with and utilizing a variety of community tools while also supporting different data governance and ownership requirements.
The project enhances Clowder's core systems for the benefit of a larger group of users. It increases the level of interoperability with community resources, hardens the core software, and distributes core software development, while continuing to expand usage. Governance mechanisms and a business model are established to make Clowder sustainable, ensuring that the software continues to be available, supportable, and usable. The effort engages a number of stakeholders, taking data from diverse but converging scientific domains already using the Clowder framework, to address broad interoperability and cross-domain data sharing. The overall effort will transition the grassroots Clowder user community and Clowder's other stakeholders (such as current and potential developers) into a larger organized community, with a sustainable software resource supporting convergent research data needs.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
The Clowder CSSI project developed a new version of the Clowder research data management platform to support convergent research across multiple disciplines while expanding the open source community and user base. Much of the data needed for today's science is highly diverse and increasingly large. Managing, sharing, curating, and analyzing that data requires software, particularly because reproducibility benefits from programmability. Clowder provides an end-user framework that is customizable to any discipline and scalable to modern big data requirements. Clowder continues to bring together data and metadata management, information extraction, data visualization, social curation, and data sharing under one open source framework. It lets users define their own metadata fields, bring their own algorithms and pipelines, and develop web-based visualizations, all in one environment that scales to very large datasets, which makes it a unique offering in research data management.
As part of this project, we developed a new version of the software stack, drawing on more than a decade of Clowder use and development in support of communities such as biology, geoscience, materials science, crop science, civil engineering, social science, and the humanities, with support from NSF, ONR, NARA, and other federal and state agencies. This version 2 (v2) was developed from scratch using modern technologies such as Python, TypeScript, and React.js to provide a much improved user experience and to make it easier for the community to contribute to the codebase.
Clowder v2 provides a brand new modern look based on Google’s Material Design system and a large number of improvements. For example, users can now version files and datasets, and metadata can be associated with specific versions. This reduces clutter within the system and makes it easier for researchers to update data and metadata. We have added new ways to share data with collaborators by introducing user groups and letting users enforce access at the dataset level. Machine metadata created by information extractors is now clearly separated from user-defined metadata, with improvements to how the two are visualized and defined. Automated triggers for information extractors can now be defined using a generic query language rather than the original MIME type implementation. This means that we can not only define which extractor is automatically executed when a file of a particular type is uploaded, but also write more refined rules based on, for example, a specific upload time, a pattern in the file name, or a specific file size.
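The rule-based triggering described above can be illustrated with a minimal Python sketch. This is not Clowder's query language or API: the `UploadedFile` and `TriggerRule` classes, their fields, and the extractor names are all hypothetical, chosen only to show how rules on MIME type, file name pattern, file size, and upload time might be combined.

```python
from dataclasses import dataclass
from datetime import datetime
from fnmatch import fnmatchcase
from typing import List, Optional


@dataclass
class UploadedFile:
    """Hypothetical description of an uploaded file (not a Clowder type)."""
    name: str
    mime_type: str
    size_bytes: int
    uploaded_at: datetime


@dataclass
class TriggerRule:
    """Hypothetical trigger rule; a condition left as None is not checked."""
    extractor: str
    mime_type: Optional[str] = None
    name_pattern: Optional[str] = None       # shell-style pattern, e.g. "plot_*.tif"
    max_size_bytes: Optional[int] = None
    uploaded_after: Optional[datetime] = None

    def matches(self, f: UploadedFile) -> bool:
        # Every condition that is set must hold for the rule to fire.
        if self.mime_type is not None and f.mime_type != self.mime_type:
            return False
        if self.name_pattern is not None and not fnmatchcase(f.name, self.name_pattern):
            return False
        if self.max_size_bytes is not None and f.size_bytes > self.max_size_bytes:
            return False
        if self.uploaded_after is not None and f.uploaded_at < self.uploaded_after:
            return False
        return True


def extractors_to_run(rules: List[TriggerRule], f: UploadedFile) -> List[str]:
    """Return the extractor names whose rules match the file, in rule order."""
    return [rule.extractor for rule in rules if rule.matches(f)]


# Illustrative rules: one keyed on MIME type only (the older style),
# one combining a file name pattern with a size limit.
RULES = [
    TriggerRule(extractor="thumbnailer", mime_type="image/tiff"),
    TriggerRule(extractor="segmenter", name_pattern="plot_*.tif",
                max_size_bytes=1_000_000),
]

big_file = UploadedFile("plot_042.tif", "image/tiff", 5_000_000,
                        datetime(2024, 1, 15))
small_file = UploadedFile("plot_007.tif", "image/tiff", 200_000,
                          datetime(2024, 1, 15))
```

In this sketch, `extractors_to_run(RULES, big_file)` selects only the MIME-type rule because the file exceeds the size limit of the second rule, while `small_file` satisfies both rules.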
To broaden the community, we organized an online webinar series, an in-person community workshop, an online workshop, a hybrid hackathon, and regular online developer meetings, and we maintain an active Slack workspace. The webinar series included ten one-hour live presentations. The in-person All Paws community workshop was a day-long event co-located with PEARC 2019 and had 45 participants. The online All Paws workshop in 2021 ran for three days and had 72 participants. The hybrid hackathon was attended by 22 people.
Throughout the life of the project, the team has engaged with the materials science community through the 4CeeD effort, the geoscience community through the Critical Interface Network (CINet) Critical Zone Observatory, the urban science community through the SMU partnership, the plant phenomics community through the TERRA-REF effort, and the permafrost science community through the Permafrost Discovery Gateway project. These use cases have also generated positive outcomes. For example, the urban science use case discovered the presence of infrastructure deserts, low-income areas with significantly worse neighborhood infrastructure than other areas, in Chicago and Dallas. Its findings were cited in the City of Dallas’ Economic Development Policy and Economic Development Incentive Policy, the Dallas Housing Policy 2033, and 15 news stories. New areas of Clowder adoption and projects include NLP for literature mining of medical manuscripts, cultural heritage, analyzing data from sensing devices for monitoring infants, managing gigapixel microscope images from archeological sites, 3D reconstruction of digital artifacts using photogrammetry, Arab American studies, cyberinfrastructure for deploying AI pipelines to the hybrid cloud, particle detection data management, HPC integration for particle imaging, and fossil pollen detection. Many of these efforts are still ongoing. Clowder version 2, a true open source data management platform that anyone is free to use and contribute to, will make it easier to adopt new use cases and extend the system as new requirements emerge.
Last Modified: 01/12/2025
Modified by: Barbara Minsker