NSF Award Search: Award # 1912270

Award Abstract # 1912270

Collaborative Proposal: CRCNS US-German Data Sharing Proposal: DataLad - a decentralized system for integrated discovery, management, and publication of digital objects of science

NSF Org:	IIS Division of Information & Intelligent Systems
Recipient:	TRUSTEES OF INDIANA UNIVERSITY
Initial Amendment Date:	September 13, 2019
Latest Amendment Date:	September 13, 2019
Award Number:	1912270
Award Instrument:	Standard Grant
Program Manager:	Jonathan Fritz, IIS Division of Information & Intelligent Systems, CSE Directorate for Computer and Information Science and Engineering
Start Date:	December 1, 2019
End Date:	December 31, 2021 (Estimated)
Total Intended Award Amount:	$152,802.00
Total Awarded Amount to Date:	$152,802.00
Funds Obligated to Date:	FY 2019 = $91.00
History of Investigator:	Franco Pestilli (Principal Investigator) pestilli@utexas.edu
Recipient Sponsored Research Office:	Indiana University, 107 S INDIANA AVE, BLOOMINGTON, IN, US 47405-7000, (317)278-3473
Sponsor Congressional District:	09
Primary Place of Performance:	Indiana University, 509 E. 3rd St, Bloomington, IN, US 47404-3654
Primary Place of Performance Congressional District:	09
Unique Entity Identifier (UEI):	YH86RTW2YVJ4
Parent UEI:
NSF Program(s):	Cognitive Neuroscience
Primary Program Source:	01001920DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s):	7327, 8089, 8091
Program Element Code(s):	169900
Award Agency Code:	4900
Fund Agency Code:	4900
Assistance Listing Number(s):	47.070

ABSTRACT

Scientists collect terabytes of critical data every year. Recently, a strong open science movement has built momentum for the beneficial practice of sharing data across laboratories, universities, and research institutions. Yet sharing data is not enough. Data must be shared in standardized formats and accompanied by curated metadata to allow for tracking, search, and organization. Metadata are essential for scientific discovery, as they are routinely needed to complete data analyses. However, to date, most brain projects focus on collecting or analyzing data, not on metadata management. Typical metadata records consist of heterogeneous study descriptions, written at the study release stage, without consistency across records or standard mechanisms to track changes.
This project will increase access to brain data and improve metadata handling by combining two NSF-funded projects. It will develop a first-of-its-kind metadata management system able to track data and metadata distributed across heterogeneous geographical locations, storage systems, and data formats. This portion of the project will expand the functionality of DataLad, a project previously funded by NSF. DataLad will also be enhanced to interoperate with major data repositories such as OSF and Figshare. Furthermore, the project will use the NSF-funded cloud computing platform brainlife.io to create a data and metadata marketplace by gathering data from multiple, currently separate repositories into a single ecosystem. The goal is to improve interoperability across open science projects and make data and metadata easily searchable and available for computing on national cyberinfrastructure systems, ultimately advancing scientific discovery by increasing data discoverability, utilization, and publication.

This project will generate various technological advances. The core target will be an extensible system capable of automatically gathering metadata from various domains. It will comprise two major components: 1) a set of metadata parser algorithms that extract metadata from datasets and individual files using a flexible JSON-LD based data structure (with the ability to encode controlled vocabularies where available) and 2) an aggregation procedure that merges the extracted metadata across parsers and stores them in compressed files that are optimized for bandwidth-efficient exchange and can be queried directly or used as input to SQL or graph databases for data discovery applications. Extracted metadata will be included within the same datasets under Git and git-annex version control for unambiguous referencing and versatile data logistics. In parallel, we will improve the interoperability of DataLad with existing data publishing portals (such as Figshare and OSF) by using extracted metadata (e.g., Author, Description) to prefill required fields, and by bundling the entire Git object store within the publication so that published datasets can be re-installed by DataLad without any loss of information. To make such published datasets discoverable, we will establish a crowd-sourced registry (with a RESTful API) that will receive announcements of new datasets upon publication and aggregate their metadata to enable querying across datasets and data hosting providers. The final development will be the integration of DataLad within the brainlife.io data marketplace. This will make it possible to search for and install datasets on brainlife.io, as well as to process the data using the brainlife.io analysis Apps on various NSF-funded national cyberinfrastructure high-throughput computing systems.
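
The two components described above (per-file metadata parsers and an aggregation step that writes a compressed, queryable file) can be illustrated with a small sketch. This is not DataLad's implementation or API; it uses only the Python standard library. The parser functions, the `.json` sidecar convention, the schema.org context, and the SQLite table layout are all assumptions made for illustration.

```python
import gzip
import json
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical JSON-LD context; schema.org is used purely for illustration.
CONTEXT = {"@vocab": "https://schema.org/"}


def parse_file_basics(path):
    """Toy parser 1: describe a file by its name and size in bytes."""
    return {"@id": path.name, "name": path.name, "contentSize": path.stat().st_size}


def parse_sidecar(path):
    """Toy parser 2: read author/description from an optional '<file>.json' sidecar."""
    sidecar = path.with_name(path.name + ".json")
    if not sidecar.exists():
        return {"@id": path.name}
    fields = json.loads(sidecar.read_text())
    kept = {k: fields[k] for k in ("author", "description") if k in fields}
    return {"@id": path.name, **kept}


PARSERS = (parse_file_basics, parse_sidecar)


def aggregate(dataset_dir):
    """Merge the records produced by all parsers, keyed by '@id'."""
    merged = {}
    for path in sorted(dataset_dir.iterdir()):
        if not path.is_file() or path.suffix == ".json":
            continue  # sidecars are parser inputs, not data files
        for parser in PARSERS:
            record = parser(path)
            merged.setdefault(record["@id"], {}).update(record)
    return {"@context": CONTEXT, "@graph": list(merged.values())}


def save_compressed(doc, target):
    """Store the aggregated document as gzip-compressed JSON."""
    with gzip.open(target, "wt", encoding="utf-8") as fh:
        json.dump(doc, fh)


def load_into_sqlite(source):
    """Load the compressed metadata into an in-memory SQL table for querying."""
    with gzip.open(source, "rt", encoding="utf-8") as fh:
        doc = json.load(fh)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE files (id TEXT PRIMARY KEY, author TEXT, size INTEGER)")
    conn.executemany(
        "INSERT INTO files VALUES (?, ?, ?)",
        [(r["@id"], r.get("author"), r.get("contentSize")) for r in doc["@graph"]],
    )
    return conn


def demo():
    """Build a toy dataset, aggregate its metadata, and query it with SQL."""
    with tempfile.TemporaryDirectory() as tmp:
        dataset = Path(tmp) / "dataset"
        dataset.mkdir()
        (dataset / "scan.nii").write_bytes(b"\x00" * 16)
        (dataset / "scan.nii.json").write_text(
            json.dumps({"author": "A. Researcher", "description": "toy file"})
        )
        (dataset / "notes.txt").write_text("hello")
        archive = Path(tmp) / "metadata.json.gz"
        save_compressed(aggregate(dataset), archive)
        conn = load_into_sqlite(archive)
        return conn.execute("SELECT id, author, size FROM files ORDER BY id").fetchall()


if __name__ == "__main__":
    print(demo())
```

Keying the merged records by `@id` lets each parser contribute only the fields it knows about. Because the sketch writes the aggregate as a single compressed JSON file, the same artifact can be exchanged as-is or loaded into a database, as the abstract describes.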

A companion project is being funded by the Federal Ministry of Education and Research, Germany (BMBF).

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Ahmadi, Khazar and Fracasso, Alessio and Puzniak, Robert J. and Gouws, Andre D. and Yakupov, Renat and Speck, Oliver and Kaufmann, Joern and Pestilli, Franco and Dumoulin, Serge O. and Morland, Antony B. and Hoffmann, Michael B. "Triple visual hemifield maps in a case of optic chiasm hypoplasia" NeuroImage, v.215, 2020 https://doi.org/10.1016/j.neuroimage.2020.116822

Bertò, Giulia and Bullock, Daniel and Astolfi, Pietro and Hayashi, Soichi and Zigiotto, Luca and Annicchiarico, Luciano and Corsini, Francesco and De Benedictis, Alessandro and Sarubbo, Silvio and Pestilli, Franco and Avesani, Paolo and Olivetti, Emanuele "Classifyber, a robust streamline-based linear classifier for white matter bundle segmentation" NeuroImage, v.224, 2021 https://doi.org/10.1016/j.neuroimage.2020.117402

Caron, Bradley and Stuck, Ricardo and McPherson, Brent and Bullock, Daniel and Kitchell, Lindsey and Faskowitz, Joshua and Kellar, Derek and Cheng, Hu and Newman, Sharlene and Port, Nicholas and Pestilli, Franco "Collegiate athlete brain data for white matter mapping and network neuroscience" Scientific Data, v.8, 2021 https://doi.org/10.1038/s41597-021-00823-z

Chandio, Bramsh Qamar and Risacher, Shannon Leigh and Pestilli, Franco and Bullock, Daniel and Yeh, Fang-Cheng and Koudoro, Serge and Rokem, Ariel and Harezlak, Jaroslaw and Garyfallidis, Eleftherios "Bundle analytics, a computational framework for investigating the shapes and profiles of brain pathways across populations" Scientific Reports, v.10, 2020 https://doi.org/10.1038/s41598-020-74054-4

Cheng, Hu and Vinci-Booher, Sophia and Wang, Jian and Caron, Bradley and Wen, Qiuting and Newman, Sharlene and Pestilli, Franco "Denoising diffusion weighted imaging data using convolutional neural networks" PLOS ONE, v.17, 2022 https://doi.org/10.1371/journal.pone.0274396

Echevarria-Cooper, Shiloh L. and Zhou, Guangyu and Zelano, Christina and Pestilli, Franco and Parrish, Todd B. and Kahnt, Thorsten "Mapping the Microstructure and Striae of the Human Olfactory Tract with Diffusion MRI" The Journal of Neuroscience, v.42, 2022 https://doi.org/10.1523/JNEUROSCI.1552-21.2021

Eke, Damian O. and Bernard, Amy and Bjaalie, Jan G. and Chavarriaga, Ricardo and Hanakawa, Takashi and Hannan, Anthony J. and Hill, Sean L. and Martone, Maryann E. and McMahon, Agnes and Ruebel, Oliver and Crook, Sharon and Thiels, Edda and Pestilli, Fran "International data governance for neuroscience" Neuron, v.110, 2022 https://doi.org/10.1016/j.neuron.2021.11.017

Hanekamp, Sandra and Ćurčić-Blake, Branislava and Caron, Bradley and McPherson, Brent and Timmer, Anneleen and Prins, Doety and Boucard, Christine C. and Yoshida, Masaki and Ida, Masahiro and Hunt, David and Jansonius, Nomdo M. and Pestilli, Franco and Co "White matter alterations in glaucoma and monocular blindness differ outside the visual system" Scientific Reports, v.11, 2021 https://doi.org/10.1038/s41598-021-85602-x

Hanke, Michael and Pestilli, Franco and Wagner, Adina S. and Markiewicz, Christopher J. and Poline, Jean-Baptiste and Halchenko, Yaroslav O. "In defense of decentralized research data management" Neuroforum, v.27, 2021 https://doi.org/10.1515/nf-2020-0037

Kaneko, Takaaki and Takemura, Hiromasa and Pestilli, Franco and Silva, Afonso C. and Ye, Frank Q. and Leopold, David A. "Spatial organization of occipital white matter tracts in the common marmoset" Brain Structure and Function, 2020 https://doi.org/10.1007/s00429-020-02060-3

McPherson, Brent C. and Pestilli, Franco "A single mode of population covariation associates brain networks structure and behavior and predicts individual subjects age" Communications Biology, v.4, 2021 https://doi.org/10.1038/s42003-021-02451-0

Please report errors in award information by writing to: awardsearch@nsf.gov.