
NSF Org: |
OAC Office of Advanced Cyberinfrastructure (OAC) |
Recipient: |
|
Initial Amendment Date: | March 14, 2013 |
Latest Amendment Date: | October 24, 2014 |
Award Number: | 1255781 |
Award Instrument: | Standard Grant |
Program Manager: |
Robert Chadduck
rchadduc@nsf.gov (703)292-2247 OAC Office of Advanced Cyberinfrastructure (OAC) CSE Directorate for Computer and Information Science and Engineering |
Start Date: | March 15, 2013 |
End Date: | May 31, 2015 (Estimated) |
Total Intended Award Amount: | $99,718.00 |
Total Awarded Amount to Date: | $99,718.00 |
Funds Obligated to Date: |
|
History of Investigator: |
|
Recipient Sponsored Research Office: |
2550 NORTHWESTERN AVE # 1100 WEST LAFAYETTE IN US 47906-1332 (765)494-1055 |
Sponsor Congressional District: |
|
Primary Place of Performance: |
155 S. Grant Street West Lafayette IN US 47907-2114 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | Data Cyberinfrastructure |
Primary Program Source: |
|
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
CIF21 DIBBs: Conceptualization of the Social and Innovation Opportunities of Data Analysis
This proposal addresses a problem that persists even as scientists gain access to continuously growing data repositories across economic and geographic boundaries: both individual innovation and the formation of rich collaborations still rely on traditional research and social mechanisms. While virtual organizations help groups access data environments, members must still proactively seek out collaborators. The difficulty of sharing analysis tools, and the lack of understanding of how such tools are used, create friction that keeps researchers from extracting the greatest benefit from data and its usage. If the scientific community can formalize the collection of User-Data Interaction (UDI) data and derive actionable, characteristic behavior patterns from it, this friction can be relieved and scientists can be connected in behaviorally meaningful ways that have not yet been imagined. This proposal discusses the opportunity to work on this problem. Data is the lifeblood of science. Recent funding opportunities have supported uploading, archiving, and managing data in more formal and standard ways. However, the actual use of data through data exploration tools remains a highly variable process. Interactive data exploration tools make it possible to record researcher interactions during the exploration process. The patterns of interaction that users follow while searching, exploring, and using data are a largely unexploited source of new connections and new learning: they could help researchers identify useful exploration modes or gaps, and even find new collaborative partners who could increase interaction and innovation. Such data about how users explore data are here termed "User-Data Interaction (UDI) Data."
Creating cyberinfrastructure building blocks to support a standard for collecting UDI Data, community development of data exploration tools, and the exploration of UDI data could fundamentally change the practice of science and engineering. Having such data and analysis tools hosted within a shared cyberinfrastructure could also allow for unprecedented study of their use and effectiveness.
The goal of this conceptualization research is to define an implementation project for the DIBBs program. To achieve this goal, the approach will be to understand the kinds of data analysis tools that various user communities currently use, the tools they would like to create and share, and the UDI data that could be collected from those tools and leveraged. A data source is characterized here as any service into which a user can enter a query and receive a semi-structured result. Examples include an online database that users query through forms, a graphical interface to a data cube, or even an online simulation tool. The proposing team has access to three such toolkits, in use by thousands of users today (Rappture Toolkit, iKNEER, and DataView), to study as sources of analysis tools and UDI data. In particular, access to the developers of these systems will provide information about how such systems could generate UDI data and what its important features may be. After building an understanding through active communities and small group discussions, the final step of information gathering will be two larger discussions held in conjunction with two events: HUBbub 2013 and an NSF S2I2 conceptualization project meeting.

Intellectual Merit: This research will identify the social and technological roadblocks to sharing data analysis tools, as well as the transformational potential of UDI data. The intellectual merit of this activity will be an evidence-based blueprint for a cyberinfrastructure environment that automatically gathers UDI data, derives patterns from those data, and uses those patterns to amplify discovery and collaboration in a way that acceptably balances efficacy and privacy. Collaborations will increase in number and be of greater substance.

Broader Impacts: This work will pave the way for new scientific connections among researchers, educators, and students that will accelerate research and innovation.
The difficulties that underrepresented groups inherently face in traditional methods of establishing scientific collaborations will be bridged by an implementation of the proposed work, allowing everyone to connect to tools and other researchers based not solely on established reputation, but on their interactions with data. Because the work is not specific to one virtual organization or data tool, it will have a broad reach across diverse scientific communities that use data and data analysis tools.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
This project was undertaken to create a conceptual framework for recording how people use data analysis applications (called UDI data). The goal of this framework is to allow developers of such applications to equip them with a standard set of action recorders, to capture these actions in a central repository, and to let application users explore this repository. With a central record of how people use analysis applications, users will be able to see where others have already explored the data, as well as which areas remain unexplored and may represent new scientific opportunities. It also allows automated methods to match scientists based on the phenomena they are studying and the methods they use, unlike traditional social networks, which are unaware of specific scientific endeavors.
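To make the recorder-and-repository idea concrete, the following is a minimal Python sketch under stated assumptions; it is not the project's framework, which the report describes only conceptually. The event schema (`UDIEvent` with `user_id`, `tool`, `action`, `params`), the in-memory `UDIRepository`, its `coverage` method, and the tool name `demo-sim` are all hypothetical. The sketch shows how recorded queries against one parameter could be binned so that explored and unexplored regions become visible.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class UDIEvent:
    """One recorded user-data interaction (hypothetical schema)."""
    user_id: str
    tool: str                              # name of the analysis application
    action: str                            # e.g. "query", "plot", "simulate"
    params: Tuple[Tuple[str, float], ...]  # (parameter name, value) pairs


class UDIRepository:
    """In-memory stand-in for a central UDI repository (hypothetical)."""

    def __init__(self) -> None:
        self.events: List[UDIEvent] = []

    def record(self, event: UDIEvent) -> None:
        """What an 'action recorder' in an application would call."""
        self.events.append(event)

    def coverage(self, tool: str, param: str,
                 lo: float, hi: float, bins: int) -> List[int]:
        """Count recorded events of `tool` whose `param` falls in each bin of [lo, hi)."""
        counts = [0] * bins
        width = (hi - lo) / bins
        for e in self.events:
            if e.tool != tool:
                continue
            value = dict(e.params).get(param)
            if value is None or not (lo <= value < hi):
                continue
            # Clamp to guard against floating-point rounding at the upper edge.
            counts[min(int((value - lo) / width), bins - 1)] += 1
        return counts


# Usage: three users explore a hypothetical simulation tool.
repo = UDIRepository()
repo.record(UDIEvent("alice", "demo-sim", "simulate", (("temperature", 300.0),)))
repo.record(UDIEvent("bob", "demo-sim", "simulate", (("temperature", 310.0),)))
repo.record(UDIEvent("carol", "demo-sim", "simulate", (("temperature", 650.0),)))

counts = repo.coverage("demo-sim", "temperature", 0.0, 1000.0, 10)
unexplored = [i for i, c in enumerate(counts) if c == 0]
```

Here `counts` shows two visits in the 300 to 400 bin and one in the 600 to 700 bin; every bin in `unexplored` is a region no recorded user has queried yet. A real deployment would replace the in-memory list with a shared store and would need the privacy controls the abstract mentions.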
The intellectual merit of this work lies in its creation of a new method for linking users of a variety of analysis tools that were not necessarily designed to communicate with each other. It represents an opportunity to use social science principles to connect scientists and engineers working in the physical sciences. The broader impact is that application users may find collaborators they would never otherwise have found.
Specifically, the work involved a series of interviews and surveys of analysis tool users, a study of existing UDI data streams from several real-world analysis tools, and the design of a more general framework for generating and capturing UDI data. Interviews with application creators showed willingness and interest in generating UDI data. Surveys and interviews with application users highlighted a desire to find new collaborators and a willingness to share their usage data to accelerate open science. However, users also expressed reservations about intellectual property protection. The study of existing UDI streams showed that distinguishable exploration patterns can be derived from them (and therefore that they can be used to match potential collaborators). It also showed that the impact of application creators can be measured with a new structure, which shows that a developer's impact increases with the amount of their contribution (in lines of source code). However, application developers who collaborate only within small circles are shown to have their impact negatively affected. Helping application creators widen their collaboration circles is therefore another important use of UDI data. Finally, a preliminary framework for capturing UDI data was designed, taking into account that the applications generating UDI data may reside on a single server, on a family of servers, or on a large number of user desktops. Each type of location affects how UDI data can be captured centrally, and therefore shapes the design of the framework. The next step in this research is to build a prototype implementation of this framework to demonstrate its feasibility.
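The report states that distinguishable exploration patterns can be used to match potential collaborators, but it does not specify a matching method. As an illustration only, the Python sketch below swaps in one common technique, cosine similarity between per-user count vectors of explored regions. The function names `cosine` and `suggest_collaborators`, the region labels, and the user data are hypothetical and are not results from this project.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def suggest_collaborators(patterns: Dict[str, Counter], user: str,
                          k: int = 1) -> List[Tuple[str, float]]:
    """Rank other users by how similar their exploration pattern is to `user`'s."""
    mine = patterns[user]
    scored = [(other, cosine(mine, vec))
              for other, vec in patterns.items() if other != user]
    # Highest similarity first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


# Hypothetical exploration patterns: counts of visits to regions of each tool.
patterns = {
    "alice": Counter({"demo-sim:temperature[300,400)": 5, "demo-sim:pressure[1,2)": 2}),
    "bob": Counter({"demo-sim:temperature[300,400)": 3, "demo-sim:pressure[1,2)": 1}),
    "carol": Counter({"demo-db:query[gene]": 4}),
}

best = suggest_collaborators(patterns, "alice", k=1)
```

In this toy data `best` names `bob`, whose visits concentrate on the same regions as `alice`'s, while `carol`, who uses a different tool, scores zero. Cosine similarity ignores overall activity volume, so a light user and a heavy user exploring the same regions still match; other pattern representations or similarity measures could be substituted.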
Last Modified: 08/31/2015
Modified by: Michael Zentner