
NSF Org: | IIS Division of Information & Intelligent Systems |
Recipient: | |
Initial Amendment Date: | June 25, 2008 |
Latest Amendment Date: | July 23, 2012 |
Award Number: | 0746696 |
Award Instrument: | Continuing Grant |
Program Manager: | Maria Zemankova, IIS Division of Information & Intelligent Systems, CSE Directorate for Computer and Information Science and Engineering |
Start Date: | July 1, 2008 |
End Date: | June 30, 2014 (Estimated) |
Total Intended Award Amount: | $553,476.00 |
Total Awarded Amount to Date: | $577,476.00 |
Funds Obligated to Date: | FY 2009 = $133,054.00; FY 2010 = $124,493.00; FY 2011 = $134,049.00; FY 2012 = $104,120.00 |
History of Investigator: | |
Recipient Sponsored Research Office: | 4200 FIFTH AVENUE PITTSBURGH PA US 15260-0001 (412)624-7400 |
Sponsor Congressional District: | |
Primary Place of Performance: | 4200 FIFTH AVENUE PITTSBURGH PA US 15260-0001 |
Primary Place of Performance Congressional District: | |
Unique Entity Identifier (UEI): | |
Parent UEI: | |
NSF Program(s): | Info Integration & Informatics |
Primary Program Source: | 01000910DB NSF RESEARCH & RELATED ACTIVIT; 01001011DB NSF RESEARCH & RELATED ACTIVIT; 01001112DB NSF RESEARCH & RELATED ACTIVIT; 01001213DB NSF RESEARCH & RELATED ACTIVIT |
Program Reference Code(s): | |
Program Element Code(s): | |
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
The Web has permeated every facet of human activity. Web 2.0 is bringing a sea-change, both by the amount of user-generated content and by the level of automation for information exchange. The goal of this project is to promote quality on the Web into a first-class citizen, by (1) exposing quality information from Web data sources; (2) empowering users to specify their preferences for the different dimensions of quality (Quality of Service, Quality of Data, Quality of Information) through an intuitive, integrated framework, called Quality Agreements (QAs); and (3) influencing resource allocation decisions according to user preferences.
Towards this, the project reexamines query processing techniques in order to consider QAs (namely, query and update scheduling, caching and replication, and admission control) and addresses new challenges, stemming from the users' need to adapt QAs over time and their ability to collaborate. Project plans include the validation of the QA framework with a user-study, the evaluation of the proposed algorithms analytically and experimentally, and prototype development. The experimental aspects of this research are directly linked to the educational goals of this project and will generate many opportunities for graduate, undergraduate, and high-school students to participate in the research and development of new technologies. This project will empower users to tailor quality on the Web according to their preferences, which in turn can have great implications on the usability of Web 2.0 applications and on users' experience and satisfaction. Results of this research, including software, data, and publications, will be made publicly available via the project web site (http://db.cs.pitt.edu/user-centric).
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
Big data is transforming all aspects of the human experience, be it everyday life, scientific exploration and discovery, medicine, business, law, journalism, or decision-making at all levels of government.
On the one hand, big data is primarily driven by computing technology becoming better and cheaper. As expected, these advances in computing technology lead to exponential increases in the size of data generated and processed. However, they also translate to exponential increases in data collected by scientific instruments, be it centralized massive instruments (such as the Large Hadron Collider and the Large Synoptic Survey Telescope), or by great numbers of small-but-now-more-affordable instruments (such as those used for next-generation sequencing), or by even greater numbers of personal mobile devices and tiny sensors.
On the other hand, the thirst for data is becoming the norm both from a consumer point of view (e.g., businesses want to collect as much data as possible about their customers) and from a producer point of view (e.g., people increasingly feel the urge to share more and more details of their lives on social networks), leading to an exponential increase in user-contributed content.
Despite the advances in computing technology and the growing availability of, and demand for, data in the last few decades, the performance of one critical component in the data processing pipeline has remained roughly the same. Namely, the ability of humans to process data has not changed significantly in the last few decades!
We refer to this disparity as the "big data, same humans" problem. It means that taking the user's point of view into account is crucial. Imagine a very efficient data management system that can process 1,000,000 data inputs per second and generate a mere one alert per second (i.e., a reduction of one million to one). Such a system will still immediately overwhelm any potential user; prioritization and ranking of results, classification of results as important or not important, and other similar techniques are essential to make such a system usable and useful.
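To make the scale concrete, the short sketch below works through the arithmetic of the example above and shows one simple form of prioritization: keeping only the top-k alerts by score. The alert messages, the scores, and the helper name `top_alerts` are hypothetical and chosen only for illustration; they are not taken from the project.

```python
import heapq

INPUTS_PER_SECOND = 1_000_000   # figure from the example in the text
ALERTS_PER_SECOND = 1           # figure from the example in the text
SECONDS_PER_DAY = 24 * 60 * 60

# Even at a one-in-a-million reduction, a full day yields this many alerts.
alerts_per_day = ALERTS_PER_SECOND * SECONDS_PER_DAY


def top_alerts(alerts, k):
    """Return the k alerts with the highest priority score, highest first.

    Each alert is a (score, message) pair; the scoring scheme is an
    assumption made for this sketch.
    """
    return heapq.nlargest(k, alerts, key=lambda alert: alert[0])


# Hypothetical alerts with made-up priority scores.
sample = [
    (0.2, "disk usage 71%"),
    (0.9, "intrusion attempt"),
    (0.5, "latency spike"),
    (0.1, "heartbeat jitter"),
]
shortlist = top_alerts(sample, 2)

print(alerts_per_day)
print(shortlist)
```

A user shown only `shortlist` sees a handful of items instead of tens of thousands per day, which is the kind of reduction the paragraph argues for.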
This project aimed to solve this problem and make data management more user-centric, by considering user preferences in different aspects of data management.
In particular, the project investigated both "flavors" of data: (i) data at rest, i.e., traditional database systems where data is stored in a database and query results are computed based on the current contents of the database and (ii) data in motion, i.e., data stream management systems, that process a never-ending input stream of data, searching for patterns/conditions of interest to the users.
Towards this, the project examined three different dimensions. First, it considered user preferences for different aspects of quality. For example, one user may prefer fast answers and be willing to tolerate a small degree of staleness in the results, while another may prefer to always get "fresh" results and be willing to tolerate some small delay. Such preferences can be used by the system to determine resource allocation. Our proposed framework (named Quality Contracts) enables the system to satisfy the (potentially conflicting) preferences of multiple users instead of having a single global quality metric. Multiple research questions were addressed within this space, including how to determine the execution order of different operations in the presence of priorities, how to decide which data to drop if the system is overwhelmed, etc.
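As a rough illustration of how per-user quality preferences could steer a system decision, the sketch below picks between serving a cached result and refreshing it first. This is not the Quality Contracts framework itself; the `Preference` fields, the `choose_plan` function, and the plan names are all hypothetical and assume that each user states a staleness tolerance and a delay tolerance in seconds.

```python
from dataclasses import dataclass


@dataclass
class Preference:
    """Hypothetical per-user quality preference (both values in seconds)."""
    max_staleness_s: float  # how old a result the user will accept
    max_delay_s: float      # how long the user is willing to wait


def choose_plan(pref, cache_age_s, refresh_cost_s):
    """Pick a plan for one query under the user's stated tolerances."""
    if cache_age_s <= pref.max_staleness_s:
        # The cached result is fresh enough: answer immediately.
        return "serve_cached"
    if refresh_cost_s <= pref.max_delay_s:
        # Too stale, but the user can wait for a refresh.
        return "refresh_then_serve"
    # Neither tolerance can be met: fall back to the fast, stale answer.
    return "serve_cached_best_effort"


speed_first = Preference(max_staleness_s=60.0, max_delay_s=0.1)
fresh_first = Preference(max_staleness_s=0.0, max_delay_s=5.0)

# Same system state, different users, different decisions.
print(choose_plan(speed_first, cache_age_s=30.0, refresh_cost_s=2.0))
print(choose_plan(fresh_first, cache_age_s=30.0, refresh_cost_s=2.0))
```

The point of the sketch is only that the same cache state leads to different decisions for different users, rather than one global quality metric deciding for everyone.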
Secondly, the project examined user preferences for query result personalization (e.g., for ranking). In particular, a novel framework was proposed that combines quantitative preferences (e.g., I give this movie four stars out of five) and...