Award Abstract # 0964695
HCC: Medium: Collaborative Research: Guiding Folksonomy Development to Enable Novel Tagging Applications

NSF Org: IIS
Division of Information & Intelligent Systems
Recipient: REGENTS OF THE UNIVERSITY OF MINNESOTA
Initial Amendment Date: April 23, 2010
Latest Amendment Date: June 29, 2015
Award Number: 0964695
Award Instrument: Standard Grant
Program Manager: William Bainbridge
IIS
 Division of Information & Intelligent Systems
CSE
 Directorate for Computer and Information Science and Engineering
Start Date: April 15, 2010
End Date: March 31, 2016 (Estimated)
Total Intended Award Amount: $949,788.00
Total Awarded Amount to Date: $997,788.00
Funds Obligated to Date: FY 2010 = $965,788.00
FY 2011 = $16,000.00
FY 2015 = $16,000.00
History of Investigator:
  • Loren Terveen (Principal Investigator)
    terveen@cs.umn.edu
  • John Riedl (Co-Principal Investigator)
Recipient Sponsored Research Office: University of Minnesota-Twin Cities
2221 UNIVERSITY AVE SE STE 100
MINNEAPOLIS
MN  US  55414-3074
(612)624-5599
Sponsor Congressional District: 05
Primary Place of Performance: University of Minnesota-Twin Cities
2221 UNIVERSITY AVE SE STE 100
MINNEAPOLIS
MN  US  55414-3074
Primary Place of Performance Congressional District: 05
Unique Entity Identifier (UEI): KABJZBBJ4B54
Parent UEI:
NSF Program(s): HCC-Human-Centered Computing
Primary Program Source: 01001011DB NSF RESEARCH & RELATED ACTIVIT
01001112DB NSF RESEARCH & RELATED ACTIVIT
01001516DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 7367, 7924, 9215, 9251, HPCC
Program Element Code(s): 736700
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

This is a study of tagging, the assignment of labels to information objects by users, and the "folksonomy" categorization systems that can result. By the 19th Century, increasing amounts of information were being published, and it was clear that efficient methods of organization were needed for the information to be accessible. In response, categorization schemes like the Library of Congress Classification and the Dewey Decimal System were invented. The overall information dissemination system contained clear roles and divisions of labor: editors decided what got published, information professionals categorized published works, and most people simply consumed the results. The Internet has toppled this traditional approach. There is no publication barrier, so orders of magnitude more information is available online and information professionals cannot keep up. However, new technologies have arisen that work in this context, notably tagging. Any user can associate tags with items such as documents, movies, or photos, and the tags serve as keys for retrieval. Since tags can be created by any user, the number of tags contributed scales with a community's size: thus, tagging works at Internet scale. Tagging lets users represent their own perspectives, which aids retrieval.

However, tagging is a young technology, with significant challenges and unmet potential. Individual tags are often of poor quality, and many tagging systems are globally incoherent. Empirical evaluations of tagging systems in use are few, and formal comparisons to traditional approaches have not been done. Tagging applications have been limited mainly to search. This project addresses these challenges. It will develop a firmer scientific understanding of the strengths and weaknesses of tagging as a categorization method. It will explore the potential of tagging to enable powerful applications beyond information retrieval. The project consists of three main research activities: (1) Creating a set of metrics to quantify the value of a categorization structure; using these metrics in formal and empirical comparisons of tagging systems to traditional categorizations; (2) Designing mixed-initiative interaction techniques for computational agents and people to detect, evaluate and resolve problems in tagging systems; (3) Developing novel tag-based applications for users to express their preferences and navigate complex information spaces.

This research will create both information-theoretic and usage-based metrics to measure the value of a categorization structure. Studies will be done to show relations between the two types of metrics, letting designers predict, for example, how many tags per item are required for effective user search. Systematic cost-benefit comparisons of tagging systems to traditional expert categorizations will be done, thus providing empirical data to a debate that has been characterized by heated conjecture. The utility and generality of a set of mixed-initiative interaction techniques and novel applications will be established by (a) implementing them on multiple platforms, and (b) evaluating them in careful field experiments.
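
The abstract does not say which metrics the project defines. As a minimal sketch of what an information-theoretic measure over a folksonomy could look like, the example below computes the Shannon entropy of the tag-usage distribution alongside the mean number of tags per item. The function names, the (item, tag) pair representation, and the sample data are all assumptions made for illustration, not the project's actual metrics.

```python
import math
from collections import Counter


def tag_entropy(tag_assignments):
    """Shannon entropy, in bits, of how often each tag is used.

    tag_assignments is an iterable of (item, tag) pairs (an assumed format).
    """
    counts = Counter(tag for _item, tag in tag_assignments)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def mean_tags_per_item(tag_assignments):
    """Average number of tag assignments per distinct item."""
    pairs = list(tag_assignments)
    items = {item for item, _tag in pairs}
    return len(pairs) / len(items) if items else 0.0


# Hypothetical sample data, invented for this sketch.
assignments = [
    ("movie_1", "comedy"),
    ("movie_1", "funny"),
    ("movie_2", "comedy"),
    ("movie_3", "drama"),
]

print(tag_entropy(assignments))
print(mean_tags_per_item(assignments))
```

A higher entropy means tag usage is spread more evenly across the vocabulary; whether that makes a folksonomy more or less useful for search is the kind of question a usage-based metric would have to answer.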

Improving the effectiveness of tagging will help millions of users find the information, products, and services they seek. More directly, the techniques of this project will be implemented in four working online communities, for movie viewers, cyclists, ethics researchers, and politically interested citizens. Collectively these sites have tens of thousands of users, all of whom will benefit directly. Many students will be trained, learning multiple research methods and gaining valuable experience with real online communities. Finally, the software will be developed under an open source license and datasets will be published, thus facilitating other researchers and web site developers in their work.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Harper, F.M. and Konstan, J.A. "The MovieLens Datasets: History and Context" ACM Transactions on Interactive Intelligent Systems (TiiS) , v.5 , 2016 http://dx.doi.org/10.1145/2827872
Torre, F., Sheppard, A.S., Priedhorsky, R., and Terveen, L. "bumpy, caution with merging: an exploration of tagging in a geowiki" Proceedings of GROUP , 2010 http://dx.doi.org/10.1145/1880071.1880097
Vig, J., Sen, S. and Riedl, J. "Navigating the Tag Genome" Proceedings of IUI , 2011 http://dx.doi.org/10.1145/1943403.1943418
Vig, J., Soukup, M., Sen, S. and Riedl, J. "TagExpression: Tagging with Feeling" Proceedings of UIST , 2010 http://dx.doi.org/10.1145/1866029.1866079

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

Labels or keywords are a traditional means to help people organize and find information. A common example is the Dewey Decimal classes used to organize books in libraries; in addition, many professional fields have their own sets of keywords.

Traditionally, such label sets were developed by experts in a field through careful analysis, and the developers worked to make the label sets as complete, consistent, and non-redundant as possible.

However, over the past 15 or so years an alternative has emerged: *user-defined label sets*. In systems such as Flickr, users apply labels -- known as "tags" -- to items. The tags help organize large datasets by supporting navigation and search. Crucially, because tags are crowdsourced, tagging democratizes the labeling process and allows organizations of data -- "folksonomies" -- to emerge from the practices of a community's members.

However, the power and openness of crowdsourcing also bring problems. Tags may be redundant, their semantics may be unclear, and obvious hierarchical relationships may not be made explicit.
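
To make the redundancy problem concrete, here is a minimal sketch that groups tags differing only in case, spacing, or punctuation. It is a simple heuristic chosen for illustration, not the technique developed in this project; the function names and sample tags are assumptions.

```python
import re
from collections import defaultdict


def normalize(tag):
    """Lowercase a tag and drop everything except ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", tag.lower())


def redundant_tag_groups(tags):
    """Return groups of distinct tags that share the same normalized form."""
    groups = defaultdict(set)
    for tag in tags:
        groups[normalize(tag)].add(tag)
    # Only groups with more than one spelling are candidate redundancies.
    return [sorted(variants) for variants in groups.values() if len(variants) > 1]


# Hypothetical tags, invented for this sketch.
print(redundant_tag_groups(["sci-fi", "SciFi", "scifi", "drama"]))
```

A heuristic like this catches only surface variants; synonyms such as "funny" and "comedy" would need semantic or usage-based evidence, and a person may still need to confirm each proposed merge.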

In this project we analyzed several folksonomies to identify problems and their sources. We also developed new techniques to help people create better folksonomies and new algorithms that could make better use of folksonomies, e.g., in creating better item explanations in recommender systems. 


Last Modified: 07/05/2016
Modified by: Loren Terveen
