Award Abstract # 1218043
III: Small: Providing Relevant and Timely Results: Real-Time Search Architectures and Relevance Algorithms

NSF Org: IIS
Division of Information & Intelligent Systems
Recipient: UNIVERSITY OF MARYLAND, COLLEGE PARK
Initial Amendment Date: September 11, 2012
Latest Amendment Date: September 11, 2012
Award Number: 1218043
Award Instrument: Standard Grant
Program Manager: Maria Zemankova
IIS
 Division of Information & Intelligent Systems
CSE
 Directorate for Computer and Information Science and Engineering
Start Date: October 1, 2012
End Date: September 30, 2016 (Estimated)
Total Intended Award Amount: $499,960.00
Total Awarded Amount to Date: $499,960.00
Funds Obligated to Date: FY 2012 = $499,960.00
History of Investigator:
  • Jimmy Lin (Principal Investigator)
    jimmylin@umd.edu
Recipient Sponsored Research Office: University of Maryland, College Park
3112 LEE BUILDING
COLLEGE PARK
MD  US  20742-5100
(301)405-6269
Sponsor Congressional District: 04
Primary Place of Performance: University of Maryland College Park
MD  US  20742-5141
Primary Place of Performance Congressional District: 04
Unique Entity Identifier (UEI): NPU8ULVAAS23
Parent UEI: NPU8ULVAAS23
NSF Program(s): Info Integration & Informatics
Primary Program Source: 01001213DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 7923
Program Element Code(s): 736400
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Information search remains one of the best solutions today for satisfying individuals' problem-solving needs. However, we are inundated with growing quantities of information: as its volume on different media grows, so does its velocity -- the rate at which information is being generated, transmitted, and consumed. The growing importance of social media such as Twitter and blogs further exacerbates this problem. It is clear that better real-time search capabilities are needed. This project aims to advance the state of the art in information retrieval research by tackling the real-time search problem. The effort consists of two themes: the first concerns high-performance search architectures for low-latency, high-throughput query evaluation and indexing; the second concerns relevance algorithms, exploring strategies to model time-varying relevance signals in a learning-to-rank framework.

Enhanced real-time search capabilities promise to provide users more effective access to time-sensitive information. Scenarios include journalists tracking situations around the globe, victims of natural disasters trying to find loved ones, and political analysts digesting reactions to a candidate's speech. This project is expected to yield an open-source demonstration platform for real-time search on tweets and blogs. Close coordination with shared, community-wide evaluations at the NIST-sponsored Text Retrieval Conferences (TREC) further benefits the broader research community. More information will be disseminated via the project web site (http://www.umiacs.umd.edu/~jimmylin/projects/). Research results will be incorporated into class material for the large-data computing course that brings cloud computing into the classroom, and graduate students will have an opportunity to gain research and system development experience.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Nima Asadi and Jimmy Lin, "Fast Candidate Generation for Real-Time Tweet Search with Bloom Filter Chains," ACM Transactions on Information Systems, v.31, 2013

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

Search remains one of the best solutions today for satisfying users' information needs. However, today we are inundated with increasing quantities of information, with no relief in sight. As data volume on different media grows, so does its velocity, the rate at which information is being generated, transmitted, and consumed. The growing importance of social media such as Twitter further exacerbates this problem. It is clear that better algorithms and systems for managing real-time document streams are needed: both retrospective techniques to handle content that has already accumulated and prospective techniques that anticipate future content in a proactive manner.


This project advanced the state of the art in information retrieval by tackling real-time search and, more broadly, by addressing retrieval challenges associated with streams of documents and other types of dynamic document collections. From the perspective of intellectual merit, this project has made three main contributions. First, the development of high-performance search architectures for low-latency, high-throughput indexing and query evaluation, along with associated storage infrastructure for timestamped document collections. Second, methods for extracting temporal signals from streams of documents, together with temporally focused ranking algorithms. Third, the development of a task model, algorithms, and an evaluation framework for push notifications, where systems proactively monitor document streams (e.g., social media posts) to identify and deliver documents that are of interest to the user.


One important aspect of this project was close coordination with evaluation efforts at the Text Retrieval Conferences (TRECs) sponsored by the U.S. National Institute of Standards and Technology (NIST). Each year, TREC attracts dozens of participants from around the world to work on shared tasks that jointly define the future direction of information retrieval research. This project has developed task models and evaluation methodologies for the TREC Microblog and Real-Time Summarization Tracks, including an innovative "Living Labs" evaluation framework for prospective information needs that takes advantage of live users to assess push notification systems. These efforts have had broader impact in helping to steer the overall research direction of the field. One additional significant broader impact of this project is the uptake of research results by industry. As a specific example, Twitter's real-time recommendation system GraphJet, which was deployed in 2014, makes use of results from this project involving memory allocation models for index structures.


Overall, this successful project has contributed much to real-time information access, from the perspective of both effectiveness (systems that deliver high-quality results) and efficiency (systems that deliver results with low latency).

Last Modified: 02/03/2017
Modified by: Jimmy J Lin

Please report errors in award information by writing to: awardsearch@nsf.gov.
