Award Abstract # 1018321
III: Small: Modeling and Inferring Searcher Intent by Mining User Interactions

NSF Org: IIS
Division of Information & Intelligent Systems
Recipient: EMORY UNIVERSITY
Initial Amendment Date: August 1, 2010
Latest Amendment Date: August 1, 2010
Award Number: 1018321
Award Instrument: Standard Grant
Program Manager: Maria Zemankova
IIS
 Division of Information & Intelligent Systems
CSE
 Directorate for Computer and Information Science and Engineering
Start Date: September 1, 2010
End Date: August 31, 2014 (Estimated)
Total Intended Award Amount: $500,000.00
Total Awarded Amount to Date: $500,000.00
Funds Obligated to Date: FY 2010 = $500,000.00
History of Investigator:
  • Yevgeny Agichtein (Principal Investigator)
    eugene.agichtein@emory.edu
Recipient Sponsored Research Office: Emory University
201 DOWMAN DR NE
ATLANTA
GA  US  30322-1061
(404)727-2503
Sponsor Congressional District: 05
Primary Place of Performance: Emory University
201 DOWMAN DR NE
ATLANTA
GA  US  30322-1061
Primary Place of Performance Congressional District: 05
Unique Entity Identifier (UEI): S352L5PJLMP8
Parent UEI:
NSF Program(s): Info Integration & Informatics
Primary Program Source: 01001011DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 7923
Program Element Code(s): 736400
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Inferring searcher intent is a central problem in information retrieval and web search: for effective ranking and result presentation, the search engine must know what the user is looking for. Yet, expressing a searcher information need currently relies on entering the "right" search keywords, which can require multiple rounds of trial-and-error from the searcher. The goal of this project is to develop effective methods for a search engine to automatically infer searcher intent and information needs from the searcher interactions and behavior data. Specifically, the project addresses two main challenges of search intent inference: developing accurate and robust models of searcher intent and behavior, and exploiting these models to infer search intent for each individual user. This project significantly advances previous efforts on implicit feedback and search modeling, by considering a wide range of user interaction and contextual features, and by developing novel techniques for mining and exploiting these signals to improve web search and information access.

To develop robust search intent and behavior models, the project uses machine learning and data mining techniques to model how search actions and result page behavior relate to searcher intent. The first stage of the project develops and evaluates these models in controlled lab environments, by combining eye tracking and search interface instrumentation data. The second stage of the project empirically validates the intent inference models through a large-scale collection of search behavior data using a variety of remote user studies with instrumented search interfaces. Finally, the project applies the resulting models and algorithms to improve performance on key information retrieval tasks including result ranking, automatic query expansion, and search result presentation.

The techniques developed in this project are expected to make web search and information access more intuitive and effective for millions of users through collaboration with major search engine companies. Additional broader impacts will be achieved through domain-specific applications of the developed techniques, ranging from improved library search to web-based diagnostics of cognitive impairment. All aspects of the project will involve graduate and undergraduate students, and the resulting tools and datasets are to be integrated into undergraduate course instruction and projects, thus broadening participation in computer science research. The resulting publications, software, and datasets will be made publicly available on the project website (http://ir.mathcs.emory.edu/intent/).

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

For a Web search engine to find and present the results to the user, it must know what the user is looking for. Unfortunately, people often have trouble formulating effective queries for search engines, or even knowing exactly what information they wish to find. Because of this, searching can often require many rounds of trial-and-error, which may involve rephrasing the query, scrolling through irrelevant results, and viewing multiple pages, until the needed information is found. This project aims to decipher clues from searcher behavior, such as clicks, scrolling through the results, and even computer mouse cursor movements on the screen, to guess what the user is looking for, and to make the search better. These clues from searcher behavior complement the traditional keyword search input from the user, and could dramatically improve search engine functionality: ranking of results, suggesting queries, or personalizing search for each user.

To accomplish this goal, the project developed new machine learning techniques to interpret the user’s goals from the observed behavior. As a first step, the project developed a general model of what constitutes a difficult or successful search, and developed systems to perform controlled, large-scale user studies to understand the most common cases of search problems, while tracking every action of the searcher. The study confirmed that a key cause of search difficulty is formulating effective queries, and that another is finding the needed information within the documents. Subsequent studies by this project and by other researchers used the developed experimental infrastructure to study how to extract signals from the search behavior data, and how to use these signals for better ranking of the search results. The identified behavior signals were also used to improve the quality of search result summary generation algorithms.

The project further expanded the search behavior models to mobile devices with touch screen interfaces and different form factors, and developed algorithms to automatically discover patterns in search behavior that indicated whether a page was relevant, as well as which part of the page was most useful for the search. The resulting techniques were also applied to automatically measure the attention and interest of the searchers, which has the potential to enable Web-based screening of memory loss associated with Alzheimer’s disease. The systems and datasets produced by this project were shared freely online, and have been used by multiple research groups. Twenty-five peer-reviewed publications and two U.S. patents resulted from this project, and the research results and findings have been adapted by industry partners and collaborators from the major Web search engine companies.

This project provided research focus and support to four PhD students, who have successfully graduated and have gone on to pursue careers in industry research and development. Six undergraduate students also participated as research assistants in this project, which helped them secure software engineering jobs upon graduation.


Last Modified: 10/11/2017
Modified by: Yevgeny Eugene Agichtein

Please report errors in award information by writing to: awardsearch@nsf.gov.
