
NSF Org: | CCF Division of Computing and Communication Foundations |
Recipient: | University of California-San Diego |
Initial Amendment Date: | July 17, 2008 |
Latest Amendment Date: | September 13, 2013 |
Award Number: | 0830535 |
Award Instrument: | Standard Grant |
Program Manager: | John Cozzens, CCF Division of Computing and Communication Foundations, CSE Directorate for Computer and Information Science and Engineering |
Start Date: | September 1, 2008 |
End Date: | December 31, 2015 (Estimated) |
Total Intended Award Amount: | $239,499.00 |
Total Awarded Amount to Date: | $1,860,696.00 |
Funds Obligated to Date: | FY 2009 = $402,062.00; FY 2010 = $206,139.00; FY 2012 = $778,771.00; FY 2013 = $234,225.00 |
History of Investigator: | |
Recipient Sponsored Research Office: | 9500 GILMAN DR, LA JOLLA, CA 92093-0021, US, (858) 534-4896 |
Sponsor Congressional District: | |
Primary Place of Performance: | 9500 GILMAN DR, LA JOLLA, CA 92093-0021, US |
Primary Place of Performance Congressional District: | |
Unique Entity Identifier (UEI): | |
Parent UEI: | |
NSF Program(s): | SIGNAL PROCESSING SYS PROGRAM |
Primary Program Source: | 01000910RB NSF RESEARCH & RELATED ACTIVIT; 01001011RB NSF RESEARCH & RELATED ACTIVIT; 01001213RB NSF RESEARCH & RELATED ACTIVIT; 01001314RB NSF RESEARCH & RELATED ACTIVIT |
Program Reference Code(s): | |
Program Element Code(s): | |
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
Classical image processing has mostly disregarded semantic image representations in favor of more mathematically tractable representations based on low-level signal properties (frequency decompositions, mean squared error, etc.). This is unlike biological solutions to image processing problems, which rely extensively on an understanding of scene content. For example, regions containing faces are usually processed more carefully than the bushes in the background. The inability to tune image processing to the semantic relevance of image content frequently leads to the sub-optimal allocation of resources, such as bandwidth, error protection, or viewing time, to image areas that are perceptually irrelevant. One of the main obstacles to the deployment of semantic image processing systems has been the difficulty of training content-understanding systems with large-scale vocabularies. This is, in great part, due to the large amounts of training data and intensive human supervision required by classical methods for vocabulary learning. This research aims to establish a foundation for semantic image processing systems that can learn large-scale vocabularies from informally annotated data, with no additional human supervision. It builds on recent advances in semantic image labeling, which have made it possible to learn vocabularies from noisy training data, such as that massively (and inexpensively) available on the web. The research studies both theoretical issues in vocabulary learning and the design of image processing algorithms that tune their behavior to the content of the images being processed. Semantic image processing could lead to transformative advances in areas such as image compression, enhancement, encryption, de-noising, and segmentation, which are of interest for applications as diverse as medical imaging, image search and retrieval, and security and surveillance.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
This project produced a number of contributions to the semantic representation of images. This is a novel image representation that replaces the space of image pixels with a semantic space, where each dimension represents a visual concept. A vocabulary of visual concepts (objects, object attributes, scene classes, etc.) is first defined, and a classifier is learned to detect each concept in the image. The image is then represented by the vector of probabilities with which it contains the concepts in the vocabulary. This is known as the semantic multinomial (SMN) descriptor of the image. The process is illustrated in Figure 1, and a small sketch of the computation is given below.
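As a rough illustration, the following Python sketch computes an SMN descriptor under the assumption that one probabilistic classifier per concept is already available; the vocabulary, the classifier interface, and the final normalization are illustrative choices, not the project's exact implementation.

```python
import numpy as np

# Illustrative vocabulary; the project's vocabularies are much larger.
VOCABULARY = ["sky", "people", "buildings", "vegetation", "water"]

def semantic_multinomial(features, concept_classifiers):
    """Map an image's low-level feature vector to its SMN descriptor.

    concept_classifiers: dict mapping concept name -> callable that
    returns P(concept present | features) in [0, 1] (assumed given).
    Returns a probability vector over VOCABULARY that sums to 1.
    """
    scores = np.array([concept_classifiers[c](features) for c in VOCABULARY])
    return scores / scores.sum()  # normalize into a multinomial

# Toy usage with dummy classifiers that ignore the features.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    classifiers = {c: (lambda f, p=rng.uniform(): p) for c in VOCABULARY}
    smn = semantic_multinomial(np.zeros(128), classifiers)
    print(dict(zip(VOCABULARY, smn.round(3))))
```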
The project showed that this representation is superior to the classical representation of the image by pixels, or features derived from those pixels, for a number of image processing operations.
Image retrieval: This addresses the search for images within an image database. The standard paradigm is query-by-visual-example (QBVE), where a vector of features is derived from each image and used to measure image similarity; database images are then ranked by similarity to a query image. The project introduced the query-by-semantic-example (QBSE) paradigm, where the feature vector is replaced by the SMN descriptor. This corresponds to measuring similarity between images in the semantic space. Since this space has a higher level of abstraction than that of classical image features (colors, textures, etc.), it enables retrieval systems with higher generalization ability and robustness. This is illustrated in Figure 2, which compares the images retrieved by QBSE and QBVE for a common image query. For QBSE, the concepts of highest probability are also shown for each image. Note that the classic QBVE approach tends to return images with similar colors and textures and does not work well for complicated scenes. The proposed QBSE approach, on the other hand, returns images that contain concepts similar to those in the query, e.g., people and buildings. This allows the retrieval operation to succeed even when these concepts appear with different sizes, positions, colors, etc. As a result, the retrieval operation mimics human similarity judgments much more closely than under QBVE. A minimal ranking sketch follows.
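A minimal QBSE ranking could look like the sketch below. The Kullback-Leibler divergence is one natural dissimilarity between probability vectors such as SMNs; it is an illustrative choice here, not necessarily the exact measure used in the project.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two SMN vectors (smaller = more similar)."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

def qbse_rank(query_smn, database_smns):
    """Rank database images by semantic similarity to the query.

    query_smn: SMN descriptor of the query image.
    database_smns: list of SMN descriptors, one per database image.
    Returns database indices ordered from most to least similar.
    """
    dists = [kl_divergence(query_smn, smn) for smn in database_smns]
    return np.argsort(dists)
```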
Cross-modal retrieval: One difficult problem in multimedia is to bring together information from multiple modalities. For example, how does one search an image database for the best match to a query text. The semantic representation provides a universal solution to this problem. Since, the representation is based on concept probabilities, not pixels, sounds, or characters, it provides an abstract space where information of multiple modalities can be equally projected. After all, concept classifiers can be built for images, sound, speech, music, or text with nearly equal difficulty. This is illustrated in Figure 3, for an example involving image and text data. Images and texts are initially represented in an image and text space, respectively. A common semantic vocabulary is defined, and a set of concept classifiers built on each of these spaces. This allows the mapping of images and texts into a common semantic space. In this space, it is trivial to measure similarities from data of different modalities, e.g. find the text article that best matches a query image, or vice-versa.
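Because both modalities end up as SMN vectors over the same vocabulary, matching across modalities reduces to nearest-neighbor search in the semantic space. The following self-contained sketch uses cosine similarity; the function names and the choice of similarity are illustrative assumptions.

```python
import numpy as np

def smn_cosine(p, q):
    """Cosine similarity between two SMN vectors; any similarity on the
    probability simplex (e.g., negative KL divergence) would also work."""
    return float(np.dot(p, q) / (np.linalg.norm(p) * np.linalg.norm(q)))

def text_to_image_search(text_smn, image_smns):
    """Rank images by semantic similarity to a text query.

    text_smn: SMN produced by concept classifiers trained on text.
    image_smns: SMNs produced by classifiers trained on images, over the
    same concept vocabulary, so the two modalities share one space.
    Returns image indices ordered from most to least similar.
    """
    sims = [smn_cosine(text_smn, smn) for smn in image_smns]
    return np.argsort(sims)[::-1]
```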
Action and event recognition: Video classification requires reasoning in terms of semantic events. Figure 4 illustrates this with an example of video from TV coverage of the Olympic games. At the highest level, the video can be broken down into the events “context shot,” “pole vault,” “triple jump,” and “100 m dash.” Each of these concepts is itself defined in terms of lower-level semantic properties. For example, the “long jump” event is visually simila...
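Although the report is truncated here, the hierarchy it describes suggests representing each frame by an SMN over lower-level concepts and classifying events from an aggregated video-level descriptor. The sketch below makes that concrete under simple assumptions (mean pooling over frames, nearest-prototype event classification); these are illustrative choices, not the project's actual temporal model.

```python
import numpy as np

EVENTS = ["context shot", "pole vault", "triple jump", "100 m dash"]

def video_descriptor(frame_smns):
    """Aggregate per-frame SMNs into one video-level semantic descriptor.
    Mean pooling is the simplest choice (an assumption for illustration)."""
    return np.mean(np.asarray(frame_smns), axis=0)

def classify_event(descriptor, event_prototypes):
    """Assign the video to the event whose prototype descriptor is closest
    in Euclidean distance (illustrative choice of classifier)."""
    dists = {name: np.linalg.norm(descriptor - proto)
             for name, proto in event_prototypes.items()}
    return min(dists, key=dists.get)
```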