Award Abstract # 0830535
Large-vocabulary Semantic Image Processing: Theory and Algorithms

NSF Org: CCF
Division of Computing and Communication Foundations
Recipient: UNIVERSITY OF CALIFORNIA, SAN DIEGO
Initial Amendment Date: July 17, 2008
Latest Amendment Date: September 13, 2013
Award Number: 0830535
Award Instrument: Standard Grant
Program Manager: John Cozzens
CCF
 Division of Computing and Communication Foundations
CSE
 Directorate for Computer and Information Science and Engineering
Start Date: September 1, 2008
End Date: December 31, 2015 (Estimated)
Total Intended Award Amount: $239,499.00
Total Awarded Amount to Date: $1,860,696.00
Funds Obligated to Date: FY 2008 = $239,499.00
FY 2009 = $402,062.00
FY 2010 = $206,139.00
FY 2012 = $778,771.00
FY 2013 = $234,225.00
History of Investigator:
  • Nuno Vasconcelos (Principal Investigator)
    nuno@ece.ucsd.edu
Recipient Sponsored Research Office: University of California-San Diego
9500 GILMAN DR
LA JOLLA
CA  US  92093-0021
(858)534-4896
Sponsor Congressional District: 50
Primary Place of Performance: University of California-San Diego
9500 GILMAN DR
LA JOLLA
CA  US  92093-0021
Primary Place of Performance Congressional District: 50
Unique Entity Identifier (UEI): UYTTZT6G9DT1
Parent UEI:
NSF Program(s): SIGNAL PROCESSING SYS PROGRAM
Primary Program Source: 01000809DB NSF RESEARCH & RELATED ACTIVIT
01000910RB NSF RESEARCH & RELATED ACTIVIT
01001011RB NSF RESEARCH & RELATED ACTIVIT
01001213RB NSF RESEARCH & RELATED ACTIVIT
01001314RB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 9218, 7925, 7936, 7797, HPCC, 4720
Program Element Code(s): 472000
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Classical image processing has mostly disregarded semantic image representations, in favor of more mathematically tractable representations based on low-level signal properties (frequency decompositions, mean squared error, etc.). This is unlike biological solutions to image processing problems, which rely extensively on understanding of scene content. For example, regions of faces are usually processed more carefully than the bushes in the background. The inability to tune image processing to the semantic relevance of image content frequently leads to the sub-optimal allocation of resources, such as bandwidth, error protection, or viewing time, to image areas that are perceptually irrelevant. One of the main obstacles to the deployment of semantic image processing systems has been the difficulty of training content-understanding systems with large-scale vocabularies. This is, in great part, due to the requirement for large amounts of training data and intensive human supervision associated with the classical methods for vocabulary learning. This research aims to establish a foundation for semantic image processing systems that can learn large-scale vocabularies from informally annotated data and no additional human supervision. It builds on recent advances in semantic image labeling, which have made it possible to learn vocabularies from noisy training data, such as that massively (and inexpensively) available on the web. The research studies both theoretical issues in vocabulary learning, and the design of image processing algorithms that tune their behavior according to the content of the images being processed. Semantic image processing could lead to transformative advances in areas such as image compression, enhancement, encryption, de-noising, or segmentation, among others, which are of interest for applications as diverse as medical imaging, image search and retrieval, or security and surveillance.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH


(Showing: 1 - 10 of 26)
D. Gao and N. Vasconcelos "Decision-theoretic saliency: computational principles, biological plausibility, and implications for neurophysiology and psychophysics" Neural Computation , v.21 , 2009 , p.239
D. Gao, S. Han, N. Vasconcelos "Discriminant saliency, the detection of suspicious coincidences, and applications to visual recognition" IEEE Transactions on Pattern Analysis and Machine Intelligence , v.31 , 2009 , p.989
E. Coviello, A.B. Chan and G. Lanckriet "Clustering Hidden Markov Models with Variational HEM" Journal of Machine Learning Research , v.15 , 2014 , p.655
Hamed Masnadi-Shirazi, Nuno Vasconcelos "A View of Margin Losses as Regularizers of Probability Estimates" Journal of Machine Learning Research , 2015
J. Costa Pereira and N. Vasconcelos "Cross-modal Domain Adaptation for Text-based Regularization of Image Semantics in Image Retrieval Systems" Computer Vision and Image Understanding , v.124 , 2014 , p.123
J. Costa Pereira, E. Coviello, G. Doyle, N. Rasiwasia, G. Lanckriet, R.Levy and N. Vasconcelos "On the Role of Correlation and Abstraction in Cross-Modal Multimedia Retrieval" IEEE Transactions on Pattern Analysis and Machine Intelligence , v.36 , 2014 , p.521
J. Costa Pereira, N. Vasconcelos "On the Regularization of Image Semantics by Modal Expansion" IEEE Conference on Computer Vision and Pattern Recognition , 2012
K. Ellis, E. Coviello, A. Chan and G. Lanckriet "A Bag of Systems Representation for Music Auto-tagging" IEEE Transactions on Audio, Speech and Language Processing , v.21 , 2013 , p.2554
M. Dixit, N. Rasiwasia and N. Vasconcelos "Adapted Gaussian Models for Image Classification" IEEE Conference on Computer Vision and Pattern Recognition , 2011
M.Dixit, S. Chen, D. Gao, N. Rasiwasia and N. Vasconcelos "Scene Classification with Semantic Fisher Vectors" IEEE Conference on Computer Vision and Pattern Recognition , 2015
M. Vasconcelos and N. Vasconcelos "Natural Image Statistics and Low-complexity Feature Selection" IEEE Transactions on Pattern Analysis and Machine Intelligence , v.32 , 2009 , p.228

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

This project produced a number of contributions to the semantic representation of images. This is a novel image representation that replaces the space of image pixels with a semantic space, where each dimension represents a visual concept. A vocabulary of visual concepts (objects, object attributes, scene classes, etc.) is first defined, and a classifier is learned to detect each concept in the image. The image is then represented by the vector of probabilities that it contains each of the concepts in the vocabulary. This is known as the semantic multinomial (SMN) descriptor of the image. The process is illustrated in Figure 1.
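The mapping from image features to an SMN descriptor can be sketched as follows. This is a hedged illustration only: the vocabulary, the 16-dimensional features, and the linear-softmax concept classifiers below are stand-in assumptions, not the models used in the project.

```python
import numpy as np

# Illustrative concept vocabulary (an assumption for this sketch).
VOCABULARY = ["sky", "people", "buildings", "water", "vegetation"]

def smn_descriptor(features, concept_weights):
    """Map a feature vector to a multinomial over the concept vocabulary.

    Each row of concept_weights plays the role of one concept classifier;
    a softmax turns the per-concept scores into probabilities that sum to 1.
    """
    scores = concept_weights @ features        # one score per concept
    e = np.exp(scores - scores.max())          # numerically stable softmax
    return e / e.sum()

rng = np.random.default_rng(0)
features = rng.normal(size=16)                       # stand-in image features
weights = rng.normal(size=(len(VOCABULARY), 16))     # stand-in classifiers

smn = smn_descriptor(features, weights)
print(dict(zip(VOCABULARY, smn.round(3))))           # probabilities sum to 1
```

The key property is that the output lives in a fixed semantic space of concept probabilities, regardless of how the underlying features were computed.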

The project showed that this representation is superior to the classical representation of the image by pixels, or by features derived from those pixels, for a number of image processing operations.

Image retrieval: This addresses the search for images within an image database. The standard paradigm is query-by-visual-example (QBVE), where a vector of features is derived from each image and used to measure image similarity. Database images are then ranked by similarity to a query image. The project introduced the query-by-semantic-example (QBSE) paradigm, where the feature vector is replaced by the SMN descriptor. This corresponds to measuring similarity between images in the semantic space. Since this space has a higher level of abstraction than that of classical image features (colors, textures, etc.), it enables retrieval systems with higher generalization ability and robustness. This is illustrated in Figure 2, where we compare the images retrieved by QBSE and QBVE for a common image query. For QBSE, we also show the concepts of highest probability for each image. Note that the classic QBVE approach tends to return images with similar colors and textures and does not work well for complicated scenes. On the other hand, the proposed QBSE approach returns images that contain concepts similar to those in the query, e.g. people and buildings. This allows the retrieval operation to succeed even when these concepts appear in different sizes, positions, colors, etc. As a result, the retrieval operation mimics the similarity judgments of people much more closely than under QBVE.
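QBSE ranking can be sketched in a few lines. The negative KL divergence used as the similarity here is one plausible choice for comparing multinomials; it is an assumption of this sketch, not necessarily the measure used in the project.

```python
import numpy as np

def smn_similarity(p, q, eps=1e-9):
    """Negative KL divergence KL(p || q); larger means more similar."""
    p = np.clip(p, eps, None)
    q = np.clip(q, eps, None)
    return -float(np.sum(p * np.log(p / q)))

def qbse_rank(query_smn, database_smns):
    """Return database indices sorted from most to least similar."""
    sims = [smn_similarity(query_smn, d) for d in database_smns]
    return sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)

# Toy 3-concept SMNs, e.g. P(people), P(buildings), P(sky).
query = np.array([0.7, 0.2, 0.1])
database = [np.array([0.1, 0.1, 0.8]),    # mostly "sky"
            np.array([0.65, 0.25, 0.1]),  # close to the query
            np.array([0.3, 0.6, 0.1])]

print(qbse_rank(query, database))  # → [1, 2, 0]
```

Because the comparison happens on concept probabilities rather than raw features, two images of "people and buildings" rank as similar even when their colors, viewpoints, and layouts differ.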

Cross-modal retrieval: One difficult problem in multimedia is to bring together information from multiple modalities. For example, how does one search an image database for the best match to a query text? The semantic representation provides a universal solution to this problem. Since the representation is based on concept probabilities, not pixels, sounds, or characters, it provides an abstract space onto which information from multiple modalities can be equally projected. After all, concept classifiers can be built for images, sound, speech, music, or text with nearly equal difficulty. This is illustrated in Figure 3 for an example involving image and text data. Images and texts are initially represented in an image space and a text space, respectively. A common semantic vocabulary is defined, and a set of concept classifiers is built on each of these spaces. This allows the mapping of images and texts into a common semantic space. In this space, it is trivial to measure similarities between data of different modalities, e.g. find the text article that best matches a query image, or vice-versa.
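The cross-modal scheme above can be sketched with one projection per modality into a shared semantic space. The vocabulary, the feature dimensions, and the randomly initialized linear-softmax classifiers are all stand-ins for illustration; only the structure (per-modality classifiers, shared concept space, similarity in that space) follows the description in the text.

```python
import numpy as np

VOCABULARY = ["people", "buildings", "sky", "water"]  # shared vocabulary

def to_semantic(features, concept_weights):
    """Project modality-specific features onto the shared semantic space."""
    scores = concept_weights @ features
    e = np.exp(scores - scores.max())
    return e / e.sum()

def cosine(p, q):
    return float(p @ q / (np.linalg.norm(p) * np.linalg.norm(q)))

rng = np.random.default_rng(1)
# One stand-in classifier bank per modality: images are 32-d, texts 100-d.
img_classifiers = rng.normal(size=(len(VOCABULARY), 32))
txt_classifiers = rng.normal(size=(len(VOCABULARY), 100))

# Both modalities land in the same 4-dimensional semantic space.
image_smn = to_semantic(rng.normal(size=32), img_classifiers)
text_smns = [to_semantic(rng.normal(size=100), txt_classifiers)
             for _ in range(5)]

# Find the text article whose SMN best matches the query image's SMN.
best = max(range(len(text_smns)),
           key=lambda i: cosine(image_smn, text_smns[i]))
print("best matching text:", best)
```

The point of the design is that the similarity computation never touches pixels or characters: once each modality has its own classifiers, retrieval in either direction reduces to comparing vectors of concept probabilities.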

Action and event recognition: Video classification requires reasoning in terms of semantic events. Figure 4 illustrates this with an example of video from TV coverage of the Olympic games. At the highest level, the video can be broken down into the events “context shot,” “pole vault,” “triple jump,” and “100 m dash.” Each of these concepts is itself defined in terms of semantic properties of lower level. For example, the “long jump” event is visually simila...
