AI Colloquium: Marco Wiering: What to do with SIFT?
Event information
Recognizing objects in images is difficult because of the high dimensionality of the images and variations in viewpoint, translation, and illumination levels. Currently, the most widely used method for object recognition is SIFT combined with machine learning algorithms that map the extracted feature vectors to the desired class labels. In this presentation we will look at a number of different ways of using SIFT features. First, SIFT features can be extracted from many different points in an image and combined into a single feature vector. Second, SIFT features can be clustered to create a bag of visual keywords. Such codebooks can then be used to encode a new image as a histogram of winning clusters. This approach can be improved by using a soft assignment of image patches to the cluster centroids, and we will discuss two such approaches. Finally, max similarity pooling, which is inspired by the HMAX architecture, can be used to compute the maximum activation of each cluster centroid in a new image.
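To make the three codebook encodings concrete, here is a minimal sketch in plain Python. It assumes the SIFT descriptors and the cluster centroids are already available as lists of numbers. The function names, the Gaussian similarity, and the `beta` parameter are illustrative assumptions, not the specific schemes presented in the talk, and the soft assignment shown is only one possible variant.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def hard_histogram(descriptors, centroids):
    """Histogram of winning clusters: each patch votes for its nearest centroid."""
    hist = [0.0] * len(centroids)
    for d in descriptors:
        winner = min(range(len(centroids)), key=lambda k: sq_dist(d, centroids[k]))
        hist[winner] += 1.0
    return [h / len(descriptors) for h in hist]


def soft_histogram(descriptors, centroids, beta=1.0):
    """Soft assignment (assumed Gaussian weighting): each patch spreads one
    unit of mass over all centroids, with more mass on closer centroids."""
    hist = [0.0] * len(centroids)
    for d in descriptors:
        dists = [sq_dist(d, c) for c in centroids]
        nearest = min(dists)
        # Subtracting the nearest distance keeps the exponentials from underflowing.
        weights = [math.exp(-beta * (dist - nearest)) for dist in dists]
        total = sum(weights)
        for k, w in enumerate(weights):
            hist[k] += w / total
    return [h / len(descriptors) for h in hist]


def max_similarity_pooling(descriptors, centroids, beta=1.0):
    """For each centroid, keep the highest similarity reached by any patch."""
    return [
        max(math.exp(-beta * sq_dist(d, c)) for d in descriptors)
        for c in centroids
    ]


# Toy 2-D "descriptors" standing in for 128-D SIFT vectors (hypothetical data).
toy_centroids = [[0.0, 0.0], [1.0, 1.0]]
toy_descriptors = [[0.1, 0.0], [0.9, 1.0], [1.0, 1.0]]

print(hard_histogram(toy_descriptors, toy_centroids))
print(soft_histogram(toy_descriptors, toy_centroids))
print(max_similarity_pooling(toy_descriptors, toy_centroids))
```

The hard histogram counts only the winning centroid per patch, the soft histogram distributes each patch over all centroids, and max similarity pooling records how strongly each centroid is matched anywhere in the image.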
All these techniques can also be combined with a spatial pyramid approach where the feature vectors are computed in different areas of the image, after which they are combined. Since we deal with a number of different visual descriptors, we will combine the extracted features with product and averaging ensembles. Experiments are performed on 10 and 101 images from the Caltech database and show that our SIFT ensembles obtain state-of-the-art performance levels.
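The spatial pyramid and the two ensemble rules can be sketched in the same spirit. The sketch below assumes each patch has already been assigned to a cluster and carries normalised image coordinates in [0, 1), and that each classifier outputs one probability per class. The function names, the grid layout, and the data are hypothetical and do not reproduce the experimental setup of the talk.

```python
import math


def pyramid_histogram(patches, n_clusters, levels=2):
    """Concatenate cluster histograms computed over 1x1, 2x2, ... grids.

    patches: list of (x, y, cluster_index) with x and y in [0, 1).
    """
    feature = []
    for level in range(levels):
        cells = 2 ** level  # number of grid cells per image side at this level
        for cy in range(cells):
            for cx in range(cells):
                hist = [0.0] * n_clusters
                for x, y, k in patches:
                    if int(x * cells) == cx and int(y * cells) == cy:
                        hist[k] += 1.0
                feature.extend(hist)
    return feature


def mean_rule(prob_lists):
    """Averaging ensemble: mean class probability over all classifiers."""
    n_classes = len(prob_lists[0])
    return [sum(p[c] for p in prob_lists) / len(prob_lists) for c in range(n_classes)]


def product_rule(prob_lists):
    """Product ensemble: multiply class probabilities, then renormalise."""
    n_classes = len(prob_lists[0])
    scores = [math.prod(p[c] for p in prob_lists) for c in range(n_classes)]
    total = sum(scores)
    return [s / total for s in scores] if total > 0 else scores


# Hypothetical patches and classifier outputs for a two-class problem.
toy_patches = [(0.1, 0.1, 0), (0.9, 0.9, 1), (0.6, 0.2, 1)]
toy_outputs = [[0.6, 0.4], [0.8, 0.2]]

print(pyramid_histogram(toy_patches, n_clusters=2))
print(mean_rule(toy_outputs))
print(product_rule(toy_outputs))
```

The product rule rewards classes on which all classifiers agree, while the averaging rule is more tolerant of a single classifier assigning a low probability.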