Sunday, December 16, 2007

Image Matching

I had been toying around with the idea of extending video search to include the audio and video (image) data associated with a video, in addition to the user-specified tags. This is one of my weekend projects, the others being the semantic representation of knowledge and Android app development.
This weekend I spent some of my time getting out a rough cut of "Image Indexing Using Color Correlograms" [the same, with my notes], which could be helpful in my video search project. The principle behind an image correlogram [correlation + histogram] is the spatial correlation between the image's colors. A correlogram can be defined as f(c1,c2,k), where f is the number of pixels of color c2 that lie at a distance of k from a pixel of color c1.
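The definition above can be sketched directly as code. This is a minimal, unoptimized version (not the paper's dynamic-programming algorithm): for every pixel of color c1, it walks the ring of pixels at chessboard (L-infinity) distance exactly k and counts occurrences of each color c2. The function name and flat-array layout are my own choices for illustration; colors are assumed to be pre-quantized.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <vector>

// Naive correlogram: f[c1][c2][k] = number of pixels of color c2 at
// chessboard distance exactly k from a pixel of color c1.
// img holds pre-quantized colors in [0, numColors), row-major, w x h.
// Result is flattened as index = ((c1*numColors + c2)*maxK + (k-1)).
std::vector<long> correlogram(const std::vector<int>& img, int w, int h,
                              int numColors, int maxK) {
    std::vector<long> f(numColors * numColors * maxK, 0);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            int c1 = img[y * w + x];
            for (int k = 1; k <= maxK; ++k)
                // visit only the ring of pixels at L-infinity distance k
                for (int dy = -k; dy <= k; ++dy)
                    for (int dx = -k; dx <= k; ++dx) {
                        if (std::max(std::abs(dx), std::abs(dy)) != k)
                            continue;
                        int nx = x + dx, ny = y + dy;
                        if (nx < 0 || nx >= w || ny < 0 || ny >= h)
                            continue;
                        int c2 = img[ny * w + nx];
                        f[(c1 * numColors + c2) * maxK + (k - 1)]++;
                    }
        }
    return f;
}
```

In practice the raw counts would be normalized by the number of pixels of color c1 (times the ring size) to make the measure independent of image size, as the paper does.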
The results looked promising. At present I used only the G channel to generate the correlogram, and I will be refining it over time.
Some results: I chose some images obtained via the Google image search query "road", calculated the correlogram for each, and then used the metric described in the paper to measure closeness. The number adjacent to each image describes how close it is to Image 1; the lower the number, the closer the image is to Image 1.

Image 1:

Image 2: 16.32
Image 3: 4.84
Image 4: 9.51
Image 5: 15.06
Image 6: 17.03
Image 7: 15.05

The closest images to Image 1 are Image 3 and Image 4, which seems intuitive.
Image 5 and Image 7 have similar correlograms, which also seems fine.
But the observation that Images 2, 5, 6 and 7 are almost equidistant from Image 1 is harder to explain, especially as Image 2 is in no way related to Image 1. On the other hand, it is not possible to draw inferences about the robustness of this approach from such a small set of test images. I'll do some more research and come up with enhancements to get better results.

At present I just used the green channel of the image in the creation of the correlogram, as the histogram of the green channel is the closest to the luminosity histogram. [The reason for this is that the human eye is more sensitive to green than to any other color.] The current algorithm calculates f using all the pixels in the image, and computing f is quite expensive: O(n^2 * d). [The complexity is stated as O(n^2 * d^2) in the paper, and I will try to clarify this with the authors.] The space complexity is O(m^2 * d), where m is the number of possible colors. Further improvements in time and space can be made by calculating the correlogram only for select regions of the image (high-gradient blocks).
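The green-channel preprocessing amounts to picking the G byte out of each pixel and quantizing it down to m bins. A small sketch, assuming an interleaved BGR byte buffer (OpenCV's default pixel order); the function name is my own:

```cpp
#include <cassert>
#include <vector>

// Extract the green channel from an interleaved BGR byte buffer and
// quantize it into numColors equal-width bins, giving the color indices
// that the correlogram is built over.
std::vector<int> quantizeGreen(const unsigned char* bgr, int w, int h,
                               int numColors) {
    std::vector<int> out(w * h);
    for (int i = 0; i < w * h; ++i)
        out[i] = bgr[i * 3 + 1] * numColors / 256;  // G is index 1 in BGR
    return out;
}
```

Quantizing keeps m small, which matters because the O(m^2 * d) correlogram table grows quadratically in the number of colors.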

btw, it took me quite some time to set up a development environment on Windows.
I used the OpenCV SDK for reading the images, the Eclipse IDE with CDT, and MinGW (gcc for Windows) as the compiler.


Some informative links that I followed:
OpenCV:
http://www.site.uottawa.ca/~laganier/tutorial/opencv+directshow/cvision.htm
http://www.cs.iit.edu/~agam/cs512/lect-notes/opencv-intro/opencv-intro.html#SECTION00041000000000000000
http://www.xpercept.com/opencv.htm

Eclipse CDT:
http://www.cs.umanitoba.ca/~eclipse/7-EclipseCDT.pdf

updates:
Don Dodge on video search