Image comparison tool

Discussion in 'Digital Photography' started by bcutting@gmail.com, Jul 21, 2006.

  1. Guest

    I am looking for a way to take a large number of images and find
    matches among them. These images may not be exact replicas. Images
    may have been resized, cropped, faded, color corrected, etc.

    Approach 1
    Programmatically extract the information (such as eigenvectors/eigenspaces)
    and store it in a database. Then apply a comparison algorithm between the
    database entries to find similar images.

    Approach 2
    Store all the images and run the comparison tool directly against the
    individual images.

    Approach 1 is preferred since I believe the comparison could be
    performed much more rapidly once the comparison information has been
    extracted. However, this requires a library that can perform extraction
    and comparison as separate steps.

    Does anybody have any suggestions for a DLL that can perform the
    tasks described above?

    Any suggestions on how to store the information in the database? More
    specifically, what would the schema look like?

    Any help is appreciated.
    , Jul 21, 2006
    #1

  2. Nils Guest

    If you want to extract shape info or some other kind of metric, you must
    first know exactly what you are looking for. E.g. for face recognition,
    eigenvalues are used; for other types of recognition, feature points are
    used, etc.

    If you have no a priori info, you can still compare thumbnails: create
    mini-thumbnails of the same size for each image and use something like a
    Hamming distance to find the matches. If you convert them all to the same
    normalized grayscale format, you can also find matches between images that
    differ only by slight colour shifts.
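    A minimal sketch of the mini-thumbnail idea, in Python with plain lists so
    it needs no imaging library. Assumptions (not from the post): the image is
    already decoded to grayscale as a list of rows of 0..255 values and is at
    least 16x16, downscaling is simple box averaging, and the function names
    are hypothetical. A true Hamming distance counts differing positions; for
    grey values this sketch uses the sum of absolute differences as the
    analogous measure.

```python
def to_thumbnail(pixels, size=16):
    """Box-average a grayscale image (list of rows, values 0..255) to size x size.

    Assumes the image is at least size x size; aspect ratio is ignored on purpose.
    """
    h, w = len(pixels), len(pixels[0])
    thumb = []
    for ty in range(size):
        y0, y1 = ty * h // size, (ty + 1) * h // size
        row = []
        for tx in range(size):
            x0, x1 = tx * w // size, (tx + 1) * w // size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) // len(block))
        thumb.append(row)
    return thumb


def normalize(thumb):
    """Stretch values so the darkest pixel maps to 0 and the brightest to 255."""
    flat = [v for row in thumb for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # uniform image: nothing to stretch
        return [[0] * len(row) for row in thumb]
    return [[(v - lo) * 255 // (hi - lo) for v in row] for row in thumb]


def distance(t1, t2):
    """Sum of absolute differences between two equal-size thumbnails (0 = identical)."""
    return sum(abs(a - b) for r1, r2 in zip(t1, t2) for a, b in zip(r1, r2))
```

    In the ideal case, a lower-contrast (faded) copy of an image normalizes
    back to the same thumbnail, so its distance to the original is 0.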

    I wrote exactly this quite a few years back, and it is still available in
    the form of a shareware image browser with a special "Similar Images"
    filter. It works quite well for finding similar images in large
    databases (e.g. up to 20,000 files).

    Info here:
    http://www.abc-view.com/articles/article3.html

    I call the information I store "image metrics"; they are nothing more
    than a smart, wavelet-like way of storing the mini-thumbnails. Since each
    metric is small (a couple of hundred bytes), all of them can be kept in
    memory, which speeds up the comparison process enormously.

    The procedure to find duplicates consists of:
    1. Calculate an image metric for each image.
    2. Compare the metrics, using a Hamming-style distance with a smart
    similarity sorter.
    3. Output the list to a thumbnail viewer, sorted by similarity, showing only
    similar images as colour-coded groups.

    Step 1 can take quite some time, but the metrics can be stored in the image
    database, so this only needs to be done once.
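    One possible answer to the original schema question, as a sketch only:
    keep one row per image with its precomputed metric, so step 1 runs once and
    later comparisons just load the small metrics into memory. The table and
    column names, the use of SQLite (via Python's standard sqlite3 module) and
    the modification-time column for detecting stale metrics are all
    assumptions, not part of Nils's software. The first metric byte is kept in
    its own indexed column so the list can be loaded already sorted.

```python
import sqlite3

# Hypothetical schema: one row per image, metric stored as a raw BLOB.
SCHEMA = """
CREATE TABLE IF NOT EXISTS image_metric (
    id     INTEGER PRIMARY KEY,
    path   TEXT NOT NULL UNIQUE,  -- where the image lives
    mtime  REAL NOT NULL,         -- file modification time, to detect stale metrics
    mean   INTEGER NOT NULL,      -- first metric byte (1x1 average), used as sort key
    metric BLOB NOT NULL          -- the packed metric bytes
);
CREATE INDEX IF NOT EXISTS idx_image_metric_mean ON image_metric(mean);
"""


def open_store(path=":memory:"):
    """Open (or create) the metric store and make sure the schema exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def save_metric(conn, image_path, mtime, metric):
    """Insert or overwrite the metric for one image."""
    conn.execute(
        "INSERT OR REPLACE INTO image_metric (path, mtime, mean, metric) "
        "VALUES (?, ?, ?, ?)",
        (image_path, mtime, metric[0], bytes(metric)),
    )
    conn.commit()


def load_metrics(conn):
    """Return (path, metric) pairs ordered by the first metric byte."""
    rows = conn.execute("SELECT path, metric FROM image_metric ORDER BY mean, path")
    return [(path, bytes(blob)) for path, blob in rows]
```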

    Hope that helps,

    Nils Haeck
    www.simdesign.nl


    Nils, Jul 21, 2006
    #2

  3. Bob Guest

    Nils

    That sounds interesting. Have you published anything on the algorithm
    for computing the similarity metric?

    Bob


    Bob, Jul 22, 2006
    #3
  4. Guest

    Thumbs Plus (shareware, or it used to be) has done this for years.
    It's reasonably effective, and maybe they will share their secrets.
    , Jul 22, 2006
    #4
  5. Nils Guest

    Hi Bob,

    No, I haven't published anything on the algorithm apart from the brief
    description of how the software works on the web page mentioned above.
    However, it's not rocket science :)

    People who just need to run a comparison can simply try out the software
    (30-use functional trial, sales price $29). Software engineers/developers
    wanting to make use of it can always buy the source code from my company. I
    have already sold it to a few companies creating image cataloguers and
    image processing software. I think anyone could write such a thing
    themselves, but it might make sense to buy it to save yourself a few weeks
    of work.

    Here is the basic idea with these assumptions:

    a) We only compare the grayscale version
    b) We are not interested in aspect ratio

    1. Start with an image of dimensions WxH
    2. Scale it down to a 16x16 thumbnail, grayscale only, 256 levels
    3. Normalize the thumbnail (so it spans values 0..255 instead of e.g.
    25..230)
    4. Create sub-thumbnails of 8x8, 4x4, 2x2 and 1x1
    5. Store these thumbnails such that 1x1 comes first, then 2x2, etc.
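    The steps above could be sketched as follows, starting from an already
    scaled and normalized 16x16 grayscale thumbnail (the result of steps 1 to
    3). Each coarser level averages 2x2 blocks of the level below, and the
    levels are stored coarsest first, giving 1 + 4 + 16 + 64 + 256 = 341 bytes.
    Nils describes his storage as "wavelet-like"; this sketch simply stores the
    plain averages, and the function names are hypothetical.

```python
def halve(grid):
    """Average each 2x2 block (integer mean), halving width and height."""
    n = len(grid) // 2
    return [
        [
            (grid[2 * y][2 * x] + grid[2 * y][2 * x + 1]
             + grid[2 * y + 1][2 * x] + grid[2 * y + 1][2 * x + 1]) // 4
            for x in range(n)
        ]
        for y in range(n)
    ]


def build_metric(thumb16):
    """Return the 341-byte metric: 1x1, then 2x2, 4x4, 8x8, 16x16, each row-major."""
    levels = [thumb16]
    while len(levels[-1]) > 1:  # 16 -> 8 -> 4 -> 2 -> 1
        levels.append(halve(levels[-1]))
    metric = bytearray()
    for level in reversed(levels):  # coarsest (1x1) first
        for row in level:
            metric.extend(row)
    return bytes(metric)
```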

    Now the comparison. Realise that when comparing two images, we are only
    interested in images that are close. So if there's a big difference between
    them, we can abort the comparison quite soon.

    We compare two images with metrics A and B, each consisting of N values:

    A = {a1..aN}, N = 341 (1 + 4 + 16 + 64 + 256 = 341)
    B = {b1..bN}

    Weighting: w1..wN, where

    w1        = 256  (1x1 level)
    w2..w5    = 64   (2x2 level)
    w6..w21   = 16   (4x4 level)
    w22..w85  = 4    (8x8 level)
    w86..w341 = 1    (16x16 level)

    The comparison value between A and B is

    $$C_p = \sum_{i=1}^{p} w_i \max\left(0, |a_i - b_i| - 1\right), \qquad 1 \le p \le N$$

    Note the term max(0, abs(ai - bi) - 1): when comparing two pixels we use
    "difference - 1", because a difference of 1 often arises from
    resampling/normalization.
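    A quick consistency check on the weights (a worked note, not from the
    post): each entry is weighted by the number of 16x16 pixels it covers, so
    every pyramid level carries the same total weight. Because
    max(0, |a_i - b_i| - 1) is at most 254 for 8-bit values, this also bounds
    the largest possible comparison value, which may help when picking T:

$$
\begin{aligned}
1 \cdot 256 = 4 \cdot 64 = 16 \cdot 16 = 64 \cdot 4 = 256 \cdot 1 &= 256,\\
\sum_{i=1}^{341} w_i &= 5 \cdot 256 = 1280,\\
C_N &\le 254 \cdot 1280 = 325{,}120.
\end{aligned}
$$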

    When comparing two metrics we define a threshold T. Since every term in
    the sum is non-negative, Cp can only grow with p, so we can stop the
    comparison as soon as Cp > T and just store the value Cp (p <= N) reached
    at that point. If T is low enough, we can often stop after comparing just
    one byte!
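    A minimal sketch of the comparison with early termination, assuming both
    metrics are 341-byte sequences laid out coarsest level first (1x1, then
    2x2, ..., 16x16). The weights are generated from the level sizes as listed
    above; the name compare and its (cost, p) return value are illustrative
    choices.

```python
# Weight of each metric entry = number of 16x16 pixels it covers (256, 64, 16, 4, 1).
WEIGHTS = [(16 // side) ** 2 for side in (1, 2, 4, 8, 16) for _ in range(side * side)]


def compare(a, b, threshold):
    """Accumulate C_p and stop as soon as it exceeds threshold.

    Returns (cost, p): the value C_p and the number p of entries compared.
    """
    cost, p = 0, 0
    for x, y, w in zip(a, b, WEIGHTS):
        p += 1
        cost += max(0, abs(x - y) - 1) * w  # ignore 1-level rounding differences
        if cost > threshold:
            break  # cannot come back under T: all terms are non-negative
    return cost, p
```

    For example, two metrics whose 1x1 averages differ by 100 cost
    (100 - 1) * 256 = 25,344 after the first byte, so with T = 1000 the
    comparison stops at p = 1.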

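    Now, when comparing a large list of metrics, one can simply sort them by
    their first byte (the 1x1 average), then take one as start S, put a sliding
    window {-T/256, T/256} on the sorted list around S, and compare all the
    metrics in that group with S to find any metrics matching S. This works
    because the first byte carries weight 256, so a large difference there
    alone pushes C1 past T.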
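    A sketch of the window search using Python's bisect module on the sorted
    first bytes. Because the first term is C1 = 256 * max(0, |a1 - b1| - 1), a
    metric can only stay within T if |a1 - b1| <= T/256 + 1; the post gives the
    window as T/256, and this sketch widens it by one level to account for the
    1-level tolerance so no possible match is skipped. The function name is
    hypothetical.

```python
from bisect import bisect_left, bisect_right


def window_candidates(keys, s, threshold):
    """Return indices of metrics that could still match metric s.

    keys: first bytes (1x1 averages) of the metrics, sorted ascending.
    A metric whose first byte differs from keys[s] by more than
    threshold / 256 + 1 already has C1 > threshold, so it is skipped.
    """
    radius = threshold / 256 + 1
    lo = bisect_left(keys, keys[s] - radius)
    hi = bisect_right(keys, keys[s] + radius)
    return [i for i in range(lo, hi) if i != s]
```

    The candidates returned would then be checked with the full,
    early-terminating comparison.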

    This way we can build a new list, beginning with S, then the closest match
    to S, then the closest match to that one, and so on. Each time, we remove
    the chosen metric from the original list. In the end we have a list of
    images sorted by similarity.
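    The greedy chaining could be sketched like this. For clarity the sketch
    takes the distance function as a parameter and scans all remaining items at
    every step, which makes the O(n^2) cost explicit; in practice the window
    search and the threshold would cut most of those comparisons short. Names
    are hypothetical.

```python
def similarity_order(items, distance):
    """Greedy chain: start with the first item, then repeatedly append the
    remaining item closest to the last one appended (removing it from the pool)."""
    remaining = list(items)
    if not remaining:
        return []
    order = [remaining.pop(0)]
    while remaining:
        last = order[-1]
        best = min(range(len(remaining)), key=lambda i: distance(last, remaining[i]))
        order.append(remaining.pop(best))
    return order
```

    For example, similarity_order([0, 100, 3, 98, 1], lambda a, b: abs(a - b))
    returns [0, 1, 3, 98, 100].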

    The algorithm is still O(n^2) in the number of images, but it nevertheless
    cuts out a large portion of the work compared to the full n^2 comparison.

    There are some refinements not mentioned here (for colours, aspect ratio,
    etc.), but this is the general principle.

    Note: I looked into a lot of different techniques (Fourier transforms,
    Gabor wavelets, feature point extraction, etc.), but more complex is not
    always better. In this case, simplicity seems to win.

    Hope that helps,

    Nils Haeck
    www.simdesign.nl


    Nils, Jul 22, 2006
    #5
  6. Bob Guest

    Hi Nils

    Yes that does help. Thanks for the explanation.

    Bob
    Bob, Jul 22, 2006
    #6

Similar Threads
  1. hugh jass

    folder comparison tool?

    hugh jass, May 5, 2004, in forum: Computer Support
    Replies:
    2
    Views:
    453
    Plato
    May 6, 2004
  2. Trevor Smithson
    Replies:
    5
    Views:
    998
    Blinky the Shark
    May 25, 2005
  3. Trevor Smithson
    Replies:
    1
    Views:
    574
  4. siliconpi
    Replies:
    2
    Views:
    787
    Don B
    Nov 29, 2004
  5. rapee
    Replies:
    0
    Views:
    704
    rapee
    Mar 14, 2008