Find Duplicates Based on Image Content

Post by theUKdude » Wed May 04, 2005 10:13 am


I was wondering if it was possible to add the facility to search for duplicate images based on actual image content instead of filename?

What I was thinking was to use the same kind of algorithm used in software like Snitch, which detects adult images by assessing skin-tone levels. I'm guessing something similar could be created to assess the content of each image and then make comparisons.

If you are able to assess the colour of pixels in, say, a grid pattern of every 100 pixels or so, then you could compare each image for exact matches (and provide thumbnails or something just to double check).
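Something along these lines is roughly what I mean. It is only a sketch: it assumes an image is already loaded as a list of rows of (r, g, b) tuples, and the names grid_signature and is_exact_grid_match are made up for illustration, not anything Wallpaper Cycler has.

```python
# Hypothetical sketch: an image is a list of rows, each row a list of (r, g, b) tuples.

def grid_signature(pixels, step=100):
    """Collect the colour of every `step`-th pixel in both directions."""
    return [row[x] for row in pixels[::step] for x in range(0, len(row), step)]

def is_exact_grid_match(a, b, step=100):
    """True when both images have the same size and identical sampled pixels."""
    same_size = len(a) == len(b) and all(len(ra) == len(rb) for ra, rb in zip(a, b))
    return same_size and grid_signature(a, step) == grid_signature(b, step)

def make_gradient(width, height):
    """Build a synthetic test image so the example needs no image files."""
    return [[(x % 256, y % 256, (x + y) % 256) for x in range(width)]
            for y in range(height)]

original = make_gradient(300, 200)
copy = [list(row) for row in original]
edited = [list(row) for row in original]
edited[100][200] = (255, 0, 255)  # change a pixel that lies on the sampling grid

print(is_exact_grid_match(original, copy))    # True
print(is_exact_grid_match(original, edited))  # False
```

A change that falls between the sampled grid points would go unnoticed, which is why thumbnails for a manual double check would still be useful.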

What do you think?

Post by Marc G » Wed May 04, 2005 7:58 pm

Yes, that would indeed be an interesting feature. In fact, it is already on the todo list and I have already been thinking about it. However, such algorithms aren't that simple. I originally thought of implementing this feature as a similarity-based image comparator. This means that similar images are detected, where 'similar' is determined by a user-definable similarity threshold. There are all sorts of things that complicate it though. For example, some wallpaper sites offer wallpapers in different resolutions: 1280x1024, 640x480, ... Unfortunately, not all versions of the same wallpaper have the same aspect ratio. For example, a certain wallpaper can be 800x600, which is a 4:3 aspect ratio, but the same wallpaper might also be available in 1280x1024, which is 5:4 rather than 4:3 and which will probably introduce a small black border at the top and bottom of the wallpaper. Detecting the similarity between such wallpapers is already quite a bit more difficult.
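To make the aspect ratio problem concrete, here is a small sketch of the arithmetic. The helper names aspect_ratio and letterbox_bars are hypothetical, and it assumes the wallpaper is scaled to fit the larger resolution without cropping or distortion, which is only one way a site might produce the other version.

```python
from fractions import Fraction

def aspect_ratio(width, height):
    """Return the aspect ratio as a reduced fraction."""
    return Fraction(width, height)

def letterbox_bars(src_w, src_h, dst_w, dst_h):
    """Fit the source inside the destination while keeping its aspect ratio.

    Returns the bar size in pixels per side as (left/right, top/bottom).
    """
    scale = min(Fraction(dst_w, src_w), Fraction(dst_h, src_h))
    fit_w, fit_h = int(src_w * scale), int(src_h * scale)
    return (dst_w - fit_w) // 2, (dst_h - fit_h) // 2

print(aspect_ratio(800, 600))                # 4/3
print(aspect_ratio(1280, 1024))              # 5/4
print(letterbox_bars(800, 600, 1280, 1024))  # (0, 32)
```

Under that assumption, the 4:3 picture fills 1280x960 of the 1280x1024 frame, leaving a 32 pixel bar at the top and at the bottom that a naive pixel comparison would count as a difference.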

Another problem might be the simple fact that wallpapers might be scaled. When a wallpaper is scaled from 1600x1200 to something like 640x480, a lot of color information is lost and the remaining color information is heavily processed with filters to make sure the scaled version looks nice.

I played with the idea of implementing something like a wavelet-based image comparator to cope with these problems, but that's not that easy to do. Another solution would be to implement a neural network to do the job, which isn't easy either.

Maybe, as an initial version, I can try to implement a pretty basic version of the content-based duplicate finder. The comparator could work on small thumbnails (e.g. 240x180) instead of the big wallpapers. The benefits are that it would run faster because it's working on small thumbnails, and that it partially solves the issue that the original wallpapers might have different resolutions and might be scaled versions of each other.
Of course, this algorithm can't guarantee to find *all* duplicates, but it might do a pretty good job at it.
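
As a rough illustration of the thumbnail idea, and not the actual implementation, the sketch below shrinks each image by box averaging and scores the result against a similarity threshold. The function names, the tiny 8x6 thumbnail size, the mean-absolute-difference score and the 0.95 threshold are all assumptions chosen to keep the example short; images are assumed to be lists of rows of (r, g, b) tuples.

```python
def thumbnail(pixels, tw, th):
    """Box-average an image (rows of (r, g, b) tuples) down to tw x th."""
    h, w = len(pixels), len(pixels[0])
    out = []
    for ty in range(th):
        y0 = ty * h // th
        y1 = max((ty + 1) * h // th, y0 + 1)
        row = []
        for tx in range(tw):
            x0 = tx * w // tw
            x1 = max((tx + 1) * w // tw, x0 + 1)
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(tuple(sum(p[c] for p in block) / len(block) for c in range(3)))
        out.append(row)
    return out

def similarity(a, b, size=(8, 6)):
    """Return 1.0 for identical thumbnails, lower values for larger differences."""
    ta, tb = thumbnail(a, *size), thumbnail(b, *size)
    total = sum(abs(pa[c] - pb[c])
                for ra, rb in zip(ta, tb)
                for pa, pb in zip(ra, rb)
                for c in range(3))
    return 1.0 - total / (255.0 * 3 * size[0] * size[1])

def is_duplicate(a, b, threshold=0.95):
    """Apply the user-definable similarity threshold."""
    return similarity(a, b) >= threshold

def gradient(w, h, invert=False):
    """Synthetic test image; `invert` flips the red and green channels."""
    img = [[(255 * x // (w - 1), 255 * y // (h - 1), 128) for x in range(w)]
           for y in range(h)]
    if invert:
        img = [[(255 - r, 255 - g, b) for r, g, b in row] for row in img]
    return img

big = gradient(160, 120)                 # stands in for the large wallpaper
small = gradient(80, 60)                 # the same picture at half resolution
other = gradient(160, 120, invert=True)  # a clearly different picture

print(is_duplicate(big, small))  # True
print(is_duplicate(big, other))  # False
```

Because both images are reduced to the same thumbnail size first, the two resolutions of the same picture end up nearly identical, while a letterboxed version would still need extra handling.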

Also, this algorithm won't be the fastest way to do it, because it still works at the pixel level of the small thumbnails, while solutions using, for example, wavelet algorithms don't have this problem.

Anyway, I won't dump this feature, but can't promise that it will be available in the next version.
Marc Gregoire

Post by slvr1969 » Wed Jun 03, 2009 7:10 pm

I am using ImageDupeless; for me it is a great program. I have a set of 8100 photos and use it to find duplicates based on image content.
