Removing duplicate images (deduplication) - calculating "overlap" of images
Posted by jotango on Server Fault
Published on 2010-05-06T09:33:26Z
Indexed on 2010/05/06 9:38 UTC
Hello,
I have a ton of product images on our file system. Our code removes 100% identical images (or does not allow them to be uploaded). However, our sellers often upload item pictures that are very similar but not exactly the same. They could have more whitespace, worse quality (compression), a different size, etc.
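Two of the differences mentioned (extra whitespace and a different size) can be neutralised before any comparison. The sketch below assumes each image is already decoded into a grayscale grid of 0-255 ints (in practice you might get one from Pillow with `Image.open(path).convert("L")`); the `Gray` alias, the function names and the near-white threshold of 245 are all hypothetical choices for illustration, not an established API.

```python
from typing import List

# Hypothetical representation: a grayscale image as rows of 0-255 ints.
Gray = List[List[int]]


def trim_whitespace(img: Gray, threshold: int = 245) -> Gray:
    """Crop the border: drop leading/trailing rows and columns whose pixels are all >= threshold."""
    rows = [y for y, row in enumerate(img) if any(p < threshold for p in row)]
    if not rows:
        return img  # the whole image is near-white; nothing to trim
    cols = [x for x in range(len(img[0]))
            if any(img[y][x] < threshold for y in range(len(img)))]
    top, bottom, left, right = rows[0], rows[-1], cols[0], cols[-1]
    return [row[left:right + 1] for row in img[top:bottom + 1]]


def resize_nearest(img: Gray, width: int, height: int) -> Gray:
    """Nearest-neighbour resize to a fixed grid (ignores the original aspect ratio)."""
    h, w = len(img), len(img[0])
    return [[img[y * h // height][x * w // width] for x in range(width)]
            for y in range(height)]


def normalize(img: Gray, size: int = 8) -> Gray:
    """Trim the border, then scale to size x size so different uploads become comparable."""
    return resize_nearest(trim_whitespace(img), size, size)
```

With this, a picture padded with a white frame and the same picture without it normalize to the same grid, so a later similarity check only has to cope with compression noise. Nearest-neighbour scaling is used only to keep the sketch dependency-free; a real pipeline would more likely use a library resampling filter.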
Is there any way I can calculate the degree of overlap between two images, so I can flag near-duplicates for deletion? Kind of like a Levenshtein distance between two images...
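One widely used technique for this is perceptual hashing: shrink each image to a tiny thumbnail, turn it into a 64-bit fingerprint, and compare fingerprints by Hamming distance (the number of differing bits), which plays roughly the role Levenshtein distance plays for strings. The sketch below implements a difference hash (dHash) on the same hypothetical grayscale-grid representation as above; the 8x8 hash size and the helper names are assumptions, and the block-averaging downsample stands in for a proper image library.

```python
from typing import List

# Hypothetical representation: a grayscale image as rows of 0-255 ints.
Gray = List[List[int]]


def downsample(img: Gray, width: int, height: int) -> List[List[float]]:
    """Shrink to width x height by averaging each block of source pixels."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(height):
        y0 = y * h // height
        y1 = max((y + 1) * h // height, y0 + 1)  # never an empty block
        row = []
        for x in range(width):
            x0 = x * w // width
            x1 = max((x + 1) * w // width, x0 + 1)
            block = [img[r][c] for r in range(y0, y1) for c in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def dhash(img: Gray, size: int = 8) -> int:
    """Difference hash: one bit per left/right comparison on a (size+1) x size thumbnail."""
    small = downsample(img, size + 1, size)
    bits = 0
    for row in small:
        for x in range(size):
            bits = (bits << 1) | (1 if row[x] > row[x + 1] else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def similarity(img_a: Gray, img_b: Gray, size: int = 8) -> float:
    """1.0 means identical hashes; unrelated images typically land far lower."""
    return 1 - hamming(dhash(img_a, size), dhash(img_b, size)) / (size * size)
```

Because the hash only records whether brightness rises or falls between neighbouring thumbnail cells, mild recompression and rescaling usually leave most bits unchanged, while genuinely different pictures flip many of them. You would compute the hash once per stored image and flag pairs whose distance falls below a threshold tuned on your own catalogue. The third-party `imagehash` package offers ready-made `dhash`, `phash` and `average_hash` functions that operate on Pillow images, if you prefer not to roll your own.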
Any pointers would be very cool. Thanks!
© Server Fault or respective owner