Removing duplicate images (deduplication) - calculating "overlap" of images
- by jotango
Hello,
I have a ton of product images on our file system. Our code removes 100% identical images (or prevents them from being uploaded). However, our sellers often upload item pictures that are very similar but not identical: they may have more whitespace, lower quality (heavier compression), a different size, etc.
Is there any way I can calculate the degree of overlap between two images, so that near-duplicates can be flagged for deletion? Kind of like a Levenshtein distance between two images...
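To make the kind of score I have in mind more concrete, here is a rough sketch of one possible approach, an "average hash" compared by Hamming distance. This is only my guess at what could work, not something we run today. It assumes each image has already been decoded into a plain 2D list of grayscale values (0-255), and the names `shrink`, `average_hash` and `hash_distance` are made up for illustration. It should tolerate resizing and mild compression, but probably not extra whitespace around the product.

```python
# Hypothetical sketch: images are plain 2D lists of grayscale values (0-255).
# Decoding real image files into this form is deliberately left out.

def shrink(pixels, size=8):
    """Downsample to size*size values by averaging the pixels in each grid cell."""
    height, width = len(pixels), len(pixels[0])
    cells = []
    for gy in range(size):
        y0 = gy * height // size
        y1 = max((gy + 1) * height // size, y0 + 1)  # at least one row per cell
        for gx in range(size):
            x0 = gx * width // size
            x1 = max((gx + 1) * width // size, x0 + 1)  # at least one column
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    return cells


def average_hash(pixels, size=8):
    """Return one bit per cell: 1 if the cell is brighter than the overall mean."""
    cells = shrink(pixels, size)
    mean = sum(cells) / len(cells)
    return [1 if c > mean else 0 for c in cells]


def hash_distance(a, b, size=8):
    """Hamming distance between the two hashes: 0 means 'looks the same'."""
    hash_a, hash_b = average_hash(a, size), average_hash(b, size)
    return sum(x != y for x, y in zip(hash_a, hash_b))
```

With the default 8x8 grid the distance ranges from 0 to 64, so I imagine flagging pairs below some small threshold for manual review; the right threshold is something I would have to find by experiment.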
Any pointers would be very cool. Thanks!