Remove Duplicate Messages from Maildir

Posted by Joseph Holsten on Server Fault
Published on 2011-04-04T21:53:26Z Indexed on 2012/11/17 17:04 UTC

I've got a bunch of duplicate messages in my IMAP server's Maildir. What's the best way to remove them?

Some relevant points:

  • A shared Message-ID is usually a good enough definition of a duplicate. A tiny script that removes all but one message from each set of duplicates would work.
  • Sometimes it's necessary to find duplicates based on shared message bodies. What's a reasonable definition of "shared" here? Bitwise equivalence? What about incidental differences in line wrapping, escaping, or character encoding?
  • Sometimes there are meaningful differences between 'duplicate' messages. What's the best way to review the differences within a set of 'duplicates'? Diffs?
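
To make the three points above concrete, here is a minimal sketch using only Python's standard `mailbox`, `hashlib`, and `difflib` modules. It is a starting point under stated assumptions, not a vetted tool: the function names (`find_duplicates`, `remove_duplicates`, `body_key`, `diff_pair`) are made up for illustration, and "same body" is assumed to mean "the first text part, decoded and with whitespace collapsed, hashes to the same value". Which copy is kept is arbitrary (the lowest-sorting Maildir key), and nothing is deleted unless `dry_run=False` is passed.

```python
import difflib
import hashlib
import mailbox
from collections import defaultdict


def message_id_key(msg):
    """Group by Message-ID; messages without one are never treated as duplicates."""
    return str(msg.get("Message-ID", "")).strip() or None


def body_text(msg):
    """Return the decoded text of the first text/* part, or '' if there is none."""
    for part in msg.walk():
        if part.get_content_maintype() == "text":
            payload = part.get_payload(decode=True) or b""  # undoes base64 / quoted-printable
            charset = part.get_content_charset() or "utf-8"
            try:
                return payload.decode(charset, errors="replace")
            except LookupError:  # charset name in the header is unknown to Python
                return payload.decode("utf-8", errors="replace")
    return ""


def body_key(msg):
    """Group by body, ignoring line wrapping and other whitespace differences (an assumption)."""
    normalized = " ".join(body_text(msg).split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_duplicates(maildir_path, key_func=message_id_key):
    """Return (box, groups) where groups maps a key to 2+ Maildir keys sharing it."""
    box = mailbox.Maildir(maildir_path, factory=None, create=False)
    groups = defaultdict(list)
    for key, msg in box.iteritems():
        group = key_func(msg)
        if group is not None:
            groups[group].append(key)
    return box, {g: sorted(keys) for g, keys in groups.items() if len(keys) > 1}


def diff_pair(box, keep, other):
    """Unified diff of the raw messages, for reviewing what actually differs."""
    a = box.get_bytes(keep).decode("utf-8", errors="replace").splitlines()
    b = box.get_bytes(other).decode("utf-8", errors="replace").splitlines()
    return "\n".join(difflib.unified_diff(a, b, keep, other, lineterm=""))


def remove_duplicates(maildir_path, key_func=message_id_key, dry_run=True):
    """Keep one message per group; return the (kept, duplicate) key pairs found."""
    box, groups = find_duplicates(maildir_path, key_func)
    pairs = []
    for keys in groups.values():
        keep, *extras = keys
        for key in extras:
            pairs.append((keep, key))
            if not dry_run:
                box.remove(key)  # deletes the file from the Maildir immediately
    return pairs
```

A cautious way to use it would be to call `remove_duplicates(path)` first (a dry run), print `diff_pair(box, keep, dup)` for each returned pair using the `box` from `find_duplicates(path)`, and only then rerun with `dry_run=False`. Passing `key_func=body_key` switches from Message-ID matching to body matching. It is probably wise to work on a copy of the Maildir, or with the IMAP server stopped, since the server may be modifying the directory at the same time.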

© Server Fault or respective owner