Finding missing files by checksum

Posted by grw on Stack Overflow
Published on 2010-06-09T13:05:49Z

Hi there, I'm doing a large data migration between two file systems (let's call them F1 and F2) on a Linux system. The migration necessarily involves copying the data verbatim into a differently structured hierarchy on F2 and changing the file names.

I'd like to write a script to generate a list of files which are in F1 but not in F2, i.e. the ones which weren't copied by the migration script into the new hierarchy, so that I can go back and migrate them manually. Unfortunately, for reasons not worth going into, the migration script can't be modified to list the files that it doesn't migrate. My question differs from this previously answered one because I cannot rely on file names for the comparison.

I know the basic outline of the process would be:

  1. Generate a list of checksums for all files, recursing through F1
  2. Do the same for F2
  3. Compare the lists by checksum only, ignoring the file names, and take the set difference (F1 minus F2) to find files which are in F1 but not in F2.

I'm kind of stuck getting past that stage, so I'd appreciate any pointers on which tools to use. I think I need the 'comm' command to compare the lists of checksums, but since md5sum, sha512sum and the like print the file name next to each checksum, I can't see how to get a useful comparison out of it. Maybe awk is the way to go?
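
For illustration, here is a rough sketch of how the three steps might fit together with md5sum and awk. It is not a tested migration tool: the function names list_checksums and find_missing are made up for this example, MD5 is only one possible choice of checksum, and the sketch assumes file names contain no newlines or backslashes (md5sum escapes those, which would throw off the parsing).

```shell
#!/bin/bash
# Sketch only: list files under F1 whose content (by MD5) is not found under F2.
# list_checksums and find_missing are hypothetical helper names.

# Steps 1 and 2: print "checksum  path" for every regular file under $1.
list_checksums() {
    find "$1" -type f -exec md5sum {} +
}

# Step 3: print the paths under $1 whose checksum does not occur under $2.
find_missing() {
    local f1_list f2_list
    f1_list=$(mktemp "${TMPDIR:-/tmp}/f1sums.XXXXXX") || return 1
    f2_list=$(mktemp "${TMPDIR:-/tmp}/f2sums.XXXXXX") || return 1
    list_checksums "$1" > "$f1_list"
    list_checksums "$2" > "$f2_list"

    # While reading the F2 list, remember each checksum (field 1).
    # While reading the F1 list, print the path of any line whose checksum
    # was never seen. The path starts after the checksum and the
    # two-character separator that md5sum writes.
    awk 'FILENAME == ARGV[1] { seen[$1]; next }
         !($1 in seen)       { print substr($0, length($1) + 3) }' \
        "$f2_list" "$f1_list"

    rm -f "$f1_list" "$f2_list"
}
```

With the functions sourced into a shell, something like `find_missing /mnt/F1 /mnt/F2 > missing.txt` would write the unmigrated paths to a file (both mount points are placeholders). Because the comparison is by content, two identical files in F1 count as migrated as soon as one copy exists anywhere in F2. If only the checksums matter, stripping the names with `cut -d' ' -f1`, sorting both lists, and running `comm -23` on them should give the same set difference without the paths.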

I'm using Red Hat Enterprise Linux 5.x.

Thanks.

© Stack Overflow or respective owner
