Determining whether a file is a duplicate
Posted by Todd R on Stack Overflow
Published on 2010-05-11T17:15:30Z
Is there a reliable way to determine whether or not two files are the same? For example, two files with the same size and type may or may not be binarily identical (yeah, I know it's not really a word). I assume that comparing one or two checksums of the files will help, but I wonder:
- How reliable are checksums at determining whether two files are different? What are the chances of two different files having the same checksum?
- Would reliability increase by applying additional checksum comparisons?
- Which checksum algorithm(s) would be the most efficient and/or reliable?
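As a rough guide to these questions: if a checksum behaves like an ideal n-bit random function, two particular different files share a value with probability about 2^-n, and among k files the chance of any accidental match is at most about k(k-1)/2 · 2^-n (the birthday bound). For a 256-bit hash such as SHA-256 that is negligible, so a second checksum mainly helps if one algorithm is broken (MD5, for instance, has known practical deliberate collisions). Fast non-cryptographic checksums such as CRC32 have only 32 bits, so accidental matches become likely once tens of thousands of files are involved. The sketch below is a minimal illustration, assuming the JDK's built-in `MessageDigest` with SHA-256; the class `FileDigests` and its methods `digest`, `probablySame` and `toHex` are hypothetical names, not from the original post. Note the asymmetry: different digests prove the files differ, while equal digests mean they are identical only with overwhelming probability.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper (illustration only): compares files by cryptographic digest. */
final class FileDigests {

    private FileDigests() {}

    /** Streams the file through the named MessageDigest algorithm, e.g. "SHA-256". */
    static byte[] digest(Path file, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance(algorithm);
        byte[] buffer = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                md.update(buffer, 0, n); // feed only the bytes actually read
            }
        }
        return md.digest();
    }

    /**
     * Returns false if the files certainly differ; true if their digests match,
     * which means "identical with overwhelming probability", not a proof.
     */
    static boolean probablySame(Path a, Path b, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        if (Files.size(a) != Files.size(b)) {
            return false; // files of different length can never be identical
        }
        return MessageDigest.isEqual(digest(a, algorithm), digest(b, algorithm));
    }

    /** Lowercase hex encoding, convenient as a key in an index of known files. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte x : bytes) {
            sb.append(String.format("%02x", x)); // negative bytes format as 80..ff
        }
        return sb.toString();
    }
}
```

Checking `Files.size` first is a cheap filter that avoids hashing most non-duplicates; storing `toHex(digest(...))` values in a map is one way to spot repeats across many files.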
Any ideas, suggestions or thoughts are appreciated!
P.S. The code for this is being written in Java running on a *nix system, but generic or platform-agnostic input is most helpful.
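Since the post targets Java on *nix, here is a minimal sketch of the one fully reliable check: compare the bytes directly, after a cheap size test. The class `FileCompare` and method `sameContent` are hypothetical names for illustration, and the sketch assumes neither file changes while it is being read. On Java 12 and later, `Files.mismatch(a, b) == -1L` performs the same check, and on the command line `cmp -s a b` exits with status 0 when the files are identical.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper (illustration only): definitive byte-by-byte comparison. */
final class FileCompare {

    private FileCompare() {}

    static boolean sameContent(Path a, Path b) throws IOException {
        // Cheap rejection first: files of different length cannot be identical.
        if (Files.size(a) != Files.size(b)) {
            return false;
        }
        try (InputStream inA = new BufferedInputStream(Files.newInputStream(a));
             InputStream inB = new BufferedInputStream(Files.newInputStream(b))) {
            int x;
            while ((x = inA.read()) != -1) {
                if (x != inB.read()) {
                    return false; // first differing byte (or early end of b)
                }
            }
            // a is exhausted; b must be too, or b is longer than a.
            return inB.read() == -1;
        }
    }
}
```

One possible design, assuming many files are being deduplicated: group candidates by size, then by digest, and run `sameContent` only on pairs whose digests match, so each file is hashed once and any match is confirmed directly.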