How to deduplicate 40TB of data?
- by Michael Stauffer
I've inherited a research cluster with ~40TB of data across three filesystems. The data stretches back almost 15 years, and there are most likely a good number of duplicates, since researchers copy each other's data for various reasons and then just hang on to the copies.
I know about de-duping tools like fdupes and rmlint. I'm trying to find one…
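For reference, my mental model of how these tools work is a size-first pass followed by checksums, so that most files are ruled out before any bytes are read. Here's a rough Python sketch of that idea (illustrative only, not something I'd actually run over 40TB; the paths, hash choice, and block sizes are just placeholders):

    # Sketch of the size-then-checksum strategy dedup tools typically use.
    # Assumptions: regular files only, symlinks skipped, SHA-256 as the hash.
    import hashlib
    import os
    import sys
    from collections import defaultdict

    def iter_files(root):
        """Yield regular files under root, skipping symlinks."""
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                if os.path.isfile(path) and not os.path.islink(path):
                    yield path

    def sha256_of(path, first_block_only=False, block=1 << 20):
        """Hash a file; optionally only its first block as a cheap pre-filter."""
        h = hashlib.sha256()
        with open(path, "rb") as f:
            while chunk := f.read(block):
                h.update(chunk)
                if first_block_only:
                    break
        return h.hexdigest()

    def find_duplicates(root):
        # Pass 1: group by size; a file with a unique size has no duplicate.
        by_size = defaultdict(list)
        for path in iter_files(root):
            by_size[os.path.getsize(path)].append(path)

        # Pass 2: among same-size files, hash the first block, then the
        # whole file, so full reads happen only for likely matches.
        dupes = defaultdict(list)
        for size, paths in by_size.items():
            if len(paths) < 2:
                continue
            by_head = defaultdict(list)
            for p in paths:
                by_head[sha256_of(p, first_block_only=True)].append(p)
            for candidates in by_head.values():
                if len(candidates) < 2:
                    continue
                for p in candidates:
                    dupes[(size, sha256_of(p))].append(p)

        return [group for group in dupes.values() if len(group) > 1]

    if __name__ == "__main__":
        for group in find_duplicates(sys.argv[1]):
            print("\n".join(group), end="\n\n")

The point of the sketch is just that the size grouping prunes almost all of the work up front, which is why these tools can be practical at scale at all.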