Find all duplicate files by md5 hash

Posted by Jamie Curran on Super User
Published on 2012-10-14T21:31:33Z Indexed on 2012/10/15 3:42 UTC

Filed under: linux | sysadmin

I'm trying to find all duplicate files based on their MD5 hash, ordered by file size. So far I have this:

 find . -type f -print0 | xargs -0 -I "{}" sh -c 'md5sum "{}" |  cut -f1 -d " " | tr "\n" " "; du -h "{}"' | sort -h -k2 -r | uniq -w32 --all-repeated=separate

The output of this is:

1832348bb0c3b0b8a637a3eaf13d9f22 4.0K   ./picture.sh
1832348bb0c3b0b8a637a3eaf13d9f22 4.0K   ./picture2.sh
1832348bb0c3b0b8a637a3eaf13d9f22 4.0K   ./picture2.s

d41d8cd98f00b204e9800998ecf8427e 0      ./test(1).log

Is this the most efficient way?
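
One possible way to cut the work down, offered as a rough sketch rather than a definitive answer: two files can only be identical if they have the same size, so files with a unique size never need to be hashed. The sketch below assumes GNU find, sort, uniq and md5sum, and file names without tabs or newlines. The function name find_dupes is made up for illustration. It also sorts on the byte count and then on the hash, so that lines with equal hashes always end up adjacent before uniq sees them.

```shell
#!/bin/sh
# Sketch only: assumes GNU find/sort/uniq/md5sum and file names that
# contain no tabs or newlines. find_dupes is a hypothetical helper name.
find_dupes() {
    tab=$(printf '\t')

    # Emit "size<TAB>path" for every regular file under the given directory.
    find "${1:-.}" -type f -printf '%s\t%p\n' |
    # Keep only files whose size occurs more than once; a file with a
    # unique size cannot have a duplicate, so it is never hashed.
    awk -F'\t' '{ n[$1]++; line[NR] = $0; sz[NR] = $1 }
                END { for (i = 1; i <= NR; i++) if (n[sz[i]] > 1) print line[i] }' |
    # Hash the remaining candidates and print "hash size path".
    while IFS="$tab" read -r size path; do
        hash=$(md5sum < "$path" | cut -d' ' -f1)
        printf '%s %s %s\n' "$hash" "$size" "$path"
    done |
    # Largest files first, equal hashes adjacent within each size.
    sort -k2,2nr -k1,1 |
    # Print only lines whose first 32 characters (the hash) repeat,
    # with a blank line between groups.
    uniq -w32 --all-repeated=separate
}
```

Calling `find_dupes .` prints one line per duplicate in the form "hash size path", with the size in bytes instead of the human-readable figure from du -h. It still starts one md5sum process per candidate file, so on trees where most files share a size the gain over the original pipeline may be small.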

© Super User or respective owner
