Bash find duplicate files - CentOS 6 - OMB Redmine


This one-liner is taken from:
http://www.commandlinefu.com/commands/view/3555/find-duplicate-files-based-on-size-first-then-md5-hash

and is explained at:
http://heyrod.com/snippet/t/linux.html

find -not -empty -type f -printf "%s\n" | sort -rn | uniq -d  | xargs -I{} -n1 find -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate

The explanation is as follows (the breakdown adds a final cut step that strips the hashes from the output):

1 $ find -not -empty -type f -printf "%s\n" | \
2 > sort -rn | \
3 > uniq -d | \
4 > xargs -I{} -n1 find -type f -size {}c -print0 | \
5 > xargs -0 md5sum | \
6 > sort | \
7 > uniq -w32 --all-repeated=separate | \
8 > cut -d" " -f3-

You probably want to redirect the output to a file, as the command runs slowly on large directory trees.
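Saving the result to a file could look like the following minimal sketch. The wrapper name find_dupes, the scratch directory, and the report file are hypothetical and do not come from the original sources. The sketch also drops -n1, because -I{} already makes GNU xargs run one command per input line.

```shell
#!/bin/bash
# find_dupes is a hypothetical wrapper name, not part of the original one-liner.
# It runs the pipeline inside the directory given as $1 (default: current directory).
find_dupes() {
  (
    cd "${1:-.}" || exit 1
    find . -not -empty -type f -printf "%s\n" |
      sort -rn |
      uniq -d |
      xargs -I{} find . -type f -size {}c -print0 |
      xargs -0 md5sum |
      sort |
      uniq -w32 --all-repeated=separate |
      cut -d" " -f3-
  )
}

# Example run against a scratch directory, so the sketch is safe to try.
scratch=$(mktemp -d)
report=$(mktemp)                      # stands in for a file such as duplicates.txt
echo "hello"      > "$scratch/one"
echo "hello"      > "$scratch/two"    # same content as "one"
echo "other text" > "$scratch/three"  # different size, never hashed

find_dupes "$scratch" > "$report"     # redirect standard output to the report file
cat "$report"

rm -rf "$scratch"                     # remove the scratch files, keep the report
```

Keeping the report file outside the scanned directory avoids the report itself being picked up by find on a later run.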

If I understand this correctly:

Line 1 prints the size in bytes of every non-empty regular file.
Line 2 sorts the sizes numerically, in descending order.
Line 3 keeps only the sizes that appear more than once.
For each remaining size, line 4 finds all the files of that size.
Line 5 computes the MD5 hash for all the files found in line 4, outputting the MD5 hash and file name. (This is repeated for each set of files of a given size.)
Line 6 sorts that list so that identical hashes end up on adjacent lines.
Line 7 compares the first 32 characters of each line (the MD5 hash) and keeps only the lines whose hash repeats, separating each group of duplicates with a blank line.
Line 8 strips the hash and prints only the path and file name of each matching line.
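The steps above can be tried on a small throwaway directory. This is only a sketch: the file names, the variable names (demo_dir, candidate_sizes, hashed, dupes) and the split into three stages are assumptions made for illustration, and -n1 is omitted because -I{} already implies one command per input line in GNU xargs.

```shell
#!/bin/bash
# Throwaway test data (all names here are illustrative).
demo_dir=$(mktemp -d)
printf 'same content\n' > "$demo_dir/a.txt"
printf 'same content\n' > "$demo_dir/b.txt"   # true duplicate of a.txt
printf 'different!!!\n' > "$demo_dir/c.txt"   # same size as a.txt, other content
printf 'x\n'            > "$demo_dir/d.txt"   # unique size, dropped by line 3
: > "$demo_dir/empty.txt"                     # empty file, skipped by line 1

cd "$demo_dir" || exit 1

# Lines 1-3: file sizes that occur more than once.
candidate_sizes=$(find . -not -empty -type f -printf "%s\n" | sort -rn | uniq -d)
echo "candidate sizes: $candidate_sizes"

# Lines 4-7: hash only files of those sizes, keep lines whose hash repeats.
hashed=$(echo "$candidate_sizes" |
  xargs -I{} find . -type f -size {}c -print0 |
  xargs -0 md5sum |
  sort |
  uniq -w32 --all-repeated=separate)

# Line 8: drop the hash column, leaving only the paths.
dupes=$(echo "$hashed" | cut -d" " -f3-)
echo "$dupes"

cd / && rm -rf "$demo_dir"                    # clean up the test data
```

Here c.txt survives the size filter but is discarded after hashing, which shows why comparing sizes first saves work: only files that share a size with another file are ever passed to md5sum.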

Updated by Jeremias Keihsler more than 7 years ago · 1 revision
