Finding Duplicate Files in a Directory Tree
September 11, 2023, Gregg Szumowski
Sometimes I need to find all of the duplicate files in a directory tree. This comes up every time I move my music collection. Here is a nifty script to sort these things out:
#!/bin/bash
find . -not -empty -type f -printf "%s\n" | sort -rn | uniq -d | xargs -I{} find . -type f -size {}c -print0 | xargs -0 -r md5sum | sort | uniq -w32 --all-repeated=separate
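To see what each stage of the one-liner does, here is the same pipeline split across lines with comments. Treat it as a sketch rather than a replacement: find_duplicates is a made-up function name, the optional directory argument is an addition, and it assumes the GNU versions of find, xargs, and uniq. It also leaves out the -n1 and -D options, which are redundant here, and adds -r to xargs so md5sum is not run when there are no candidates.

```shell
#!/bin/bash
# find_duplicates is a hypothetical helper name; the directory argument
# defaults to the current directory.
find_duplicates() {
    local dir="${1:-.}"

    # 1. Print the size in bytes of every non-empty regular file.
    find "$dir" -not -empty -type f -printf '%s\n' |
        # 2. Keep one copy of each size that occurs more than once.
        sort -rn | uniq -d |
        # 3. For each such size, list the files of exactly that size,
        #    NUL-separated so odd file names survive.
        xargs -I{} find "$dir" -type f -size {}c -print0 |
        # 4. Hash only those candidates (-r: do nothing on empty input).
        xargs -0 -r md5sum |
        # 5. Sort by hash and print groups whose first 32 characters
        #    (the MD5 sum) match, separated by blank lines.
        sort | uniq -w32 --all-repeated=separate
}
```

Comparing sizes first means only files that share a size with another file get hashed, which saves a lot of time on a large collection. Usage would look like: find_duplicates ~/Music > ~/duplicates.txt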
If you redirect the output of the above into a text file, for example duplicates.txt, you can then create a script from that:
awk 'NF { print "rm \"" substr($0, 35) "\"" }' ~/duplicates.txt >~/duplicates.sh
Then edit duplicates.sh and remove the lines for the files you want to keep (every line left in the script deletes a file), make the script executable, and run it. Done.
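A minimal sketch of that last step, assuming the generated script lives at ~/duplicates.sh. The run_removal_script helper is hypothetical. It syntax-checks the script with bash -n before anything is deleted, then runs it through bash directly, so the executable bit is not strictly required.

```shell
#!/bin/bash
# run_removal_script is a hypothetical helper, not part of the original post.
# It defaults to ~/duplicates.sh, the file generated by the awk command.
run_removal_script() {
    local script="${1:-$HOME/duplicates.sh}"

    # Parse the script without executing anything, to catch quoting
    # problems such as a file name containing a double quote.
    bash -n "$script" || return 1

    # Show what is about to be deleted, then run it.
    cat "$script"
    bash "$script"
}
```

If bash -n reports an error, nothing has been removed yet, and the offending line can be fixed by hand before trying again.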
Tags: cli, duplicates, find, awk, motd