Gregg's MOTD

Tips & Tricks that I've Encountered Over the Years...

Finding Duplicate Files in a Directory Tree

September 11, 2023 — Gregg Szumowski

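Sometimes I need to find all of the duplicate files in a directory tree; I run into this all the time when I move my music collection. Here is a nifty script to sort them out. It first lists the file sizes that occur more than once (files of different sizes can't be identical), computes MD5 checksums for only the files with those sizes, and prints each group of files whose checksums match, with a blank line between groups: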

#!/bin/bash
# Needs GNU find and coreutils (find -printf, uniq -w and --all-repeated).
find . -not -empty -type f -printf "%s\n" | sort -rn | uniq -d | xargs -I{} find . -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate
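To see what the output looks like before pointing it at a real collection, here is a sketch that wraps the same pipeline in a function and runs it on a throwaway tree built with mktemp. The function name find_dupes and the demo file names are made up for illustration; it assumes GNU find, xargs, and coreutils.

```shell
#!/bin/bash
# Sketch: the duplicate-finder pipeline as a function (find_dupes is a
# hypothetical name), tried on a scratch directory with known duplicates.
find_dupes() {
  local dir=${1:-.}
  # 1) sizes that occur more than once (only those can be duplicates)
  find "$dir" -not -empty -type f -printf "%s\n" | sort -rn | uniq -d |
    # 2) every file having one of those sizes, NUL-separated
    xargs -I{} find "$dir" -type f -size {}c -print0 |
    # 3) hash the candidates; keep groups sharing the first 32 chars (the MD5)
    xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate
}

demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
mkdir -p "$demo/albumA" "$demo/albumB"
printf 'same bytes\n' > "$demo/albumA/track01.mp3"
printf 'same bytes\n' > "$demo/albumB/track01-copy.mp3"  # true duplicate
printf 'diff bytes\n' > "$demo/albumB/track02.mp3"       # same size, other content
printf 'unique\n'     > "$demo/albumA/notes.txt"         # different size

find_dupes "$demo"
```

Only the two track01 files should be printed, as a single group. track02.mp3 survives the size filter but has a different checksum, and notes.txt is skipped without ever being hashed.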

If you redirect the output of the above into a text file, for example ~/duplicates.txt, you can then generate a removal script from it:

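awk 'NF { printf("rm \"%s\"\n", substr($0, 35)) }' ~/duplicates.txt > ~/duplicates.sh

md5sum prints the 32-character hash and two separator characters before each name, so the file name starts at column 35; the NF test skips the blank lines between groups. File names containing double quotes, $, backticks, or backslashes will still need fixing by hand.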

Then edit duplicates.sh and delete the lines for the files you want to keep (leave at least one copy from each group), make the script executable, and run it. Done.
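If you would rather not hand-pick a survivor in every group, here is a sketch of a small bash function (make_rm_script is a made-up name) that writes a "# keep:" comment for the first file of each group and an rm line for the rest. It quotes names with the printf builtin's %q, so spaces and quotes in file names survive. It assumes GNU md5sum's text-mode layout (32-character hash, two separator characters, then the name); which copy comes first depends on sort order, so still review the result before running it.

```shell
#!/bin/bash
# Hypothetical helper: turn duplicates.txt (md5sum lines, groups separated by
# blank lines) into rm commands for every file except the first in each group.
# Lines that GNU md5sum escapes with a leading backslash are not handled.
make_rm_script() {
  local line file first=1
  while IFS= read -r line; do
    if [[ -z $line ]]; then    # a blank line starts a new group
      first=1
      continue
    fi
    file=${line:34}            # drop "<32-char hash>" plus two separators
    if (( first )); then
      printf '# keep: %q\n' "$file"
      first=0
    else
      printf 'rm -- %q\n' "$file"
    fi
  done < "$1"
}
```

Usage: make_rm_script ~/duplicates.txt > ~/duplicates.sh, then look it over, chmod +x ~/duplicates.sh, and run it as before.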

Tags: cli, duplicates, find, awk, motd