Gregg's MOTD

Tips & Tricks that I've Encountered Over the Years...

Finding Duplicate Files in a Directory Tree

September 11, 2023 — Gregg Szumowski

Sometimes I need to find all of the duplicate files in a directory tree. I have this issue all of the time when I move my music collection. Here is a nifty script to sort these things out:

#!/bin/bash
find . -not -empty -type f -printf "%s\n" | sort -rn | uniq -d | xargs -I{} find . -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate

If you redirect the output of the above into a text file, for example duplicates.txt, you can then generate a removal script from it:

awk 'NF {printf("rm \"%s\"\n", substr($0, 35));}' ~/duplicates.txt >~/duplicates.sh

Then edit the file and remove the lines for the files you want to keep, make the script executable and run it. Done.
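
The one-liner is easier to adapt once it is split into stages. Here is a minimal sketch of the same pipeline wrapped in a function, one stage per line. The name find_dupes and the optional directory argument are made up for this example, and it assumes the GNU versions of find, xargs, md5sum and uniq.

```shell
# find_dupes [DIR]: hypothetical wrapper around the pipeline above.
# Assumes GNU find, xargs, md5sum and uniq. DIR defaults to the current directory.
find_dupes() {
    # 1. Print the size in bytes of every non-empty regular file.
    find "${1:-.}" -type f -not -empty -printf '%s\n' |
        # 2. Keep one copy of each size that occurs more than once.
        sort -rn | uniq -d |
        # 3. List every file that has one of those sizes, NUL-terminated.
        xargs -I{} find "${1:-.}" -type f -size {}c -print0 |
        # 4. Hash the candidates; -r skips md5sum when there are none.
        xargs -0 -r md5sum |
        # 5. Group lines whose first 32 characters (the hash) are equal,
        #    with a blank line between groups.
        sort | uniq -w32 --all-repeated=separate
}

# Example (not run here): find_dupes ~/Music > ~/duplicates.txt
```

Only files that share a size get hashed, which is what keeps this reasonably fast on a large collection.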

Tags: cli, duplicates, find, awk, motd

Printing Numbers using Thousand Separators

June 30, 2023 — Gregg Szumowski

You can pipe output to awk to print numbers with thousands separators (commas). For example, here’s how you can total the 5th column of the ls -l command and print it with thousands separators:

$ ls -l | awk '{total = total + $5}END{print total}' | LC_ALL=en_US.UTF-8 awk '{printf("%'"'"'d\n", $0) }'
21,387

This can be adapted to other commands as necessary.
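
The printf %'d trick depends on the en_US.UTF-8 locale being installed, and on a system without it the number may come out with no separators at all. As a fallback, here is a minimal sketch that inserts the commas by hand in awk. The function name commas is made up for this example, and it assumes plain integer input, one number per line.

```shell
# commas: hypothetical helper. Reads one integer per line on stdin and prints
# it with a comma between every group of three digits. No locale is needed
# because the commas are inserted by hand.
commas() {
    awk '{
        n = $0; sign = ""
        # Set a leading minus sign aside so it is not counted as a digit.
        if (n ~ /^-/) { sign = "-"; n = substr(n, 2) }
        out = ""
        # Peel three digits at a time off the right-hand end.
        while (length(n) > 3) {
            out = "," substr(n, length(n) - 2) out
            n = substr(n, 1, length(n) - 3)
        }
        print sign n out
    }'
}

# Example (not run here):
#   ls -l | awk '{total = total + $5}END{print total}' | commas
```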

Tags: cli, bash, awk, motd

Archive Only Files In a Directory

May 29, 2023 — Gregg Szumowski

If you want to create a tar archive of only the files in a directory and exclude any subdirectories, you can use the ls -la command and pipe the output to awk. However, you need to drop the first 8 fields of each line and keep everything after them, in case there are spaces in the filename. One quick and dirty way of doing that is to set each of the 8 fields to a blank and then use sed to trim the leading spaces. Wrapping each filename in quotation marks lets xargs pass names that contain spaces to tar as a single argument.

$ ls -al | awk '/^-/ {$1=$2=$3=$4=$5=$6=$7=$8=""; printf("%s\"\n", $0)}' | sed 's/^[[:space:]]*/"/' | xargs tar cvf archive-name.tar
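
Parsing ls output gets fragile with unusual filenames (embedded quotes, runs of spaces). A different technique that avoids the parsing altogether is to let find pick the files and hand them to tar NUL-terminated. This is only a sketch: the function name archive_files is made up, it works on the current directory, and it assumes GNU find and GNU tar (for -maxdepth, -print0, --null and -T).

```shell
# archive_files ARCHIVE: hypothetical helper, assumes GNU find and GNU tar.
# Archives the regular files directly inside the current directory.
archive_files() {
    # -maxdepth 1 -type f : regular files only, no descent into subdirectories
    # ! -name "$1"        : leave the archive itself out of the list
    # -print0 / --null    : NUL-terminated names, so spaces and quotes survive
    # -T -                : tar reads the list of names from standard input
    find . -maxdepth 1 -type f ! -name "$1" -print0 |
        tar --null -T - -cf "$1"
}

# Example (not run here): archive_files archive-name.tar
```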

Tags: cli, tar, awk, xargs, sed, motd

Echo File Until a Blank Line Is Reached

May 27, 2023 — Gregg Szumowski

You can use awk to search and print lines in a file. To print a file up to, but not including, the first blank line, use the following command:

awk '$0 ~ /^$/ {exit;} {print $0;}' somefile.txt
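
The pattern /^$/ only matches a completely empty line. If the "blank" line might hold stray spaces or tabs, a small variation is to test the field count instead. This is a sketch, and the function name head_until_blank is made up for this example.

```shell
# head_until_blank FILE: hypothetical wrapper around the same awk idea.
# Prints FILE up to, but not including, the first blank line.
head_until_blank() {
    # NF is the number of fields on the line; it is 0 both for an empty line
    # and for a line that contains nothing but spaces or tabs.
    awk 'NF == 0 { exit } { print }' "$1"
}

# Example (not run here): head_until_blank somefile.txt
```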

Tags: awk, cli, motd

How To Find All of the Shell Scripts In a Directory

May 21, 2023 — Gregg Szumowski

This is a quick and dirty way to list all of the files in the current directory that are shell scripts:

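for i in *
do
    # -b drops the "filename:" prefix so only the description is tested
    type=$(file -b "${i}")
    # match descriptions such as "POSIX shell script" or "Bourne-Again shell script"
    if [[ "${type}" == *"shell script"* ]]; then
        echo "${i} is a shell script"
    fi
done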
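
The loop above depends on the exact wording that the file command prints, which can differ between shells and versions. Another quick and dirty option is to look at the shebang line yourself. The sketch below is an alternative technique, not the method above: the function name is_shell_script and the list of shells it recognizes (sh, bash, dash, ksh, zsh) are assumptions made for this example.

```shell
# is_shell_script FILE: hypothetical helper.
# Succeeds when FILE is a readable regular file whose first line is a shebang
# naming sh, bash, dash, ksh or zsh, either directly or through env.
is_shell_script() {
    [ -f "$1" ] && [ -r "$1" ] || return 1
    # Only the first line matters; grep -q reports the result as exit status.
    head -n 1 < "$1" 2>/dev/null |
        LC_ALL=C grep -Eq '^#![[:space:]]*(/[^[:space:]]*/)?(env[[:space:]]+)?(ba|da|k|z)?sh([[:space:]]|$)'
}

for i in *
do
    if is_shell_script "${i}"; then
        echo "${i} is a shell script"
    fi
done
```

Scripts that have no shebang line will be missed by this approach, so it complements the file-based loop rather than replacing it.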

Tags: cli, motd, awk