Finding Duplicate Files in a Directory Tree
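Sometimes I need to find all of the duplicate files in a directory tree. I have this issue all of the time when I move my music collection. Here is a nifty script to sort these things out. It first narrows the search to files whose size occurs more than once, then compares the MD5 checksums of only those candidates and prints each group of identical files separated by a blank line: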
#!/bin/bash
find . -not -empty -type f -printf "%s\n" | sort -rn | uniq -d | xargs -I{} find . -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate
If you redirect the output of the above into a text file, for example duplicates.txt, you can then create a script from that:
awk 'NF {sub(/^[0-9a-f]+ [ *]/, ""); printf("rm \"%s\"\n", $0)}' ~/duplicates.txt > ~/duplicates.sh
Then edit the file and remove the lines for the files you want to keep, make the script executable and run it. Done.
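If you would rather start from a list that already keeps one copy of each file, the filtering step can be sketched as below. This is only a sketch: list_extra_copies is a hypothetical helper name, and it assumes GNU md5sum output (a hex checksum, a two-character separator, then the path) and filenames without newlines.

```shell
#!/bin/bash
# Hypothetical helper: read md5sum-style lines on stdin and print every
# path in a duplicate group except the first one seen for that checksum.
list_extra_copies() {
    awk 'NF {
        hash = $1                       # remember the checksum
        sub(/^[0-9a-f]+ [ *]/, "")      # strip checksum and separator, keep the path intact
        if (seen[hash]++) print         # first copy is kept, later copies are printed
    }'
}
```

Running list_extra_copies < ~/duplicates.txt prints the candidates for deletion, which you can still review by hand before removing anything.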
Tags: cli, duplicates, find, awk, motd
Printing Numbers using Thousand Separators
You can pipe to awk to output numbers with thousands separators (commas). For example, here's how you can total the 5th column of the ls -l command and print it with thousands separators:
$ ls -l | awk '{total = total + $5}END{print total}' | LC_ALL=en_US.UTF-8 awk '{printf("%'"'"'d\n", $0) }'
21,387
This can be adapted to other commands as necessary.
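The printf grouping flag used above depends on the en_US.UTF-8 locale being available on the machine. Where it is not, the grouping can be done by hand. The following is a minimal sketch, assuming the input is one non-negative integer per line; commify is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: insert commas every three digits, no locale needed.
# Assumes each input line is a non-negative integer.
commify() {
    awk '{
        n = $0; out = ""
        while (length(n) > 3) {
            out = "," substr(n, length(n) - 2) out   # peel off the last three digits
            n = substr(n, 1, length(n) - 3)
        }
        print n out
    }'
}
```

It drops in where the second awk was: ls -l | awk '{total = total + $5}END{print total}' | commify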
Archive Only Files In a Directory
If you want to create a tar archive of only the files of a directory and exclude any subdirectories, you can use the ls -la command and pipe the output to awk. However, you need to remove the first 8 fields from the output and leave all of the remaining parts of the line in case there are spaces in the filename. One quick and dirty way of doing that is to set each of the 8 fields to a blank and then use sed to trim the leading spaces. You can optionally add quotation marks around the filename in your output too.
$ ls -al | awk '/^-/ {$1=$2=$3=$4=$5=$6=$7=$8=""; printf("%s\"\n", $0)}' | sed 's/^[[:space:]]*/"/' | xargs tar cvf archive-name.tar
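Parsing ls output gets fragile with unusual filenames, so here is an alternative sketch that lets find pick the regular files and hands them to tar as a NUL-separated list. It assumes GNU find and GNU tar (for -maxdepth, -print0, --null and -T); archive_files_only is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: archive only the regular files directly inside a directory.
# $1 = archive to create, $2 = directory to read (defaults to the current one).
# Assumes GNU find and GNU tar.
archive_files_only() {
    # Skip a file with the archive's own name, in case the archive
    # is being written into the directory that is being read.
    find "${2:-.}" -maxdepth 1 -type f ! -name "${1##*/}" -print0 |
        tar --null -cvf "$1" -T -
}
```

For example, archive_files_only archive-name.tar . stores the files of the current directory and leaves every subdirectory out.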
Tags: cli, tar, awk, xargs, sed, motd
Echo File Until a Blank Line Is Reached
You can use the awk program to search and print lines in a file. If you want to print a file until the first blank line is reached, you can use the following command to do that:
awk '$0 ~ /^$/ {exit;} {print $0;}' somefile.txt
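The pattern above only stops at a completely empty line. If a line holding nothing but spaces or tabs should also count as blank, one possible variation is to test the field count instead. This is a sketch, and print_until_blank is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: print input until the first line that is empty
# or contains only whitespace (such a line has zero awk fields).
print_until_blank() {
    awk 'NF == 0 { exit } { print }' "$@"
}
```

It can be called with a filename, as in print_until_blank somefile.txt, or placed at the end of a pipeline.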
How To Find All of the Shell Scripts In a Directory
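This is a quick and dirty way to list all of the files in the current directory that are shell scripts. It relies on file describing them as "ASCII text executable", so scripts written in other languages, such as Python or Perl, are reported as well: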
for i in *
do
    # -b omits the filename, so commas in a name cannot confuse awk
    type=$(file -b -- "${i}" | awk -F, '{print $2}')
    if [[ "${type}" = " ASCII text executable" ]]; then
        echo "${i} is a shell script"
    fi
done
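A slightly stricter check is to look for the words "shell script" in the description that file prints, which is how it usually labels sh and bash scripts (for example "POSIX shell script" or "Bourne-Again shell script"). The sketch below assumes that wording; is_shell_script is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: succeed if file(1) describes the argument as a shell script.
# Assumes the description contains the words "shell script".
is_shell_script() {
    case "$(file -b -- "$1")" in
        *"shell script"*) return 0 ;;
        *) return 1 ;;
    esac
}

for f in *
do
    if [ -f "$f" ] && is_shell_script "$f"; then
        echo "$f is a shell script"
    fi
done
```

Because the test is a function, it can be reused elsewhere, for example to filter the results of a find command.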