I would like to ask for a review of the following Bash script which cleans up and datestamps file names, with an optional feature to rename the file.
The main use case is renaming files downloaded from Overleaf, so that Angebot_produkt_kunde-2.pdf (the result of downloading the project a second time
without first moving Angebot_produkt_kunde.pdf out of the way) gets renamed to Angebot-produkt-kunde-2-2024-06-10-1248.pdf.
If the file name already contains a datestamp and the string to append is the literal word date, the old datestamp is replaced with the current one.
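To illustrate the datestamp update, here is a minimal sketch of removing an existing stamp with a Bash regex before a new one is appended. The function name strip_datestamp is hypothetical, and the pattern assumes the _YYYY-MM-DD_HHMM form used in the examples; it is not the code the script itself uses.

```bash
#!/bin/bash
# Hypothetical helper: remove one _YYYY-MM-DD_HHMM stamp from a file name.
# Because the leading (.*) is greedy, the last stamp in the name is the one removed.
strip_datestamp() {
    local name=$1
    local re='^(.*)_[0-9]{4}-[0-9]{2}-[0-9]{2}_[0-9]{4}(.*)$'
    if [[ $name =~ $re ]]; then
        # Glue together the text before and after the stamp
        name="${BASH_REMATCH[1]}${BASH_REMATCH[2]}"
    fi
    printf '%s\n' "$name"
}
```

Called as strip_datestamp 'Angebot_2024-06-27_1620.pdf', this prints Angebot.pdf; a name without a stamp is printed unchanged.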
Another feature is appending custom suffixes, as in
angebot_produkt_Müller-Maßnahmen-5 "Neuer Entwurf" -> Angebot-produkt-mueller-massnahmen-5-neuer-entwurf
In this example, the file name has no extension.
I want to make sure that the file stays in its current directory and that files whose names start with a dot are skipped. Files without an extension or with more than one extension are handled correctly.
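The splitting described here can be sketched as follows. This is an illustration under assumptions, not the script's actual code: the function name split_path is hypothetical, and everything from the first dot onward is treated as the extension, so that multiple extensions stay together.

```bash
#!/bin/bash
# Hypothetical helper: split a path into directory, base name and extension.
# Dotfiles are reported and rejected with a non-zero return status.
split_path() {
    local path=$1 dir file_name base_name extension
    dir=$(dirname -- "$path")
    file_name=$(basename -- "$path")
    if [[ $file_name == .* ]]; then
        printf 'skip dotfile %s\n' "$path"
        return 1
    fi
    # Everything before the first dot; the whole name if there is no dot
    base_name=${file_name%%.*}
    # Whatever remains after the base name: empty, or one or more extensions
    extension=${file_name#"$base_name"}
    printf 'dir=%s base_name=%s extension=%s\n' "$dir" "$base_name" "$extension"
}
```

With this sketch, split_path /tmp/data.csv prints dir=/tmp base_name=data extension=.csv, and split_path a.txt.pdf prints dir=. base_name=a extension=.txt.pdf. Because only the last path component is ever changed, a rename built from these parts stays in the original directory.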
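The file names are also cleaned up so they will not cause problems when archived or moved between file systems. Umlauts and ß are transliterated (ä becomes ae, ß becomes ss), percent signs become "prozent", whitespace and underscores become hyphens, and any other character outside letters, digits, dots, and hyphens is dropped, as are leading, trailing, and excessive internal dots or hyphens. Finally, the new name is converted to lowercase, except that its first letter is capitalized.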
Find test cases at the end of this question.
Specific review questions:
Can this script be written more concisely? I am particularly unhappy about
sed -E -e 's/(^-|([\.-]*)$)//g' \
-e 's/-\././g'
which is needed to get to the new name Txt.pdf instead of Txt-.pdf when called as follows:
~/bin/newname.sh --printonly '/tmp/@#$.txt.pdf' '!'
Is there an accepted name for the base name of a file, but without any extension? That is, in the name /tmp/data.csv, the string /tmp is the directory name, the string data.csv is the base name, the string .csv is the extension, but what is a good name for the string data on its own?
Should I rename the option --printonly to -d (abbreviating "dry-run")? Is anything off with my handling of options?
Have I missed any obvious edge cases for testing? (A non-obvious edge case would be a file name consisting entirely of Hanzi characters, for example. Such a case will not occur in the environment the script is expected to run in.)
What about locales? I am a first-time Cygwin user and am mildly surprised to see echo "$LC_ALL" not produce any output. Should I replace A-Za-z0-9 with [[:alnum:]], or are there any disadvantages to that?
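To make the locale question concrete, here is a minimal sketch of one way the character class could be made independent of the environment. It assumes that pinning LC_ALL=C for the single tr call is acceptable; the function name ascii_alnum_only is hypothetical and not part of my script.

```bash
#!/bin/bash
# Hypothetical helper: keep only ASCII letters, digits, dot, underscore and hyphen.
# LC_ALL=C applies to this one tr call only, so [:alnum:] cannot pick up
# locale-specific letters such as umlauts.
ascii_alnum_only() {
    printf '%s' "$1" | LC_ALL=C tr -dc '[:alnum:]._-'
    printf '\n'
}
```

With the locale pinned like this, ascii_alnum_only 'Müller-5.pdf' prints Mller-5.pdf, the same result that the explicit set A-Za-z0-9 would give. Without the pin, what [[:alnum:]] matches depends on the effective locale, which the locale command prints even when LC_ALL itself is empty.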
Test cases:
#!/bin/bash
{
~/bin/newname.sh --printonly Angebot_produkt_kunde_Maßnahmen-6_2024-06-27_1620.pdf date
~/bin/newname.sh --printonly angebot_produkt_Müller-Maßnahmen-4__2024-06-27_1806.txt.pdf date
~/bin/newname.sh --printonly angebot_produkt_Müller-Maßnahmen-5 "Neuer Entwurf"
~/bin/newname.sh --printonly angebot_produkt_Müller-Maßnahmen-3.pdf "Neuer Entwurf"
~/bin/newname.sh --printonly "Angebot 5.pdf" " Neuer Entwurf "
~/bin/newname.sh --printonly Report_kunde_17_06_2024____23_06_2024___LuaLaTeX.pdf date
~/bin/newname.sh --printonly ~/angebot_produkt_Müller-Maßnahmen-3.pdf "Neuer Entwurf"
~/bin/newname.sh --printonly /tmp/files/"Angebot 5.pdf" " Neuer Entwurf "
~/bin/newname.sh --printonly /tmp/files/"Angebot 5%% Rabatt.pdf" " Neuer Entwurf "
~/bin/newname.sh --printonly .configfile date
~/bin/newname.sh --printonly /tmp/.configfile date
~/bin/newname.sh --printonly Angebot_produkt_kunde-2.pdf date
~/bin/newname.sh --printonly '/tmp/@#$%.txt' '!'
~/bin/newname.sh --printonly '/tmp/@#$.txt.pdf' '!'
~/bin/newname.sh --printonly '/tmp/@#$*' '!'
} | sed -e "s/$USER/user/g" | column -s';' -t
Output:
New filename without renaming: Angebot-produkt-kunde-massnahmen-6-2024-06-28-1644.pdf dir=.
New filename without renaming: Angebot-produkt-mueller-massnahmen-4-2024-06-28-1644.txt.pdf dir=.
New filename without renaming: Angebot-produkt-mueller-massnahmen-5-neuer-entwurf dir=.
New filename without renaming: Angebot-produkt-mueller-massnahmen-3-neuer-entwurf.pdf dir=.
New filename without renaming: Angebot-5-neuer-entwurf.pdf dir=.
New filename without renaming: Report-kunde-17-06-2024-23-06-2024-lualatex-2024-06-28-1644.pdf dir=.
New filename without renaming: Angebot-produkt-mueller-massnahmen-3-neuer-entwurf.pdf dir=/cygdrive/c/Users/user
New filename without renaming: Angebot-5-neuer-entwurf.pdf dir=/tmp/files
New filename without renaming: Angebot-5-prozent-rabatt-neuer-entwurf.pdf dir=/tmp/files
Will not rename dotfile .configfile
Will not rename dotfile /tmp/.configfile
New filename without renaming: Angebot-produkt-kunde-2-2024-06-28-1644.pdf dir=.
New filename without renaming: Prozent.txt dir=/tmp
New filename without renaming: Txt.pdf dir=/tmp
New filename without renaming: Wtf dir=/tmp
The script:
#!/bin/bash
# Clean up and datestamp file names; optionally rename the file
# Thure Dührsen, 2024-06-28
# Environment: Cygwin
# Function to modify the base name of the file
clean_name() {
    if [ "$#" -ne 1 ]; then
        echo "Usage: clean_name <filename>"
        exit 1
    fi
    echo "$1" | sed -E -e 's/(Ä|ä)/AE/g' \
                       -e 's/(Ö|ö)/OE/g' \
                       -e 's/(Ü|ü)/UE/g' \
                       -e 's/ß/SS/g' \
                       -e 's/%+/-Prozent-/g' \
                       -e 's/[[:space:]]/-/g' |
        tr -dc 'A-Za-z .0-9_-' |
        tr '_' '-' |
        tr -s '-' |
        sed -E -e 's/(^-|([\.-]*)$)//g' \
               -e 's/-\././g' |
        tr '[:upper:]' '[:lower:]'
}
# Function to append string before the final dot or at the end if no dot exists
append_string_to_filename() {
    if [ "$#" -ne 2 ]; then
        echo "Usage: append_string_to_filename <filename> <string>"
        exit 1
    fi
    local filename="$(clean_name "$1")"
    local append_string="$(clean_name "$2")"
    local extension=""
    local base_name=""
    if [[ "$filename" =~ ([^.]+)(\..*)$ ]]; then
        base_name="${BASH_REMATCH[1]}"
        extension="${BASH_REMATCH[2]}"
    else
        base_name="$filename"
    fi
    local new_base_name="${base_name%.*}"'_'"${append_string}""${extension}"
    local clean_base_name="$(clean_name "$new_base_name")"
    # Clean_base_name might no longer contain alphanumeric characters
    if [[ ! "$clean_base_name" =~ ^[A-Za-z0-9]+([-_A-Za-z.0-9]*[A-Za-z0-9])?$ ]]; then
        clean_base_name='wtf'"$extension"
    fi
    echo "${clean_base_name^}"
}
if [[ "$1" == "--printonly" ]]; then
    printonly=true
    shift
else
    printonly=false
fi
# After the optional --printonly, exactly two arguments are expected
if [[ "$#" -ne 2 ]]; then
    echo "Usage: $0 [--printonly] <file> string_to_append"
    exit 1
fi
if [ "$printonly" == "false" ]
then
    if [ ! -e "$1" ]
    then
        echo "file $1 does not exist"
        exit 2
    fi
fi
fullpath="$1"
dir="$(dirname "$fullpath")"
file_name="$(basename -- "$fullpath")"
string_to_append="$2"
if [[ "$file_name" =~ ^\. ]]
then
    echo "Will not rename dotfile $fullpath"
    exit 2
fi
if [[ "$string_to_append" == "date" ]]; then
    string_to_append="$(date '+%F_%H%M')"
    # Drop an existing _YYYY-MM-DD_HHMM stamp so that it is replaced, not duplicated
    file_name="$(echo "$file_name" | sed -E -e 's/_[[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2}_[[:digit:]]{4}//')"
fi
new_file_name="$(append_string_to_filename "$file_name" "$string_to_append")"
if [ "$printonly" == "false" ]
then
    # Move the original path: $file_name may have had its datestamp stripped above
    if ! mv -n -v "$fullpath" "$dir"/"$new_file_name"
    then
        echo "Failed to rename file $fullpath"
        exit 2
    fi
else
    echo "New filename without renaming: $new_file_name ; dir=$dir"
fi
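For the question about renaming --printonly, here is a minimal sketch of what option handling with a short flag could look like. The spellings -n and --dry-run, the function name parse_args, and the args array are assumptions for illustration, not part of the script above.

```bash
#!/bin/bash
# Hypothetical option parser: sets the global printonly flag and collects
# the positional arguments (file and string to append) in the args array.
parse_args() {
    printonly=false
    args=()
    while (( $# > 0 )); do
        case $1 in
            -n|--dry-run|--printonly) printonly=true ;;
            # Everything after -- is positional, even if it starts with a hyphen
            --) shift; args+=("$@"); break ;;
            -?*) printf 'Unknown option: %s\n' "$1" >&2; return 1 ;;
            *) args+=("$1") ;;
        esac
        shift
    done
    if (( ${#args[@]} != 2 )); then
        printf 'Usage: %s [-n|--dry-run] <file> <string_to_append>\n' "$0" >&2
        return 1
    fi
}
```

After parse_args -n a.pdf date, printonly is true and args holds a.pdf and date. A file name or suffix that starts with a hyphen has to follow --, as in parse_args -- -weird.pdf date; otherwise it is rejected as an unknown option.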