#!/usr/bin/env bash

FILETYPES=( "*.html" "*.css" "*.js" "*.xml" "*.json" )
DIRECTORIES=`pwd`
MIN_SIZE=1024

for currentdir in $DIRECTORIES
do
   for i in "${FILETYPES[@]}"
   do
      find $currentdir -iname "$i" -exec bash -c 'PLAINFILE={};GZIPPEDFILE={}.gz; \
         if [ -e $GZIPPEDFILE ]; \
         then if [ `stat --printf=%Y $PLAINFILE` -gt `stat --printf=%Y $GZIPPEDFILE` ]; \
                then    gzip -k -4 -f -c $PLAINFILE > $GZIPPEDFILE; \
                 fi; \
         elif [ `stat --printf=%s $PLAINFILE` -gt $MIN_SIZE ]; \
            then gzip -k -4 -c $PLAINFILE > $GZIPPEDFILE; \
         fi' \;
  done
done

This script compresses all static web files using gzip. When I run it, I get this error: bash: line 5: [: 93107: unary operator expected. What is going wrong in this script?

  • find "$currentdir", not find $currentdir. "$(foo)" -gt "$MIN_SIZE", not $(foo) -gt $MIN_SIZE (and $() in place of backticks if you want saner nesting behavior). "$PLAINFILE", not $PLAINFILE. And, yes, you need to export any variables you want subprocesses to see. Commented Jul 11, 2014 at 2:35
  • Backing up -- why not let find decide whether the file is larger than your minimum or not? There's no need to use a shell for that. Commented Jul 11, 2014 at 2:40

2 Answers


You need to export the MIN_SIZE variable. The bash that find spawns has no value for it, so (as I just mentioned in my comment on @ooga's answer) the script ends up running [ $result_from_stat -gt ], which is an error. When the result is 93107, that becomes [ 93107 -gt ], and running that in your shell outputs:

$ [ 93107 -gt ]
-bash: [: 93107: unary operator expected
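
A minimal sketch of the effect, assuming only stock bash: a child bash started with bash -c (which is what find -exec does here) sees MIN_SIZE only after it has been exported. The names unexported and exported are hypothetical and exist only for illustration.

```shell
#!/usr/bin/env bash
# Sketch: a child bash only sees variables that were exported.

# Start clean in case the calling environment already exports MIN_SIZE.
unset MIN_SIZE
MIN_SIZE=1024

# Not exported yet: the child expands $MIN_SIZE to nothing.
unexported=$(bash -c 'echo "[${MIN_SIZE}]"')

export MIN_SIZE

# Exported: the child inherits the value.
exported=$(bash -c 'echo "[${MIN_SIZE}]"')

echo "before export: $unexported"
echo "after export:  $exported"
```

In the original script, adding export MIN_SIZE before the loops (or writing export MIN_SIZE=1024) should be enough to give the spawned bash a right-hand operand for -gt.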

8 Comments

The other thing is using quotes. -gt "$MIN_SIZE" would prevent missing data turning into a syntax error.
@CharlesDuffy With -gt that wouldn't actually help though. Because [ 93107 -gt "" ] is still a syntax error. (Which I'd argue is a good thing because silently comparing against zero instead of the expected value is a bad failure mode though apparently that is what [[ does.)
It's an error, yes, but not a syntax error. Having the syntax be correct means a more informative message (about the empty string not being a number) is possible.
@CharlesDuffy Fair enough. The type of error does change. How much that helps figuring out the problem in this sort of case I'm not sure though. You still have to figure out that you are getting an empty variable expansion but perhaps it might be helpful to someone. None of this makes the [[ behaviour make sense to me though.
"Integer expression expected" is, IMHO, vastly clearer than "unary operator expected", so my expectation is that the level of help is rather significant.
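
To make the difference discussed in these comments concrete, here is a small sketch that captures what bash reports in each case when MIN_SIZE is empty. The variable names are hypothetical, and the exact message prefix (for example the "line 1:" part) can vary between bash versions.

```shell
#!/usr/bin/env bash
# Sketch: compare the failure modes of [ ] and [[ ]] with an empty operand.
# Each test runs in a child bash with MIN_SIZE set to the empty string.

# Unquoted: the operand vanishes, leaving [ 93107 -gt ].
unquoted_msg=$(MIN_SIZE='' bash -c '[ 93107 -gt $MIN_SIZE ]' 2>&1) || true

# Quoted: the operand is an empty string, so [ 93107 -gt "" ] is evaluated.
quoted_msg=$(MIN_SIZE='' bash -c '[ 93107 -gt "$MIN_SIZE" ]' 2>&1) || true

# [[ ]] treats the empty string as 0 in an arithmetic comparison, so it succeeds silently.
double_bracket_status=0
MIN_SIZE='' bash -c '[[ 93107 -gt "$MIN_SIZE" ]]' || double_bracket_status=$?

echo "unquoted:  $unquoted_msg"
echo "quoted:    $quoted_msg"
echo "[[ ]] exit status: $double_bracket_status"
```

The unquoted form should report "unary operator expected" and the quoted form "integer expression expected", while [[ ]] returns success without any message.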

This could be simpler:

#!/usr/bin/env bash

FILETYPES=(html css js xml json)
DIRECTORIES=("$PWD")
MIN_SIZE=1024

IFS='|' eval 'FILTER="^.*[.](${FILETYPES[*]})\$"'

for DIR in "${DIRECTORIES[@]}"; do
    while IFS= read -ru 4 FILE; do
        GZ_FILE=$FILE.gz
        if [[ -e $GZ_FILE ]]; then
            [[ $GZ_FILE -ot "$FILE" ]] && gzip -k -4 -c "$FILE" > "$GZ_FILE"
        elif [[ $(exec stat -c '%s' "$FILE") -ge MIN_SIZE ]]; then
            gzip -k -4 -c "$FILE" > "$GZ_FILE"
        fi
    done 4< <(exec find "$DIR" -mindepth 1 -type f -regextype egrep -iregex "$FILTER")
done
  • There's no need to run pwd; bash already provides $PWD. An array variable is probably what you wanted for the directory list, too.
  • Instead of having find spawn a new bash for every match with a static command string, read find's output from a pipe or, better yet, through process substitution.
  • Instead of comparing stat output, you can just use the -ot or -nt tests.
  • You don't need -f if you're writing the output through redirection (>), as that form of redirection overwrites the target by default.
  • It's more efficient to call find once with a single pattern that covers all the extensions. See how I built the filter and passed it to -iregex. Something like \( -iname one_ext_pat -or -iname another_ext_pat \) would also work, but it's more difficult to build.
  • exec is optional; it avoids an unnecessary extra process.
  • Always prefer [[ ]] over [ ].
  • 4< opens the input on file descriptor 4, and -u 4 makes read read from that file descriptor instead of stdin (0).
  • What you probably need is -ge MIN_SIZE (greater than or equal), not -gt.
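
For comparison, the \( -iname ... \) alternative mentioned above, combined with the earlier comment's suggestion to let find check the size, might look like the sketch below. The function name list_candidates is hypothetical, and -iname assumes GNU or BSD find. Note one behavioral difference: the size test here applies to every file, whereas the scripts above only apply it when no .gz file exists yet.

```shell
#!/usr/bin/env bash
# Sketch: let find match the extensions and the minimum size in one pass.

MIN_SIZE=1024

# Hypothetical helper: print regular files under "$1" that have one of the
# listed extensions (case-insensitive) and are at least MIN_SIZE bytes.
list_candidates() {
    find "$1" -type f \
        \( -iname '*.html' -o -iname '*.css' -o -iname '*.js' \
           -o -iname '*.xml' -o -iname '*.json' \) \
        -size +"$((MIN_SIZE - 1))"c   # "more than MIN_SIZE-1 bytes" means >= MIN_SIZE
}

# Usage: list_candidates "$PWD"
```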

Come to think of it, readarray is a cleaner option if your bash is version 4.0 or newer:

for DIR in "${DIRECTORIES[@]}"; do
    readarray -t FILES < <(exec find "$DIR" -mindepth 1 -type f -regextype egrep -iregex "$FILTER")
    for FILE in "${FILES[@]}"; do
        ...
    done
done

10 Comments

Might NUL-delimit output from the find on FD 4.
@CharlesDuffy How's that possible?
If you mean about using -d $'\0' and -print0: I skipped the idea since it just makes things complicated. It's very rare for filenames to have newlines. Besides we already filter them.
I don't much buy "rare" as an excuse -- the last time I saw multiple TB of logs deleted by accident, it was triggered by a bug causing a buffer of random data to be used as a filename -- but already-filtered is a perfectly fine reason. Making that reason explicit has some advantage, though, inasmuch as folks learning from this answer can understand that it's something they'd need to do were that filtering not in place.
@CharlesDuffy Depends on the context.
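
For readers who do need the NUL-delimited variant discussed in these comments, a sketch of the same read loop with -print0 and read -d '' follows. The function name collect_files and the FILES array are illustrative, and -print0 and -mindepth assume GNU or BSD find. The -iregex filter from the answer is left out so that the delimiter handling is the only thing shown.

```shell
#!/usr/bin/env bash
# Sketch: read find's output NUL-delimited so that filenames containing
# newlines arrive as single entries.

# Hypothetical helper: fill the global array FILES with regular files under "$1".
collect_files() {
    local dir=$1 file
    FILES=()
    # -d '' makes read stop at NUL bytes; -u 4 reads from file descriptor 4.
    while IFS= read -r -d '' -u 4 file; do
        FILES+=("$file")
    done 4< <(find "$dir" -mindepth 1 -type f -print0)
}

# Usage: collect_files "$PWD"; printf '%q\n' "${FILES[@]}"
```

As far as I know, readarray only gained a -d option in bash 4.4, so the loop form is the more portable of the two.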
