3

I need to recursively loop through all the files of certain types in a directory. The file types are held in an array variable that lists the types we need to process. The array is actually populated dynamically; for the sake of simplicity, I am declaring a static array here.

 declare -a arr=("pdf" "doc" "txt")

I have the following code to recursively list all the files in the directory, but I can't figure out how to include the array "arr" so that I only get back files of the types listed in the array.

find "$i" -type f -print0 | while IFS= read -r -d '' file; do
    echo "$file"
    # Process file
done

Please help me modify the code so that it retrieves only the specified file types and not all files.

1
  • This question on the Superuser SE site may be of help. Commented Jul 21, 2017 at 23:11

2 Answers

5

I assume that by file types "pdf", "doc", "txt", you mean filenames with those extensions.

If the number of file types is reasonably small (less than a few dozen), then you could build an array of arguments to pass to find in the format:

... -name '*.pdf' -o -name '*.doc' -o -name '*.txt' ...

Assuming that the array of file types is not empty, here's one way to do it (thanks @mike-holt):

arr=(pdf doc txt)

findargs=()

for t in "${arr[@]}"; do
    findargs+=(-name "*.$t" -o)
done

find . -type f \( "${findargs[@]}" -false \)
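To connect this back to the loop in the question, here is a minimal sketch that feeds the filtered results into a NUL-delimited `while read` loop. The function name `process_by_ext` and its argument order are hypothetical, chosen only for illustration, and `-false` is a GNU find extension, so this assumes GNU find.

```shell
#!/bin/bash
# Hypothetical helper (not from the original answer):
#   process_by_ext DIR EXT...
process_by_ext() {
    local dir=$1; shift
    local findargs=() t

    # Build: -name '*.pdf' -o -name '*.doc' -o ...
    for t in "$@"; do
        findargs+=(-name "*.$t" -o)
    done

    # -false (GNU find) closes the trailing -o; -print0 keeps names with
    # spaces or newlines intact for the read loop.
    find "$dir" -type f \( "${findargs[@]}" -false \) -print0 |
        while IFS= read -r -d '' file; do
            printf '%s\n' "$file"   # placeholder: process the file here
        done
}
```

It could be called as `process_by_ext "$i" "${arr[@]}"`. The explicit `-print0` sits after the parenthesised group, so it applies only to files that matched one of the `-name` tests.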

1 Comment

I like that way too.
4

What you need to do is dynamically build a regex from your array of types (presumably extensions) and pass it to find with the -regex option. find can then use the regex to select the filenames that match. Given your example, you want to build a regex similar to:

"^.*[.]\(pdf\|doc\|txt\)$"

To build that regex dynamically from the array contents, you can do something similar to the following:

#!/bin/bash

arr=(pdf doc txt)   ## dynamically built array of extensions
n=${#arr[@]}        ## number of elements in array
regex='^.*[.]\('    ## beginning of regex
srch="${1:-.}"      ## path to search (default '.')

for ((i = 0; i < $n; i++)); do  ## loop over each element
    ## if not last, add "${arr[i]}\|" otherwise add "${arr[i]}\)"
    ((i < n - 1)) && regex="$regex${arr[i]}\|" || regex="$regex${arr[i]}\)"
done

regex="$regex\$"    ## add the final '$'

find "$srch" -type f -regex "$regex"  ## execute the find

(Note: this uses a bash-specific C-style for loop and arrays, so it is not portable to POSIX shell. Since you are already using an array, that shouldn't be a problem.)

Give it a try and let me know if that meets your needs.
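As a variation on the same idea, the alternation can be assembled without an index loop by joining the array with `|`. This is only a sketch under a few assumptions: the function name `find_by_ext_regex` is hypothetical, `-regextype posix-extended` is specific to GNU find, and the extensions are assumed to contain no regex metacharacters (they are not escaped here).

```shell
#!/bin/bash
# Hypothetical helper (not from the original answer):
#   find_by_ext_regex DIR EXT...
find_by_ext_regex() {
    local dir=$1; shift
    (( $# > 0 )) || return 0      # no extensions given: nothing to match

    local IFS='|'                 # "$*" joins arguments with the first char of IFS
    local regex=".*[.]($*)"       # e.g. .*[.](pdf|doc|txt)

    # GNU find matches -regex against the whole path, so no ^ or $ is needed.
    find "$dir" -regextype posix-extended -type f -regex "$regex"
}
```

It could be called as `find_by_ext_regex "$srch" "${arr[@]}"`. Declaring `IFS` as `local` keeps the changed separator from leaking into the rest of the script.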

4 Comments

Or you could just do find \( $(printf -- "-name *.%s -o " "${arr[@]}") -false \)
Yes, that looks like an easier approach using -name and -o as @janos answered with. I took the Rube Goldberg regex approach :)
@MikeHolt the * may get expanded by the shell. It's not very safe. But I got a tip from you in -false and greatly improved my answer with it, thanks!
@janos Hmm. Seems you're right. It fails if files matching any of the types in arr are present in the top-level directory, because the shell expands the unquoted * before find sees it. So yeah, you'd either have to construct an array of arguments as in your answer, or use set -o noglob first. I'm just a sucker for one-liners. :-)
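For reference, the `set -o noglob` route mentioned in the comments might look like the sketch below. The function name `find_by_ext_oneliner` is hypothetical, and `-false` again assumes GNU find. The subshell confines `noglob` to this one command. The command substitution is still word-split, so extensions containing whitespace would break it, which is one reason the array-based approach is the safer choice.

```shell
#!/bin/bash
# Hypothetical helper (not from the original comments):
#   find_by_ext_oneliner DIR EXT...
find_by_ext_oneliner() {
    local dir=$1; shift
    (( $# > 0 )) || return 0    # printf would otherwise emit one empty pattern
    (
        # Disable pathname expansion so the unquoted *.ext words produced by
        # printf reach find literally instead of matching local files.
        set -o noglob
        find "$dir" -type f \( $(printf -- '-name *.%s -o ' "$@") -false \)
    )
}
```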
