
TL;DR: How do I filter ls/find output with grep, using an array as the pattern?

Background story: I have a pipeline which I have to rerun for datasets that ran into an error. Which datasets ran into an error is recorded in a tab-separated file. I want to delete the files for which the pipeline ran into an error.

To do so, I extracted the dataset names from another file containing the finished datasets and saved them in a bash array (ds1 ds2 ...), but now I am stuck because I cannot figure out how to exclude the datasets in the array from my deletion step.

This is the folder structure (X=1-30): datasets/dsX/results/dsX.tsv

Not excluding the finished datasets, i.e. deleting the folders of both the failed and the finished datasets, works like a charm:

#1. move content to a trash folder
ls /datasets/*/results/* | xargs -I '{}' mv '{}' ./trash/

#2. delete the empty folders
find /datasets/*/. -type d -empty -delete

But since I want to exclude the finished datasets I thought it would be clever to save them in an array:

#find finished datasets by extracting the dataset names from a tab separated log file
mapfile -t -s 1 finished < <(awk '{print $2}' "$path/$log_pf")
echo "${finished[@]}"
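For instance, the extraction step behaves like this with a made-up two-column log file (the file name `log_pf.tsv`, its contents, and the header line are all hypothetical stand-ins for `$path/$log_pf`):

```shell
#!/usr/bin/env bash
# Hypothetical log file: tab-separated, dataset name in column 2,
# first line is a header (which is why mapfile uses -s 1 to skip it).
printf 'status\tdataset\nok\tds2\nok\tds4\n' > log_pf.tsv

# awk prints column 2 of every line; mapfile -t strips newlines and
# -s 1 discards the first line read (the "dataset" header).
mapfile -t -s 1 finished < <(awk '{print $2}' log_pf.tsv)

echo "${finished[@]}"   # prints: ds2 ds4
rm -f log_pf.tsv
```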

This works as expected, but now I am stuck filtering the ls output using that array (pseudocode below):

#trying to ignore the dataset in the array - not working
ls -I${finished[@]} -d /datasets/*/
#trying to reverse grep for the finished datasets - not working
ls /datasets/*/ | grep -v {finished}

What do you think about my current ideas? Is this possible using bash only? I guess in Python I could do this easily, but for training purposes I want to do it in bash.


2 Answers


grep can read its patterns from a file using the -f option. Note that file names containing newlines will cause problems.

If you need to process the input somehow, you can use process substitution:

grep -f <(process the input...)
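Applied to the question's setup, this could look like the following sketch. The listing and the contents of the `finished` array are made up for illustration; `-x` (whole-line match) and `-F` (fixed strings, not regexes) are added so dataset names cannot partially match or be misread as patterns:

```shell
#!/usr/bin/env bash
# Stand-in for the output of: ls /datasets/*/
all_datasets=$(printf '%s\n' ds1 ds2 ds3 ds4)

# Stand-in for the array built with mapfile
finished=(ds2 ds4)

# -v inverts the match, -x requires whole-line matches, -F treats the
# patterns as fixed strings; -f <(...) feeds the array in as patterns
failed=$(grep -vxF -f <(printf '%s\n' "${finished[@]}") <<< "$all_datasets")

echo "$failed"   # prints: ds1 and ds3, one per line
```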

3 Comments

I know, but the file is tab-separated with several columns, therefore I am extracting the dataset-name column and saving it in an array.
Extending the answer, use -f with a process substitution: grep -f <(printf "%s\n" "${finished[@]}")
@glenn jackman thank you for the quick extension of the other comment. Seems to work :) If you want the points, you can add it as an extra answer; otherwise I will accept the answer of choroba.

I must admit I'm confused about what you're doing, but if you're just trying to produce a list of files excluding those stored in column 2 of some other file, and your file/directory names can't contain spaces, then that'd be:

find /datasets -type f | awk 'NR==FNR{a[$2]; next} !($0 in a)' "$path/$log_pf" -
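A commented sketch of the same awk idiom with made-up data (the file `log.tsv` and the here-string stand in for `"$path/$log_pf"` and find's output, respectively):

```shell
#!/usr/bin/env bash
# Hypothetical log file: column 2 holds the finished dataset names.
printf 'ok\tds1\nok\tds3\n' > log.tsv

# NR==FNR is true only while reading the FIRST file (log.tsv):
#   a[$2]      -> record column 2 as a key of array a
#   next       -> skip the second block for these lines
# For the SECOND input ("-", here a here-string instead of find output):
#   !($0 in a) -> print only lines that are NOT keys of a
result=$(awk 'NR==FNR{a[$2]; next} !($0 in a)' log.tsv - <<< $'ds1\nds2\nds3')

echo "$result"   # prints: ds2 (the only name not listed in log.tsv)
rm -f log.tsv
```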

If that's not all you need then please edit your question to clarify your requirements and add concise testable sample input and expected output.

4 Comments

Hello Ed, sorry for coming back so late... First of all, thank you for sharing your wiki; it convinced me to switch from parsing ls to using find :). However, since in the original question I asked how to grep an array, I accepted @choroba's answer.
Do you mind elaborating in pseudocode what this awk command does? I'm still a bash novice.
Sorry, it's been too long, so I don't remember what the question was about and don't want to re-learn it. Basically, though, it's saving some field of a file in an array, and then if a line of the find output is not in the array (i.e. was never the 2nd field of that file), it prints that line.
Fair enough, I figured it out in the meantime. If someone else needs to understand it, look at the answer of Walter A at the link below; he has written a brilliant breakdown of this one-liner. stackoverflow.com/questions/32481877/what-is-nr-fnr-in-awk
