How do I search sequentially for multiple strings in same file?

Question

Problem

Suppose you have a recipe text file called recipes.yml

Margherita:
  cheese
  tomato

Chicken Supreme:
  cheese
  onions
  chicken
  mushrooms

Veggie:
  cheese
  spinach
  sweetcorn
  peppers
  mushrooms
  onions

Potato:
  cheese
  potato
  oregano

Now I would like to find any pizza that contains cheese, onion or rucola. I put my search terms into another file:

$ cat terms.txt
cheese
onion
rucola

Desired output

$ while read -r line; do echo "searching pizza containing: $line" && SEARCH $line IN recipes.yml; done <terms.txt
searching pizza containing: cheese
found 4
  Margherita
  Chicken Supreme
  Veggie
  Potato
searching pizza containing: onion
found 2
  Chicken Supreme
  Veggie
searching pizza containing: rucola
found 0

Maybe this is too much to do in bash, but I would really like to know if it is possible at all. I am stuck right now: I can't seem to find a way to capture the name of the pizza once the ingredient is found. Here are some half-way attempts using grep, awk and sed:

Attempts

I have only been able to find commands to let me find the number of occurrences of each search term and on what line the match is located in the file. Like this:

$ while read -r "line"; do echo "searching pizza containing: $line" && grep -c "$line" recipes.yml && grep -n "$line" recipes.yml; done <terms.txt
searching pizza containing: cheese
4
2:  cheese
6:  cheese
12:  cheese
20:  cheese
searching pizza containing: onion
2
7:  onions
17:  onions
searching pizza containing: rucola
0

and with awk and sed

$ while read -r "line"; do echo "searching pizza containing: $line" && awk -v avar="$line" '$0 ~ avar {count++} END {print count}' recipes.yml && sed -n "/$line/p" recipes.yml; done <terms.txt
searching pizza containing: cheese
4
  cheese
  cheese
  cheese
  cheese
searching pizza containing: onion
2
  onions
  onions
searching pizza containing: rucola

You should include substrings like grape as a search term with grapefruit in the recipe to make sure there aren't false matches when testing.
Ed Morton, commented Nov 19, 2020 at 23:48
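
To illustrate the comment above, here is a minimal sketch of the false-match problem. It uses a hypothetical recipe containing grapefruit (the file contents and variable names are made up for the demonstration) and compares a substring match with grep against an exact comparison of the ingredient field in awk.

```shell
# Hypothetical test data: "grapefruit" should not count as a hit for "grape".
tmp=$(mktemp -d)
printf 'Fruity:\n  cheese\n  grapefruit\n' > "$tmp/recipes.yml"

# Substring match: "grape" matches inside "grapefruit" (a false hit).
substring_hits=$(grep -c 'grape' "$tmp/recipes.yml" || true)

# Exact match on the first whitespace-separated field: no hit for "grape".
exact_hits=$(awk -v term='grape' '$1 == term { n++ } END { print n + 0 }' "$tmp/recipes.yml")

echo "substring: $substring_hits, exact: $exact_hits"
rm -rf "$tmp"
```

The `|| true` is only there because grep exits non-zero when the count is 0, which matters if the snippet runs under `set -e`.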

David C. Rankin · Accepted Answer · 2020-11-20 02:00:55Z

First, an exact comparison would never produce the output shown with "onion" in your terms.txt and "onions" in recipes.yml. (It took more than a minute to sort that typo out.)

Rule 1, always defer to @EdMorton for the most efficient and validated scripts. That said, a more procedural approach may help what is happening sink in a bit. After an initial unguarded action that rebuilds each record to strip surrounding whitespace, the awk script below has four rules. The first, guarded by NR == FNR && NF, simply ensures that the rule is applied to the first file only, and only to non-blank lines. The second, guarded by $0 ~ /:$/, ensures the current record ends in ':'. The third rule applies to all other non-blank lines in the second file. Finally, the END rule just prints the results.

awk '
    { $1 = $1 }                         # recalculate records to remove whitespace
    NR == FNR && NF {                   # first file and non-blank line
        a[++n] = $0                     # add term to indexed a[]
        next                            # skip to next record
    }
    $0 ~ /:$/ {                         # second file and line ends in ':'
        pizza = $0                      # set pizza name
        next                            # skip to next record
    }
    NF {                                # second file and non-blank line
        for (i=1; i<=n; i++) {          # loop over a[] array check against terms
            if ($0 == a[i]) {           # if line matches term
                found[$0]++             # increment the found count 
                c[$0] = c[$0]pizza"\n"  # concatenate pizza to c[] capture array
            }
        }
    }
    END {                               # end rule
        for (i=1; i<=n; i++) {          # loop over terms, output count and pizzas 
            printf "searching pizza containing: %s\nfound %d\n", a[i], found[a[i]]
            printf "%s", c[a[i]]
        }
    }
' terms.txt recipes.yml

Example Use/Output

With your data in terms.txt (with "onion" changed to "onions") and recipes.yml, you can simply select, copy and middle-mouse paste the script into an xterm with the files in the current directory to test, e.g.

$ awk '
>     { $1 = $1 }                         # recalculate records to remove whitespace
>     NR == FNR && NF {                   # first file and non-blank line
>         a[++n] = $0                     # add term to indexed a[]
>         next                            # skip to next record
>     }
>     $0 ~ /:$/ {                         # second file and line ends in ':'
>         pizza = $0                      # set pizza name
>         next                            # skip to next record
>     }
>     NF {                                # second file and non-blank line
>         for (i=1; i<=n; i++) {          # loop over a[] array check against terms
>             if ($0 == a[i]) {           # if line matches term
>                 found[$0]++             # increment the found count
>                 c[$0] = c[$0]pizza"\n"  # concatenate pizza to c[] capture array
>             }
>         }
>     }
>     END {                               # end rule
>         for (i=1; i<=n; i++) {          # loop over terms, output count and pizzas
>             printf "searching pizza containing: %s\nfound %d\n", a[i], found[a[i]]
>             printf "%s", c[a[i]]
>         }
>     }
> ' terms.txt recipes.yml
searching pizza containing: cheese
found 4
Margherita:
Chicken Supreme:
Veggie:
Potato:
searching pizza containing: onions
found 2
Chicken Supreme:
Veggie:
searching pizza containing: rucola
found 0
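
If the names should appear as in the question's desired output (indented, without the trailing colon), one possible variant is sketched below. It assumes the same file layout, writes a trimmed copy of the data to a temporary directory so it can be run anywhere, and keeps the exact-match comparison, so terms.txt holds onions rather than onion.

```shell
# Trimmed sample data in a temporary directory (assumption: same layout as above).
tmp=$(mktemp -d)
printf 'cheese\nonions\nrucola\n' > "$tmp/terms.txt"
cat > "$tmp/recipes.yml" <<'EOF'
Margherita:
  cheese
  tomato

Chicken Supreme:
  cheese
  onions
  chicken
  mushrooms
EOF

result=$(awk '
    { $1 = $1 }                              # rebuild record, trimming whitespace
    NR == FNR && NF { a[++n] = $0; next }    # first file: store each term
    /:$/ {                                   # second file: pizza name line
        pizza = $0
        sub(/:$/, "", pizza)                 # drop the trailing colon
        next
    }
    NF {                                     # second file: ingredient line
        for (i = 1; i <= n; i++)
            if ($0 == a[i]) {
                found[$0]++
                c[$0] = c[$0] "  " pizza "\n"   # indent name by two spaces
            }
    }
    END {
        for (i = 1; i <= n; i++) {
            printf "searching pizza containing: %s\nfound %d\n", a[i], found[a[i]]
            printf "%s", c[a[i]]
        }
    }
' "$tmp/terms.txt" "$tmp/recipes.yml")

printf '%s\n' "$result"
rm -rf "$tmp"
```

The only changes from the script above are the sub() call that strips the colon and the two-space prefix added when the name is appended to c[].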

Let me know if you have further questions, and compare the efficiencies @EdMorton incorporated.

Sorry for the typo with onion vs onions. Wow! I did not know it was possible to do all of this in awk alone. There are many things in this script that are new to me, it will take some time for me to fully understand it. Can you recommend any books on awk for beginners?
Believe it or not, the GNU Awk User's Guide is a very good intro and complete reference for awk. Even man 1 awk is a convenient reference for all the special variables, etc. Be aware there is GNU awk (gawk, installed on most systems) and POSIX awk, which provides a few fewer features but is substantially the same. There is also mawk, which is now roughly equivalent to gawk. The differences are noted in the GNU Awk User's Guide.

Ed Morton · Accepted Answer · 2020-11-19 23:46:57Z

$ cat tst.awk
NR==FNR {
    count[$1] = 0
    next
}
/^[^[:space:]]/ {
    sub(/:.*/,"")
    type = $0
    next
}
$1 in count || ( sub(/s$/,"",$1) && ($1 in count) ) {
    types[$1] = (count[$1]++ ? types[$1] ORS : "") "  " type
}
END {
    for (term in count) {
        print "searching pizza containing:", term
        print "found", count[term]
        if ( count[term] != 0 ) {
            print types[term]
        }
    }
}

$ awk -f tst.awk terms.txt recipes.yml
searching pizza containing: rucola
found 0
searching pizza containing: cheese
found 4
  Margherita
  Chicken Supreme
  Veggie
  Potato
searching pizza containing: onion
found 2
  Chicken Supreme
  Veggie
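
Note that for (term in count) visits the keys in an unspecified order, which is why rucola is printed first above. A minimal sketch of one way to keep the order of terms.txt follows. It is an adaptation, not the original script: the order[] array and the counter n are names introduced here, the header test uses [^ \t] in place of [^[:space:]] for older awks that lack POSIX character classes, and the data is a trimmed copy written to a temporary directory.

```shell
# Trimmed sample data in a temporary directory (assumption: same layout as above).
tmp=$(mktemp -d)
printf 'cheese\nonion\nrucola\n' > "$tmp/terms.txt"
cat > "$tmp/recipes.yml" <<'EOF'
Margherita:
  cheese
  tomato

Chicken Supreme:
  cheese
  onions
  chicken
  mushrooms
EOF

ordered=$(awk '
NR==FNR {
    if (!($1 in count)) order[++n] = $1   # remember first-seen order of terms
    count[$1] = 0
    next
}
/^[^ \t]/ {                               # unindented line: pizza name
    sub(/:.*/,"")
    type = $0
    next
}
$1 in count || ( sub(/s$/,"",$1) && ($1 in count) ) {
    types[$1] = (count[$1]++ ? types[$1] ORS : "") "  " type
}
END {
    for (i = 1; i <= n; i++) {            # iterate in terms.txt order
        term = order[i]
        print "searching pizza containing:", term
        print "found", count[term]
        if ( count[term] != 0 ) {
            print types[term]
        }
    }
}
' "$tmp/terms.txt" "$tmp/recipes.yml")

printf '%s\n' "$ordered"
rm -rf "$tmp"
```

As in the original, the sub(/s$/,"",$1) fallback is what lets the term onion match the ingredient onions.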

3 Comments

David C. Rankin Over a year ago

How long did it take to snap to the "onion" "onions" mismatch.... :) (no doubt shorter than I)

Ed Morton Over a year ago

@DavidC.Rankin First time I tested it and got almost no output :-).

David C. Rankin Over a year ago

(I tested about 20 times, going over "that should work" like I was chasing my tail -- then I just added a print statement before the if ($0 == a[i]) -- and the "smacks self" moment occurred :)
