1

I have spent hours on Awk tutorials but I cannot get past this one: I want to use a variable as a regex in an awk query. Here is an example of what I want to achieve:

#!/bin/bash
#My test array
testarray=(teststring[1078] teststringthatshouldnotmatch teststring[5845])

#myregex as a variable
regex="teststring\[.*"

#the awk
for value in ${testarray[*]}
do
echo ${value} | awk '{if ($1 ~ regex) print}'
done

I would expect Awk to match test strings 1 and 3, but it matches all of them. Thanks for any light on this one.

3 Answers

2

When using a string in a regexp context, you need to escape twice anything you want escaped. Always quote your shell variables. There's no need to call match(); put the condition in the condition section of the awk script rather than inside an if in the action part, and there's no need for an explicit print. Also, .* means zero or more repetitions of any character, so it can match zero characters and adds nothing useful to your regexp. All you need is:

regex='teststring\\['
...
awk -v test="$regex" '$1~test'

Look:

$ cat tst.sh
#!/bin/bash
#My test array
testarray=(teststring[1078] teststringthatshouldnotmatch teststring[5845])

#myregex as a variable
regex='teststring\\['

#the awk
for value in "${testarray[@]}"
do
    echo "$value" | awk -v test="$regex" '$1 ~ test'
done
$
$ ./tst.sh
teststring[1078]
teststring[5845]
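
To illustrate why the backslash has to be doubled, here is a minimal sketch that prints the value awk actually stores after processing a -v assignment. The function names show_awk_value and match_literal_bracket are made up for this illustration, and the bracket-expression form is an alternative that is not part of the answer above:

```shell
#!/bin/bash
# Print the value awk stores after it has processed the -v assignment.
# awk interprets escape sequences in -v values, so "\\" collapses to "\".
show_awk_value() {
    awk -v test="$1" 'BEGIN { print test }'
}

show_awk_value 'teststring\\['   # awk ends up with: teststring\[

# Assumed alternative (not from the answer): a bracket expression matches
# a literal "[" without any backslashes, so there is nothing to double.
match_literal_bracket() {
    printf '%s\n' "$@" | awk -v test='teststring[[]' '$1 ~ test'
}

match_literal_bracket 'teststring[1078]' 'teststringthatshouldnotmatch'
```

The first call shows that one level of backslashes is consumed before the string is ever used as a regexp, which is the double escaping the answer refers to.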

1

The answer to the seemingly strange behavior of awk is quite simple.

Shell variables are not awk variables.

While the shell variable regex holds the string you assigned to it, the awk variable regex is still the empty string, which matches any string.
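
The effect can be reproduced in isolation. This is only a sketch; the function name match_without_passing is hypothetical:

```shell
#!/bin/bash
# The shell variable is set, but it is never handed over to awk.
regex='teststring\['

match_without_passing() {
    # Inside the awk program, "regex" is an uninitialized awk variable,
    # i.e. the empty string, and the empty regexp matches every input.
    printf '%s\n' "$1" | awk '{ if ($1 ~ regex) print "matched: " $1 }'
}

match_without_passing 'teststringthatshouldnotmatch'
match_without_passing 'anything-at-all'
```

Both calls report a match, even though neither input contains a bracket.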

Shell variables are accessible via the ENVIRON array in awk.

Keep in mind that, as with any process started from the shell, only exported shell variables are copied into the environment of the child process, so export any variable you want to access via ENVIRON.

To make your script work change $1 ~ regex to $1 ~ ENVIRON["regex"].
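
A minimal sketch of this approach, assuming the test data from the question (the array elements are quoted here so the shell does not treat the brackets as glob patterns, and the function name filter_with_environ is made up for the example):

```shell
#!/bin/bash
testarray=('teststring[1078]' 'teststringthatshouldnotmatch' 'teststring[5845]')

# ENVIRON values do not go through awk's escape-sequence processing,
# so a single backslash is enough to match a literal "[".
export regex='teststring\['

filter_with_environ() {
    # Print one argument per line and keep those whose first field matches.
    printf '%s\n' "$@" | awk '$1 ~ ENVIRON["regex"]'
}

filter_with_environ "${testarray[@]}"
```

Without the export, ENVIRON["regex"] would be empty and every line would match again.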

You may also assign the shell variable regex to the awk variable regex on the command line using the -v switch. In this case you will have to escape shell metacharacters, so maybe the above-mentioned solution is the more elegant one.

4 Comments

Regarding "In this case you will have to escape shell metacharacters, so probably the above mentioned solution is the more elegant one": no, that's not true. Just quote your shell variables, as you always should, and you don't need to escape the globbing chars etc. If you're going to talk about ENVIRON, you should mention that the only shell variables it has access to are those exported previously or set on the command line.
Thanks for the hint, I added a reminder that only exported variables will be copied to child processes. Beauty lies in the eye of the observer; I tried to weaken the claim by replacing "probably" with "maybe".
Using read regex instead of a fixed assignment will work without having to remember how many times to quote things, which is the reason I'd still prefer the first version. But of course this is just a matter of taste.

0

I found a way in the end: Awk should be written like this to allow for a variable to be used (need to re-declare the variable with -v)

awk -v test="$regex" '{if (match($1, test)) {print}}'

Maybe there is a better way but this one does the trick :)

EDIT AFTER SEEING THE ANSWERS: Thanks, I will update my code.
