
I am trying to do the following in sh.

Here's my file:

foo
bar
Tests run: 729, Failures: 0, Errors: 253, Skipped: 0
baz

How can I pull the 4 numbers into 4 different variables? I've spent about an hour now on sed and awk man pages and I'm spinning my wheels.

3
  • What's the precise format of the file? Is there a single significant row in the file? Are the numbers always in-order? Commented Jul 29, 2015 at 19:44
  • The file contents will vary, but the row with Tests will always be there; that's the significant row. The numbers in that row will always be there, in that format. Commented Jul 29, 2015 at 19:48
  • It would be much more efficient to use a more capable shell with built-in regex support rather than /bin/sh and needing to use external tools for the extraction. I mean, yes, this can be done in pure POSIX sh, but you're going to be taking a performance hit for the startup time for awk/sed/whatnot. Commented Jul 29, 2015 at 19:57

4 Answers

2

Adapting my prior answer to use the heredoc approach suggested by @chepner:

read run failures errors skipped <<EOF
$(grep -E '^Tests run: ' <file.in | tr -d -C '[:digit:][:space:]')
EOF

echo "Tests run: $run"
echo "Failures: $failures"
echo "Errors: $errors"
echo "Skipped: $skipped"

Alternately (put this into a shell function to avoid overriding "$@" for the duration of the script):

unset IFS # ensure default word-splitting behavior
set -- $(grep -E '^Tests run: ' <file.in | tr -d -C '[:digit:][:space:]')
run=$1; failures=$2; errors=$3; skipped=$4

Note that this is only safe because no glob characters can be present in the output of tr when run in this way; set -- $(something) is usually a practice better avoided.
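The function wrapper mentioned above can be sketched as follows (a minimal, self-contained example: `parse_test_counts` is a hypothetical name, and the sample file is recreated from the question):

```shell
#!/bin/sh
# Recreate the sample input from the question.
cat >file.in <<'EOF'
foo
bar
Tests run: 729, Failures: 0, Errors: 253, Skipped: 0
baz
EOF

# Positional parameters are local to a function in POSIX sh,
# so "set --" inside it does not clobber the caller's "$@".
parse_test_counts() {
  unset IFS  # ensure default word splitting
  set -- $(grep -E '^Tests run: ' <"$1" | tr -d -C '[:digit:][:space:]')
  run=$1; failures=$2; errors=$3; skipped=$4
}

parse_test_counts file.in
echo "run=$run failures=$failures errors=$errors skipped=$skipped"
```

Running it prints run=729 failures=0 errors=253 skipped=0, and since the variable assignments inside the function are not declared local, they remain visible after the call.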


Now, if you were writing for bash rather than POSIX sh, you could perform regex matching internal to the shell (assuming in the below that your input file is relatively short):

#!/bin/bash
re='Tests run: ([[:digit:]]+), Failures: ([[:digit:]]+), Errors: ([[:digit:]]+), Skipped: ([[:digit:]]+)'
while IFS= read -r line; do
  if [[ $line =~ $re ]]; then
    run=${BASH_REMATCH[1]}
    failures=${BASH_REMATCH[2]}
    errors=${BASH_REMATCH[3]}
    skipped=${BASH_REMATCH[4]}
  fi
done <file.in

If your input file is not short, it may be more efficient to have it pre-filtered by grep, thus changing the last line to:

done < <(grep -E '^Tests run: ' <file.in)


1

Given the format of the input file, you can capture the output of grep in a here document, then split it with read into four parts to be post-processed.

IFS=, read part1 part2 part3 part4 <<EOF
$(grep '^Tests run' input.txt)
EOF

Then just strip the unwanted prefix from each part.

run=${part1#*: }
failures=${part2#*: }
errors=${part3#*: }
skipped=${part4#*: }
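Putting the two steps together (a self-contained rerun of the code above, with the intermediate values traced in comments; input.txt is recreated from the question):

```shell
#!/bin/sh
cat >input.txt <<'EOF'
foo
bar
Tests run: 729, Failures: 0, Errors: 253, Skipped: 0
baz
EOF

# Split the significant line on commas only (space stays out of IFS,
# so the leading space of each later part is preserved).
IFS=, read part1 part2 part3 part4 <<EOF
$(grep '^Tests run' input.txt)
EOF

run=${part1#*: }       # "Tests run: 729" -> "729"
failures=${part2#*: }  # " Failures: 0"   -> "0"
errors=${part3#*: }    # " Errors: 253"   -> "253"
skipped=${part4#*: }   # " Skipped: 0"    -> "0"

echo "$run $failures $errors $skipped"  # prints: 729 0 253 0
```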

1 Comment

Nice. Very nice. I need to start reaching for that technique more often.
0

Assuming there is only one line starting with Tests run: in your file, and that the file is named foo.txt, the following command will create 4 shell variables that you can work with:

eval $(awk 'BEGIN{ FS="(: |,)" }; /^Tests run/{ print "TOTAL=" $2 "\nFAIL=" $4 "\nERROR=" $6 "\nSKIP=" $8 }' foo.txt); echo $TOTAL; echo $SKIP; echo $ERROR; echo $FAIL

echo $TOTAL; echo $SKIP; echo $ERROR; echo $FAIL is just to demonstrate that the variables exist and can be used.

The awk script in a more readable manner is:

BEGIN { FS = "(: |,)" }

/^Tests run/ {
    print "TOTAL=" $2 "\nFAIL=" $4 "\nERROR=" $6 "\nSKIP=" $8
}

FS = "(: |,)" tells awk to treat ": " (a colon followed by a space) or "," as field separators.
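To see exactly how that separator carves up the significant line (a quick standalone illustration, not part of the answer's script):

```shell
# Print every field produced by FS="(: |,)" with its index.
printf '%s\n' 'Tests run: 729, Failures: 0, Errors: 253, Skipped: 0' |
  awk 'BEGIN{ FS="(: |,)" } { for (i = 1; i <= NF; i++) print i ": [" $i "]" }'
```

This yields eight fields; the even-numbered ones ([729], [0], [253], [0]) hold the counts, while the odd-numbered ones keep the labels (with their leading spaces), which is why the script picks $2, $4, $6, and $8.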

The eval command then executes the output of the awk script as shell commands, which creates the 4 shell variables.


NOTE: due to the use of eval, you must trust the content of the foo.txt file as one could forge a line starting with Tests run: which could have commands thereafter.

You could improve that a bit by using a more restrictive regex in the awk script (note that POSIX awk does not support \d, so a bracket expression is needed): /^Tests run: [0-9]+, Failures: [0-9]+, Errors: [0-9]+, Skipped: [0-9]+$/

The full command would then be:

eval $(awk 'BEGIN{ FS="(: |,)" }; /^Tests run: [0-9]+, Failures: [0-9]+, Errors: [0-9]+, Skipped: [0-9]+$/{ print "TOTAL=" $2 "\nFAIL=" $4 "\nERROR=" $6 "\nSKIP=" $8 }' foo.txt); echo $TOTAL; echo $SKIP; echo $ERROR; echo $FAIL

3 Comments

If you're going to use eval here, I'd suggest (strongly!) filtering for only numeric values, so an attacker can't insert code into your repository that puts Tests run: $(rm -rf /) -- or something that downloads and runs shellcode -- into your test suite's output. Being able to do a privilege escalation from checking hostile code into a git repo to running code inside live infrastructure is not a Good Thing.
You are right, eval can be evil; I will update my answer with a warning.
(on a different point, all-caps variable names are bad practice; see fourth paragraph of pubs.opengroup.org/onlinepubs/009695399/basedefs/…, keeping in mind that shell variables and environment variables share a namespace).
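The filtering suggested in the first comment above can be taken one step further: skip eval entirely and validate each extracted value with a case guard before using it (a minimal sketch; lowercase variable names per the last comment, with foo.txt recreated from the question):

```shell
#!/bin/sh
cat >foo.txt <<'EOF'
foo
Tests run: 729, Failures: 0, Errors: 253, Skipped: 0
baz
EOF

# Extract the four counts with parameter expansion -- no eval involved.
line=$(grep '^Tests run' foo.txt)
total=${line#Tests run: }; total=${total%%,*}
fail=${line#*Failures: };  fail=${fail%%,*}
error=${line#*Errors: };   error=${error%%,*}
skip=${line#*Skipped: }

# Reject anything that is not purely numeric, so a forged line
# cannot smuggle shell syntax into the variables.
for v in "$total" "$fail" "$error" "$skip"; do
  case $v in
    ''|*[!0-9]*) echo "non-numeric field: '$v'" >&2; exit 1 ;;
  esac
done

echo "total=$total fail=$fail error=$error skip=$skip"
```

Because nothing from the file is ever executed, a hostile Tests run: line can at worst fail the numeric check and abort.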
-1

There are shorter versions, but this one "shows" each step.

#!/bin/bash
declare -a arr=`grep 'Tests ' a | awk -F',' '{print $1 "\n" $2 "\n" $3 "\n" $4}' | sed 's/ //g' | awk -F':' '{print $2}'`
echo $arr
for var in $arr
do
    echo $var
done

3 Comments

declare -a is not available in POSIX sh (as arrays aren't supported there in general).
...also, declare -a arr=$(...) only assigns any value to the first element of arr; it would need to be declare -a arr=( $(...) ) to assign to multiple elements (using string-splitting and glob expansion to get to those elements from the single string received from the expansion -- rarely a desirable practice); alternately, with bash 4.x, readarray and mapfile are available to populate an array directly.
Also! I need to do this in sh, unfortunately
