3

I have a Bash variable whose value looks something like this:

10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0

The value contains no spaces and can be very long or very short. It consists of key:value pairs such as 65:3.0. I know the first part of a pair (the key), say 65, and I want to extract either the corresponding number 3.0 or the whole pair 65:3.0. I do not know the position (offset) of 65 in the string.

I would be grateful for a Bash script that can do this extraction. Thanks.

3
  • Can 65 appear more than once in the value? Could there also be keys that contain 65 (e.g. 165:4.0)? Commented Dec 20, 2014 at 8:26
  • 1
    How big is 'very long'? Megabytes, gigabytes, bigger? Is it all on one line? Is that format renegotiable? Are the keys always in sorted order (as they are in the example)? Commented Dec 20, 2014 at 8:50
  • It is around 10000 key-value pairs. Commented Dec 20, 2014 at 10:16

7 Answers

5

awk is probably the most straightforward approach:

$ awk -F: -v RS=',' '$1==65{print $2}' <<< "$var"
3.0

Or to get the pair:

$ awk -F: -v RS=',' '$1==65' <<< "$var"
65:3.0
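
As a rough sketch, the same one-liner can be wrapped in a function that takes the key as a parameter. The name lookup_awk is made up for illustration, and the key is assumed to be a plain number. Feeding the string through printf with a trailing comma, rather than through <<<, means the last pair is terminated like the others instead of picking up the newline that a here-string appends.

```shell
# Hypothetical helper: print the value stored under key $1 in the list $2.
lookup_awk() {
    # printf appends a comma rather than a newline, so every pair,
    # including the last one, ends with the record separator.
    printf '%s,' "$2" |
        awk -F: -v RS=',' -v key="$1" '$1 == key { print $2; exit }'
}

var=10:3.0,16:4.0,65:3.0,2312:1.0
lookup_awk 65 "$var"      # prints 3.0
lookup_awk 2312 "$var"    # prints 1.0
```

The exit stops awk at the first match; drop it if the key may occur several times and all values are wanted.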

11 Comments

Yup, that's how to do it. +1.
Beware that if 65:3.0 is the last field, you'll also get the trailing newline printed.
@gniourf_gniourf Yes RS is a regex (at least in GNU awk)
Actually you could force a trailing comma: <<< "$var,". :).
It's probably more efficient but IMHO it's just briefer, clearer, and more easily extensible: [abcd] vs a|b|c|d. It's what character lists exist to do.
4

Here's a pure Bash solution:

var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0

while read -r -d, i; do
    [[ $i = 65:* ]] || continue
    echo "$i"
done <<< "$var,"

You may use break after echo "$i" if there's only one 65:... in var, or if you only want the first one.

To get the value 3.0: echo "${i#*:}".
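
For reference, a small sketch of the two parameter expansions that split a pair into its halves. The variable names pair, key and value are only illustrative.

```shell
pair=65:3.0
key=${pair%%:*}     # strip the longest suffix matching ":*"  -> 65
value=${pair#*:}    # strip the shortest prefix matching "*:" -> 3.0
printf 'key=%s value=%s\n' "$key" "$value"    # prints: key=65 value=3.0
```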


Another (pure Bash) approach, without parsing the string explicitly. I'm assuming you're only looking for the first 65 in the string, and that it is present in the string:

var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0

value=,$var               # leading comma so that a pair at the very start also matches
value=${value#*,65:}
value=${value%%,*}
echo "$value"

This will be very slow for long strings!


Same as above, but will output all the values corresponding to 65 (or none if there are none):

var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0

tmpvar=,$var
while [[ $tmpvar = *,65:* ]]; do
    tmpvar=${tmpvar#*,65:}
    echo "${tmpvar%%,*}"
done

As above, this will be slow for long strings!


The fastest I can obtain in pure Bash is my original answer (and it's fine with 10000 fields):

var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0

IFS=, read -ra ary <<< "$var"
for i in "${ary[@]}"; do
    [[ $i = 65:* ]] || continue
    echo "$i"
done

In fact, no, the fastest I can obtain in pure Bash is with this regex:

var=10:3.0,16:4.0,32:4.0,39:2.0,65:3.0,95:4.0,110:4.0,111:4.0,2312:1.0

[[ ,$var, =~ ,65:([^,]+), ]] && echo "${BASH_REMATCH[1]}"
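
A minimal sketch of the same regex wrapped in a function, so the key is not hard-coded. The name lookup_re is hypothetical, and the key is assumed to contain only digits (no regex metacharacters). The function returns non-zero when the key is absent.

```shell
# Hypothetical helper: print the value stored under key $1 in the list $2.
lookup_re() {
    local key=$1 list=$2
    # Keeping the pattern in a variable avoids quoting surprises with =~.
    local re=",${key}:([^,]+),"
    [[ ,$list, =~ $re ]] && printf '%s\n' "${BASH_REMATCH[1]}"
}

var=10:3.0,16:4.0,65:3.0,165:4.0
lookup_re 65 "$var"                    # prints 3.0
lookup_re 99 "$var" || echo "no 99"    # prints: no 99
```

The surrounding commas are what keep 65 from matching inside 165.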

Timing this against awk:

  • where the 65 pair is at the end:

    printf -v var '%s:3.0,' {100..11000}
    var+=65:42.0
    time awk -F: -v RS=',' '$1==65{print $2}' <<< "$var"
    

    shows 0m0.020s (rough average) whereas:

    time { [[ ,$var, =~ ,65:([^,]+), ]] && echo "${BASH_REMATCH[1]}"; }
    

    shows 0m0.008s (rough average too).

  • where the 65:3.0 is not at the end:

    printf -v var '%s:3.0,' {1..10000}
    time awk -F: -v RS=',' '$1==65{print $2}' <<< "$var"
    

    shows 0m0.020s (rough average) and with early exit:

    time awk -F: -v RS=',' '$1==65{print $2;exit}' <<< "$var"
    

    shows 0m0.010s (rough average) whereas:

    time { [[ ,$var, =~ ,65:([^,]+), ]] && echo "${BASH_REMATCH[1]}"; }
    

    shows 0m0.002s (rough average).

17 Comments

This is probably the best solution for small strings, as it avoids spawning a new process like the awk/sed answers do. However, the OP mentions that the string could be very long, so storing it all in an array could lead to unnecessary memory usage.
@user000001 We don't have the OP's definition of very long :). Sure, for several megabytes of data this is not the best approach. But the string is already stored in a Bash variable, so I think it's going to be ok.
I think it is an excellent solution. The maximum length of string may be 10000 key-value pairs. I will try.
@user000001 this won't make a difference when the 65 is at the end of the string. If it's at the front, it's still slower.
OK, good to know that a regexp match against a shell variable in shell is faster than a regexp match against a shell variable in awk but a loop in shell is slower than a loop in awk and that the fastest shell solution ran in 2ms while the fastest awk solution ran in 10ms so both run in the blink of an eye. Thanks for testing.
3

With grep:

grep -o '\b65\b[^,]*' <<<"$var"
65:3.0

Or

grep -oP '\b65\b:\K[^,]*' <<<"$var"
3.0

\K discards everything matched before it from the reported match, so only the text after 65: is printed. It is part of the Perl-compatible regex syntax, which GNU grep enables with -P.
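
One caveat worth sketching: a word boundary also matches a 65 that sits inside a value such as 65.0. The variant below anchors the key to the start of the string or a preceding comma instead. It assumes GNU grep built with -P support, and the sample data and the names loose and anchored are made up for illustration.

```shell
var=10:65.0,165:4.0,65:3.0

# Word boundaries alone also pick up the 65 inside the value 65.0.
loose=$(printf '%s\n' "$var" | grep -o '\b65\b[^,]*')

# Anchoring on start-of-string or a comma matches only the key 65.
anchored=$(printf '%s\n' "$var" | grep -oP '(?:^|,)65:\K[^,]*')

printf '%s\n' "$loose"       # prints 65.0 and 65:3.0 on separate lines
printf '%s\n' "$anchored"    # prints 3.0
```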


3

Here is a GNU awk solution:

awk -vRS="(^|,)65:" -F, 'NR>1{print $1}' <<< "$var"
3.0

2 Comments

Try this awk -vRS='^65:|,65:' -F, '$1!~/:/{print $1}' <<< "$var"
No need for repetition of the number: RS='(^|,)65:'.
3

Try:

echo "$var" | tr , '\n' | awk '/65/'

where

  • tr , '\n' turns each comma into a newline
  • awk '/65/' picks every line that contains 65 (so it also matches pairs such as 165:4.0)

or

echo "$var" | tr , '\n' | awk -F: '$1 == 65 {print $2}'

where

  • -F: uses : as the field separator
  • $1 == 65 picks the line whose first field is 65
  • {print $2} prints the second field
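
To make the difference between the two filters concrete, here is a small sketch on made-up data in which 65 also appears inside another key and inside a value. The names loose and strict are only illustrative.

```shell
var=165:4.0,65:3.0,7:65.0

# Substring match: any pair containing "65" anywhere is selected.
loose=$(printf '%s\n' "$var" | tr , '\n' | awk '/65/')

# Field match: only the pair whose key (first field) equals 65.
strict=$(printf '%s\n' "$var" | tr , '\n' | awk -F: '$1 == 65 { print $2 }')

printf '%s\n' "$loose"     # prints all three pairs, one per line
printf '%s\n' "$strict"    # prints 3.0
```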

4 Comments

You can also avoid the tr call by setting the record separator with the RS variable.
@user000001 Is that a standard awk feature? I didn't know about it.
Yea, it works everywhere I have tried. Note that if you want a multi-character record separator you need GNU awk though.
The first solution would fail if a number like 165 appeared before 65 on the line. The second one is the same as posted by @user000001 in approach but less efficient.
2

Using sed

sed -e 's/^.*,\(65:[0-9.]*\),.*$/\1/' <<<",$var,"

output:

65:3.0

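There are two ways to handle 65:3.0 being the first or last pair on the line. Above, commas are added around the variable so that the pair is always delimited by commas. Below, the GNU extension \? is used to make each comma optional (zero or one occurrence).

sed -e 's/^.*,\?\(65:[0-9.]*\),\?.*$/\1/' <<<"$var"

Both handle 65:3.0 regardless of where it appears in the string. Note that because the comma is optional in the second form, it can also match the tail of a longer key ending in 65 (such as 165) when such a key appears later in the string; the first form does not have this problem.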
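
A hedged sketch of the comma-wrapping variant as a function. The name lookup_sed is hypothetical and the key is assumed to be plain digits. With -n and the p flag nothing is printed when the key is absent, and only the value is captured rather than the whole pair.

```shell
# Hypothetical helper: print the value stored under key $1 in the list $2.
lookup_sed() {
    # Wrap the list in commas so every pair is delimited on both sides.
    printf ',%s,\n' "$2" |
        sed -n "s/^.*,$1:\([^,]*\),.*\$/\1/p"
}

var=165:4.0,65:3.0,95:4.0
lookup_sed 65 "$var"     # prints 3.0
lookup_sed 165 "$var"    # prints 4.0
lookup_sed 5 "$var"      # prints nothing
```

Because the leading .* is greedy, a key that occurs more than once yields the value of its last occurrence.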


1

Try grep -E (egrep) as below:

echo "$myvar" | grep -Eo '\b65:[0-9]+\.[0-9]+'
