I have a string like the one below:

30750 [uber-SubtaskRunner] INFO  org.apache.hadoop.hive.ql.exec.Task  - Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1

Now I want to extract the numbers from it and add them up in a shell script. Basically, I want the sum of the number of mappers and the number of reducers. Splitting the string on the space character does not seem to be working for me; any regex pattern that does the job would be fine.

Thanks

  • Please edit your Q to show your expected output. Good luck. Commented Aug 11, 2016 at 13:13

1 Answer

You can do it with a Perl one-liner:

perl -ne '$s += $_ foreach /number of .*?: (\d+)/g; END { print "$s\n" }'

Demo: https://ideone.com/8ghKE5


An awk version (it needs GNU awk, because the three-argument form of match() is a gawk extension):

awk '{while(match($0,"number of [^:]+: ([[:digit:]]+)",a)){s+=a[1];$0=substr($0,RSTART+RLENGTH)}}END{print s}'

Demo: https://ideone.com/Hbccm9

Explanation:

  • The while() loop adds every number extracted by the regex in match() to the variable s.
    • In the loop condition:
      • The match() function searches the current input string ($0) for the pattern number of [^:]+: ([[:digit:]]+) and stores the capture groups (the subpatterns in parentheses, ([[:digit:]]+) in our case) in the array a.
      • The regex number of [^:]+: ([[:digit:]]+) matches a substring of the form "number of <something not containing ':'>: <sequence of digits>" and captures the <sequence of digits> (the number we are looking for) into capture group one.
    • In the loop body:
      • s+=a[1] adds to s the number captured in group one by the regex in match().
      • $0=substr($0,RSTART+RLENGTH) removes from the input string $0 everything up to and including the substring that matched the pattern, so that match() searches further along the string on the next iteration.
  • The finalization block (END{...}) just prints the sum collected in s.
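
For awk implementations without the three-argument match(), the same match, add, and advance loop can be sketched with the standard two-argument match() plus RSTART and RLENGTH. This is only an illustration under that assumption: the function name sum_counts is made up for this example, and the digits are recovered by stripping everything up to the last ": " from the matched text.

```shell
#!/bin/sh
# Hypothetical helper (name chosen for this sketch): prints the sum of all
# "number of <label>: <digits>" values found in the string passed as $1.
sum_counts() {
    printf '%s\n' "$1" | awk '
    {
        # Two-argument match() sets RSTART and RLENGTH on success.
        while (match($0, /number of [^:]+: [0-9]+/)) {
            m = substr($0, RSTART, RLENGTH)    # the whole matched text
            sub(/^.*: /, "", m)                # keep only the trailing digits
            s += m                             # add the number to the sum
            $0 = substr($0, RSTART + RLENGTH)  # continue after this match
        }
    }
    END { print s + 0 }                        # prints 0 when nothing matched
    '
}

line='30750 [uber-SubtaskRunner] INFO  org.apache.hadoop.hive.ql.exec.Task  - Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1'
sum_counts "$line"
```

For the log line from the question this prints 2.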
10 Comments

Thanks Dmitry, can you explain the awk command that you shared?
@MohitRane, You're welcome! I've added an explanation to the answer.
Can you please help with a regex pattern to get the date and time out of the string below? '30807 [uber-SubtaskRunner] INFO org.apache.hadoop.hive.ql.exec.Task - 2016-08-11 06:41:08,318 Stage-2 map = 0%, reduce = 0%' The output should contain: 2016-08-11 06:41:08. I just want to get the time when the mapper and reducer are launched. Thanks
It's in Perl; I need it to be in a shell script.
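
For the date-and-time request above, one possible shell sketch is shown here. It assumes a grep that supports the -o option (GNU and BSD grep do; it is not required by POSIX) and that the timestamp always has the form YYYY-MM-DD HH:MM:SS. The function name extract_timestamp is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper (name chosen for this sketch): prints the first
# "YYYY-MM-DD HH:MM:SS" timestamp found in the string passed as $1.
extract_timestamp() {
    printf '%s\n' "$1" |
        # -o prints only the matched text; -E enables {n} repetition counts
        grep -oE '[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}' |
        head -n 1
}

line='30807 [uber-SubtaskRunner] INFO org.apache.hadoop.hive.ql.exec.Task - 2016-08-11 06:41:08,318 Stage-2 map = 0%, reduce = 0%'
extract_timestamp "$line"
```

For the line quoted in the comment this prints 2016-08-11 06:41:08; the milliseconds after the comma are left out because the pattern stops at the seconds.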