I have to parse web server log files that record how long each request took, and I need the median of those intervals. My plan is to collect all the intervals in an array, sort it, and return the middle element. As a first step I am trying to collect the intervals in an array, but awk seems to have a problem with the array: I get an error like "illegal reference to variable intvArray". Can somebody please tell me what is wrong with intvArray in the script?

The script is as follows:

#!/bin/bash

rm -rf 0.out 1.out 2.out collection.out parsed.out
scp [email protected]:/opt/tomcat/escr/log/rce_reactive_001.out ./0.out;
scp [email protected]:/opt/tomcat/escr/log/rce_reactive_002.out ./1.out;
scp [email protected]:/opt/tomcat/escr/log/rce_reactive_000.out ./2.out;
scp [email protected]:/opt/tomcat/escr/log/rce_reactive_003.out ./3.out;

cat ./0.out ./1.out 2.out 3.out >> ./collection.out;
grep interval ./collection.out >> ./parsed.out;

sum=0; count=1; intvArray=(0 0);

#awk 'BEGIN {if($12 + 0 == $12){ sum+=$12; count++}} END{  print sum;}' ./parsed.out
#awk 'BEGIN {sum=0; count=0;} {if($12 + 0 == $12){ sum += $12; count++;}} END{print "Count", count, "Average:", sum/count}' ./parsed.out
awk 'BEGIN {sum=0; count=1;intvArray=(0 0);} {if($12 + 0 == $12){ intvArray[count]=$12; count++;}} END{print "Count", count, "Array:", intvArray}' ./parsed.out

#for a in "${intvArray[@]}"; do echo "$a"; done
  • I think to get a median (average) it is better to add all your items up, and then divide by how many items you have. Your dataset might have a lot of high values and a few low, so the middle-most value will have a high value instead of what you actually want. Commented Apr 2, 2014 at 14:58
  • The same goes for when you have a lot of small values and a few high ones, then the middle value will be small. ;) Commented Apr 2, 2014 at 14:59
  • 1
    awk arrays are not shell arrays, your intvArray=(0 0) syntax is for a shell array not an awk array. Commented Apr 2, 2014 at 15:06
  • 2
    awk doesn't do array assignments like bash with arr=(0 0) as you have. You need to include keys for your elements, or let the split function make an array from real data for you, i.e. str="1;2;3"; split(str,arr,";"); for (i in arr) print i"=" arr[i]; Good luck. Commented Apr 2, 2014 at 15:08
  • 1
    wrt getting error like illegal reference to variable intvArray - why not post the actual error message instead of something "like" the error message so we stand the best chance of figuring out what the error message means and being able to help you? More importantly, if you post some sample input and expected output and tell us what you're trying to do we can help you write a script to do that. Commented Apr 2, 2014 at 17:26
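
The point made in the comments above, that awk has no parenthesized array literal, can be illustrated with a minimal sketch. The function name `awk_array_demo` and the array names `intv` and `parts` are hypothetical and chosen only for illustration; the `split` call mirrors the one suggested in the comment.

```shell
# Hypothetical demo: awk arrays are filled by key or by split(), never by arr=(0 0).
awk_array_demo() {
    awk 'BEGIN {
        intv[1] = 0; intv[2] = 0              # explicit keys instead of intv=(0 0)
        n = split("1;2;3", parts, ";")        # split() returns the element count
        for (i = 1; i <= n; i++) print i "=" parts[i]
    }'
}

awk_array_demo
```

Looping from 1 to `n` rather than with `for (i in parts)` keeps the output in index order, since awk does not guarantee any particular order for `for ... in`.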

2 Answers

2

You can do this entirely without temp files:

{
    ssh [email protected] cat /opt/tomcat/escr/log/rce_reactive_001.out
    ssh [email protected] cat /opt/tomcat/escr/log/rce_reactive_002.out
    ssh [email protected] cat /opt/tomcat/escr/log/rce_reactive_000.out
    ssh [email protected] cat /opt/tomcat/escr/log/rce_reactive_003.out
} |
awk '
    /interval/ && $12 == $12 + 0 {intvArray[count++] = $12} 
    END {
        print "Count", count, "Array:"
        for (idx=0; idx<count; idx++) print idx, intvArray[idx]
    }
'

Now, if you want that awk array in a bash array:

intvArray=( $(
    {   ssh [email protected] cat /opt/tomcat/escr/log/rce_reactive_001.out
        ssh [email protected] cat /opt/tomcat/escr/log/rce_reactive_002.out
        ssh [email protected] cat /opt/tomcat/escr/log/rce_reactive_000.out
        ssh [email protected] cat /opt/tomcat/escr/log/rce_reactive_003.out
    } | awk '/interval/ && $12 == $12 + 0 {print $12}'
) )
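
Since the original goal was a median, here is one possible sketch of that last step, assuming the intervals arrive one per line (for example from the `print $12` pipeline above). The function name `median` is hypothetical; it relies on `sort -n` to order the values and then picks the middle one, or the mean of the two middle ones when the count is even.

```shell
# Hypothetical helper: reads one number per line on stdin and prints the median.
median() {
    sort -n | awk '
        { v[NR] = $1 }                         # store the sorted values by position
        END {
            if (NR == 0) exit 1                # no input: print nothing, signal failure
            if (NR % 2) print v[(NR + 1) / 2]  # odd count: the single middle value
            else print (v[NR / 2] + v[NR / 2 + 1]) / 2   # even count: mean of the two middle values
        }
    '
}

# Usage with made-up sample values:
printf '%s\n' 40 10 30 20 50 | median
```

With the bash array built above, the same function could be fed with `printf '%s\n' "${intvArray[@]}" | median`.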

2

A few simplifications to your code - without having seen your inputs:

#!/bin/bash

rm -rf ?.out collection.out parsed.out

scp [email protected]:/opt/tomcat/escr/log/rce_reactive_001.out 0.out
scp [email protected]:/opt/tomcat/escr/log/rce_reactive_002.out 1.out
scp [email protected]:/opt/tomcat/escr/log/rce_reactive_000.out 2.out
scp [email protected]:/opt/tomcat/escr/log/rce_reactive_003.out 3.out

cat {0..3}.out | grep interval > parsed.out

awk 'BEGIN {sum=0; count=0;} {if($12 + 0 == $12){ sum += $12; count++;}} END{print "Count", count, "Average:", sum/count}' parsed.out

awk '{if($12 + 0 == $12)iv[++count]=$12} END{print "Count", count;for(i in iv) print "iv[",i,"] ",iv[i]}' parsed.out

Thanks to Ed Morton for the simplifications and improvements he suggests below. I have added them in the main body of my answer here so all can see them easily and nicely formatted:

awk '$12 + 0 == $12{sum+=$12;count++} END{print "Count",count,"Average:", sum/count}' parsed.out

and also

awk '$12 + 0 == $12{iv[++count]=$12} END{print "Count", count;for(i in iv) printf "iv[%d] %d\n",i,iv[i]}' parsed.out

1 Comment

The first awk script can be reduced to awk '$12 + 0 == $12{sum += $12; count++} END{print "Count", count, "Average:", sum/count}' parsed.out but you should really do something to handle count==0 in the END section. The second can be awk '$12 + 0 == $12{iv[++count]=$12} END{print "Count", count;for(i in iv) printf "iv[%d] %d\n",i,iv[i]}' parsed.out.
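
One way to handle the count==0 case mentioned in the comment is sketched below. The wrapper name `average_field12` and the "n/a" placeholder are assumptions made for illustration, not part of either answer; the filter on field 12 is the same one used above.

```shell
# Hypothetical wrapper: prints the count and mean of the numeric values in field 12.
# Reads the files given as arguments, or stdin when none are given.
average_field12() {
    awk '
        $12 + 0 == $12 { sum += $12; count++ }
        END {
            if (count > 0) print "Count", count, "Average:", sum / count
            else           print "Count", count + 0, "Average:", "n/a"   # avoid dividing by zero
        }
    ' "$@"
}

# Example call, assuming parsed.out exists:
#   average_field12 parsed.out
```

The `count + 0` in the fallback branch forces the unset variable to print as 0 instead of an empty string.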
