
Using a shell script (Bash), I am trying to sum the numeric columns for each distinct name in a list. Suppose I have the following tab-separated input file, Test.tsv:

           Win  Lost
Anna        1   1 
Charlotte   3   1
Lauren      5   5
Lauren      6   3
Charlotte   3   2
Charlotte   4   5
Charlotte   2   5
Anna        6   4
Charlotte   2   3
Lauren      3   6
Anna        1   2
Anna        6   2
Lauren      2   1
Lauren      5   5
Lauren      6   6
Charlotte   1   3
Anna        1   4

I want to sum up how much each participant has won and lost, so the result should look like this:

          Sum Win    Sum Lost
Anna        15         13
Charlotte   15         19
Lauren      27         26

What I usually do is compute the sum for one person and one column at a time, and repeat that for every combination. For the example above, it looks like this:

cat Test.tsv | grep -Pi '\bAnna\b' | cut -f2 -d$'\t' |paste -sd+ | bc > Output.tsv
cat Test.tsv | grep -Pi '\bCharlotte\b' | cut -f2 -d$'\t' |paste -sd+ | bc >> Output.tsv
cat Test.tsv | grep -Pi '\bLauren\b' | cut -f2 -d$'\t' |paste -sd+ | bc >> Output.tsv
cat Test.tsv | grep -Pi '\bAnna\b' | cut -f3 -d$'\t' |paste -sd+ | bc >> Output.tsv
cat Test.tsv | grep -Pi '\bCharlotte\b' | cut -f3 -d$'\t' |paste -sd+ | bc >> Output.tsv
cat Test.tsv | grep -Pi '\bLauren\b' | cut -f3 -d$'\t' |paste -sd+ | bc >> Output.tsv

However, I have to repeat these lines for every participant, which becomes a pain when there are too many participants to sum up.
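One way to cut the repetition while keeping the same grep | cut | paste idea is to loop over the distinct names in column 1. This is a minimal sketch, assuming Test.tsv is tab-separated with a one-line header; the function name `sum_per_name` is hypothetical, and `bc` is swapped for bash arithmetic expansion so no external calculator is needed.

```shell
# Hypothetical helper: the per-name pipeline from the question, driven by a loop.
# Assumes a tab-separated file whose first line is a header (requires bash).
sum_per_name() {
  local file=$1 name win lost
  printf '\tSum Win\tSum Lost\n'
  # Distinct names from column 1, skipping the header line.
  tail -n +2 "$file" | cut -f1 | sort -u | while IFS= read -r name; do
    # Only rows that start with this exact name followed by a tab.
    win=$(grep "^${name}"$'\t' "$file" | cut -f2 | paste -sd+ -)
    lost=$(grep "^${name}"$'\t' "$file" | cut -f3 | paste -sd+ -)
    # paste yields e.g. "1+6+1"; bash arithmetic evaluates it instead of bc.
    printf '%s\t%s\t%s\n' "$name" "$(( win ))" "$(( lost ))"
  done
}
```

Usage: `sum_per_name Test.tsv > Output.tsv`. It still rereads the file twice per name, so a single-pass awk program scales better.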

What would be a better way to write this script?

Thanks!

1 Answer


This is pretty straightforward with awk. Using GNU awk:

 awk -F '\t' 'BEGIN { OFS = FS } NR > 1 { won[$1] += $2; lost[$1] += $3 } END { PROCINFO["sorted_in"] = "@ind_str_asc"; print "", "Sum Win", "Sum Lost"; for(p in won) print p, won[p], lost[p] }' filename

-F '\t' makes awk split lines at tabs, then:

BEGIN { OFS = FS }  # the output should be separated the same way as the input

NR > 1 {            # From the second line forward (skip header)
  won[$1] += $2     # tally up totals
  lost[$1] += $3
}

END {               # When done, print the lot.

  # GNU-specific: sorted traversal of player names
  PROCINFO["sorted_in"] = "@ind_str_asc"

  print "", "Sum Win", "Sum Lost"
  for(p in won) print p, won[p], lost[p]
}
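The program above hard-codes the Win and Lost columns. If the file gains more numeric columns, the same tally can be generalized to every column after the first. This is a sketch under stated assumptions: `sum_all_columns` is a hypothetical name, the header row is assumed to be tab-separated with an empty first field, and it uses only POSIX awk, printing names in first-seen order rather than sorted.

```shell
# Hypothetical generalization: sum every column after the first, per name.
# Assumes a tab-separated header row with an empty first field.
# Plain POSIX awk; rows come out in the order each name first appears.
sum_all_columns() {
  awk -F '\t' '
    BEGIN { OFS = FS }
    NR == 1 {                              # header: remember labels and width
      ncol = NF
      for (i = 2; i <= ncol; i++) label[i] = "Sum " $i
      next
    }
    {
      if (!($1 in seen)) { seen[$1] = 1; order[++n] = $1 }
      for (i = 2; i <= ncol; i++) total[$1, i] += $i   # tally per name and column
    }
    END {
      line = ""                            # header row starts with an empty cell
      for (i = 2; i <= ncol; i++) line = line OFS label[i]
      print line
      for (k = 1; k <= n; k++) {
        line = order[k]
        for (i = 2; i <= ncol; i++) line = line OFS total[order[k], i]
        print line
      }
    }' "$@"
}
```

Pipe the data rows through `sort` if you need alphabetical order without GNU awk.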

2 Comments

Would it not be easier to just use asort?
You could use asorti, but since that's also GNU-specific it doesn't make much of a difference. Without GNU awk, I'd probably leave the header out of the awk output, print unsorted, pipe through sort and add the header afterwards.
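The non-GNU approach from the comment above (no header from awk, unsorted output, `sort`, header added separately) can be sketched like this. `sum_sorted` is a hypothetical name; it assumes a single tab-separated input file with a one-line header and uses bash for `$'\t'`.

```shell
# Hypothetical portable variant: any POSIX awk plus sort, no PROCINFO needed.
sum_sorted() {
  printf '\tSum Win\tSum Lost\n'           # header printed outside awk
  awk -F '\t' 'BEGIN { OFS = FS }
    NR > 1 { won[$1] += $2; lost[$1] += $3 }
    END { for (p in won) print p, won[p], lost[p] }' "$1" |
    sort -t $'\t' -k1,1                    # order rows by player name
}
```

Usage: `sum_sorted Test.tsv`.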
