
I have been having issues summing a very big array (millions of numbers): I am trying to sum all the values in it, but it keeps failing (the second script gets 0 from the first one). Below is my code:

Map.sh

#/bin/bash

file="myfile.csv"
data=`tail -n +2 $file |  cut -d"," -f 4`
data1=()
for i in $data;
do
data1+=($i)
done;
count=${#data1[@]}
export count
export data1
export data
./reduce.sh

reduce.sh

#/bin/bash
echo $count
sum=0
for i in "${data1[@]}"; do
        sum = $((sum + $i))
done;
echo $sum

I have tried almost every variant I have found online, but none works. Am I missing something?

Data example: I am looking at column 4 (a screenshot of the column was attached in the original post), and it extends for millions of rows.

  • Post an input example.
  • @Daniel: Did you verify the content of your data1 array? BTW, what is the purpose of turning your shell variables into environment variables? Aside from the fact that a bash array cannot be exported, you don't have any child process that would benefit from the export.
  • I do get this message when I try to get the count in the reduce script: Argument list too long. So I guess that is the issue. Can you think of any solutions?
  • Side note: sum = $((sum + $i)) is wrong (blanks around =); shellcheck.net tells you things like that.
  • Also protect base-10 against base-8 interpretation if values contain leading zeros: sum=$((sum + 10#$i)), or with bash arithmetic: ((sum += 10#$i)). Anyway, iterating over a large data set with a shell loop is not appropriate; try read -r sum < <(IFS='+'; printf '%s\n' "${data1[*]}" | bc -l) or read -r sum < <(tail -n +2 "$file" | cut -d ',' -f 4 | paste -sd+ - | bc -l). (A corrected version of both scripts is sketched below.)
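
Putting those comments together, here is a minimal sketch of a working version of the two scripts. This is a hypothetical rewrite, not the original poster's code: it streams the column through a pipe instead of exporting it (a bash array cannot be exported, and a multi-megabyte environment is exactly what triggers "Argument list too long"), and it fixes the shebang (#!/bin/bash, not #/bin/bash) and the assignment spacing flagged above.

Map.sh

#!/bin/bash
# Emit the 4th CSV column on stdout and pipe it to the reducer,
# instead of passing it through the environment.
file="myfile.csv"
tail -n +2 "$file" | cut -d',' -f4 | ./reduce.sh

reduce.sh

#!/bin/bash
# Read one value per line from stdin and sum them.
# Assumes every line holds a non-negative integer; 10#$i forces
# base 10 so values with leading zeros are not read as octal.
sum=0
count=0
while IFS= read -r i; do
    sum=$((sum + 10#$i))
    count=$((count + 1))
done
echo "$count"
echo "$sum"

This keeps the map/reduce split, but a pure-bash loop over millions of lines will still be slow; the answers below avoid the shell loop entirely.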

2 Answers


With GNU datamash:

datamash --header-in -t',' sum 4 < myfile.csv

This sums the values of the fourth field of the comma-separated input file; --header-in skips the header line.
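
For example, with a small hypothetical stand-in for myfile.csv (the column names are made up; only the shape matters):

$ cat myfile.csv
id,name,date,value
1,a,2020-03-01,10
2,b,2020-03-02,20
3,c,2020-03-03,12
$ datamash --header-in -t',' sum 4 < myfile.csv
42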




Would this awk work for you:

$ awk -F, '       # comma delimiter
FNR>1 {           # skip header record
    sum+=$4       # sum 4th field values to sum var
}
END {             # in the end
    print sum     # output the sum
}' file
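
For reference, the same program as a one-liner, using the file name from the question (myfile.csv):

awk -F',' 'FNR>1 { sum += $4 } END { print sum }' myfile.csv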

Comments

  • Maybe add skipping of the header line.
  • ./reduce.sh: Argument list too long. Plus I have to use the array data1; this follows the MapReduce structure, where the key and the set of numbers are extracted and stored in the first script, and then carried over and summed in the second one.
  • @Daniel: If you need the count as well as the sum, change print sum to print NR-1, sum (see the sketch after these comments). Or, if you need them on separate lines, use two print statements. JamesBrown's awk script replaces your reduce.sh.
  • The limitation of awk is that it uses double-precision floating-point arithmetic, so the results are only exact up to 2**53. I think you should be OK here, since it seems like your sum is in the millions or maybe billions, but not yet quadrillions. But it is a serious limitation in a MapReduce environment.
  • As long as you are summing small whole numbers, you shouldn't have any rounding issues, and a benefit of the awk solution is that it will be orders of magnitude faster than a shell-script solution.
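
Following the comment about counting, a small sketch of the count-and-sum variant (NR-1 excludes the header line from the count; myfile.csv as in the question):

awk -F',' 'FNR>1 { sum += $4 } END { print NR-1, sum }' myfile.csv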
