Summing sections of a 2d array python

Question

I have a 2D array with dimensions array[x][9], where x varies because the data is read from a file of varying length. I want to find the sum of each column of the array, but for 24 columns at a time, and put the results into a new array; the equivalent of sum(array2[0:24]), but for a 2D array. Is there a special syntax I just don't know about, or do I have to do it manually? I know that if it were a 1D array I could iterate through it by doing

total = []
for x in range(len(array) // 24):
    total.append(sum(array[x * 24:(x + 1) * 24]))  # so I get an array of the sums

What is the equivalent for a 2D array, doing it column by column? I can imagine storing each column in its own separate 1D array and then finding the sums, or writing a mess of for and while loops, but neither sounds even slightly elegant.

... sum of each column of the array but for 24 columns at a time ... don't you mean rows? because array[x][9] means there are only 9 columns. — AKS
– AKS, Commented Apr 26, 2016 at 6:40
Yes. Each column needs to be kept separate. The output needs to be in the same format as the input, with just 1/24 as many rows, but with each item technically being 24 times larger. — Sam
– Sam, Commented Apr 26, 2016 at 6:52
zip(*alst) can be used to 'transpose' a list of lists. That may make your 'column' sum easier. If the sublists are all the same length, numpy arrays might be more elegant. — hpaulj
– hpaulj, Commented Apr 26, 2016 at 6:54
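
A minimal sketch of the NumPy route hpaulj mentions, assuming every row has the same length. The helper name `block_sums` and the sample data are invented for illustration, and dropping trailing rows that do not fill a whole block is a choice made for this sketch, not something the question specifies.

```python
import numpy as np

def block_sums(rows, step=24):
    """Sum each column over consecutive blocks of `step` rows."""
    arr = np.asarray(rows)
    n_blocks = arr.shape[0] // step
    # Assumption of this sketch: rows that do not fill a whole block are dropped
    trimmed = arr[:n_blocks * step]
    # Shape (n_blocks, step, n_cols); summing over axis 1 collapses each block
    return trimmed.reshape(n_blocks, step, arr.shape[1]).sum(axis=1)

# Made-up data: 48 rows of 9 columns, so two blocks of 24 rows
data = np.arange(48 * 9).reshape(48, 9)
result = block_sums(data)
print(result.shape)  # (2, 9)
```

The output keeps the 9 columns of the input and has 1/24 as many rows, which matches the format Sam describes.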

chthonicdaemon · Accepted Answer · 2016-04-26 08:33:56Z

2

It sounds like you are perhaps working with time series data: a file containing hourly values for which you want a daily sum (hence the 24). The pandas library does this really nicely.

Suppose you have your data in data.csv:

import pandas
df = pandas.read_csv('data.csv')

If one of your columns was a timestamp, you could use that, but if you only have raw data, you can create a time index:

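# pandas.datetime was removed in pandas 2.0, so start from a pandas.Timestamp at midnight today;
# lowercase 'h' is the hourly alias in recent pandas (uppercase 'H' is deprecated)
df.index = pandas.date_range(pandas.Timestamp.today().normalize(),
                             periods=df.shape[0], freq='h')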

Now the summing of all columns on daily basis is very easy:

daily = df.resample('D').sum()
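
As a self-contained illustration of the steps above, here is a small sketch that builds the frame in memory instead of reading data.csv. The column names, values, and start date are all invented for the example.

```python
import pandas

# Hypothetical stand-in for data.csv: 48 hourly rows, 3 columns
df = pandas.DataFrame({
    "a": range(48),
    "b": [1] * 48,
    "c": [0.5] * 48,
})

# Arbitrary start date; one row per hour
df.index = pandas.date_range("2016-04-26", periods=len(df), freq="h")

# One output row per calendar day, each column summed separately
daily = df.resample("D").sum()
print(daily)
```

With 48 hourly rows the result has two rows, one per day, and the same columns as the input.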

1 Comment

Sam Over a year ago

This is exactly it. I can get the timestamp with minimal effort.

Selcuk · Answer · 2016-04-26 06:52:42Z

2

You can use zip to transpose your array and use a comprehension to sum each column separately:

>>> array = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
>>> [sum(a) for a in zip(*array)]
[111, 222, 333]

2 Comments

Sam Over a year ago

Summing the columns isn't the difficult part; it's doing 24 at a time that I'm having a problem with.

Byte Commander Over a year ago

@Sam The code in the answer works for any constant number of columns,

AKS · Answer · 2016-04-26 07:25:41Z

1

Please try this:

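import math

x = len(a)  # x is the number of rows in a

step = 24

# get the number of iterations you need to do
n = int(math.ceil(float(x) / step))

# for each block of `step` rows, transpose with zip and sum every column;
# a list comprehension is used because map() returns an iterator in Python 3
new_a = [[sum(col) for col in zip(*a[i * step:(i + 1) * step])]
         for i in range(n)]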

If x is not a multiple of 24, then the last row in new_a will hold the sum of the remaining rows (of which there will be fewer than 24).

This also assumes that the values in a are numbers so I have not done any conversions.
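
To make the remainder behaviour concrete, here is a small sketch of the same idea wrapped in a function. The name `sum_row_blocks` and the sample data are made up for illustration, and a step of 2 is used instead of 24 so the result is easy to check by hand.

```python
def sum_row_blocks(rows, step=24):
    """Return the column sums for each consecutive block of `step` rows."""
    return [
        # zip(*block) transposes the block, giving one tuple per column
        [sum(col) for col in zip(*rows[start:start + step])]
        for start in range(0, len(rows), step)
    ]

# Five rows with step=2: two full blocks plus one leftover row
a = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
new_a = sum_row_blocks(a, step=2)
print(new_a)  # [[4, 6], [12, 14], [9, 10]]
```

Stepping through the start indices with range(0, len(rows), step) avoids computing the number of blocks up front; the final slice simply comes out shorter when the row count is not a multiple of the step.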
