0

I have a 2D array with dimensions array[x][9], where x varies because the data is read from a file of varying length. I want to find the sum of each column of the array, but for 24 columns at a time, and put the results into a new array; equivalent to sum(array2[0:24]) but for a 2D array. Is there a special syntax I just don't know about, or do I have to do it manually? I know that if it were a 1D array I could iterate through it by doing:

total = []
for x in range(len(array2) // 24):
    total.append(sum(array2[x * 24:(x + 1) * 24]))  # so I get an array of the sums

What is the equivalent for a 2D array, doing it column by column? I can imagine storing each column in its own separate 1D array and then finding the sums, or writing a mess of for and while loops, neither of which sounds even slightly elegant.

5
  • "... sum of each column of the array but for 24 columns at a time ..." Don't you mean rows? array[x][9] means there are only 9 columns. Commented Apr 26, 2016 at 6:40
  • Yes, rows; my mistake. Commented Apr 26, 2016 at 6:44
  • Do you need the sum for each of the 9 columns separately? Commented Apr 26, 2016 at 6:45
  • Yes, each column needs to be separate. The output needs to be in the same format as the input, just with 1/24 as many rows, each item being the sum of 24 input values. Commented Apr 26, 2016 at 6:52
  • zip(*alst) can be used to 'transpose' a list of lists. That may make your 'column' sum easier. If the sublists are all the same length, numpy arrays might be more elegant. Commented Apr 26, 2016 at 6:54

3 Answers

2

It sounds like you perhaps are working with time series data, with a file containing hourly values and you want a daily sum (hence the 24). The pandas library will do this really nicely:

Suppose you have your data in data.csv:

import pandas
df = pandas.read_csv('data.csv')

If one of your columns was a timestamp, you could use that, but if you only have raw data, you can create a time index:

df.index = pandas.date_range(pandas.Timestamp.today().normalize(),
                             periods=df.shape[0], freq='h')

Now the summing of all columns on daily basis is very easy:

daily = df.resample('D').sum()
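
To see what the daily resampling produces, here is a minimal sketch on made-up data rather than a real file. The column names `a` and `b`, the start date, and the 48-row length are assumptions chosen only so the result is easy to check by hand.

```python
import pandas as pd

# Hypothetical data: 48 hourly rows (two full days), two numeric columns.
df = pd.DataFrame({"a": range(48), "b": [1] * 48})

# Attach an hourly index starting at an arbitrary, assumed date.
df.index = pd.date_range("2016-04-26", periods=len(df), freq="h")

# One output row per calendar day, each column summed over its 24 hourly rows.
daily = df.resample("D").sum()
print(daily)
```

Each output row covers 24 input rows, so the result keeps the same columns as the input with 1/24 as many rows.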

1 Comment

This is exactly it. I can get the timestamp with minimal effort.
2

You can use zip to transpose your array and use a comprehension to sum each column separately:

>>> array = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
>>> [sum(a) for a in zip(*array)]
[111, 222, 333]

2 Comments

Summing the columns isn't the difficult part; it's doing 24 at a time that I'm having a problem with.
@Sam The code in the answer works for any constant number of columns,
1

Please try this:

import math

x = len(a)  # number of rows in a

step = 24

# number of blocks needed, rounded up so a partial last block is included
n = int(math.ceil(float(x) / step))

# for each block of rows, transpose with zip and sum every column
new_a = [[sum(col) for col in zip(*a[i * step:(i + 1) * step])]
         for i in range(n)]

If x is not a multiple of 24, the last row of new_a will hold the sum of the remaining rows (fewer than 24 of them).

This also assumes that the values in a are numbers, so I have not done any conversions.
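
As a quick check of the remainder behaviour, here is a minimal sketch that wraps the same slice-and-transpose idea in a function. The name `sum_row_blocks` and the sample data are hypothetical, and a step of 2 stands in for 24 to keep the example small.

```python
def sum_row_blocks(rows, step=24):
    """Sum each column over consecutive blocks of `step` rows."""
    return [
        # zip(*block) transposes the block so each tuple is one column
        [sum(col) for col in zip(*rows[start:start + step])]
        for start in range(0, len(rows), step)
    ]


# Hypothetical sample: 5 rows, 3 columns, so the last block has only 1 row.
a = [[1, 2, 3], [10, 20, 30], [100, 200, 300], [1, 1, 1], [5, 5, 5]]
result = sum_row_blocks(a, step=2)
print(result)  # [[11, 22, 33], [101, 201, 301], [5, 5, 5]]
```

Stepping `range` by the block size avoids computing the number of blocks up front, and the final slice simply comes out shorter when the row count is not a multiple of the step.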
