Numpy - Averaging multiple columns of a 2D array

Question

Right now I am doing this by iterating, but there has to be a way to accomplish this task using numpy functions. My goal is to take a 2D array and average J columns at a time, producing a new array with the same number of rows as the original, but with (number of columns)/J columns.

So I want to take this:

J = 2  # two columns averaged at a time

[[1 2 3 4]
 [4 3 7 1]
 [6 2 3 4]
 [3 4 4 1]]

and produce this:

[[1.5 3.5]
 [3.5 4.0]
 [4.0 3.5]
 [3.5 2.5]]

Is there a simple way to accomplish this task? I also need to make sure I never end up with an unaveraged remainder column. For example, if I have an input array with 5 columns and J = 2, I would average the first two columns, then the last three columns.
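The grouping rule described above can be pinned down as a list of column ranges. A minimal sketch, assuming leftover columns are always folded into the last group; `column_groups` is a hypothetical helper, not a NumPy function:

```python
def column_groups(n_cols, j):
    """Hypothetical helper: return (start, stop) column ranges of width j,
    with any remainder (n_cols % j) folded into the last range."""
    n_groups = max(n_cols // j, 1)  # at least one group, even if n_cols < j
    bounds = [(g * j, (g + 1) * j) for g in range(n_groups)]
    # Stretch the last range so it also covers the leftover columns.
    start, _ = bounds[-1]
    bounds[-1] = (start, n_cols)
    return bounds

print(column_groups(4, 2))  # [(0, 2), (2, 4)]
print(column_groups(5, 2))  # [(0, 2), (2, 5)]
```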

Any help you can provide would be great.

tzelleke · Accepted Answer · 2013-01-18 14:40:23Z

4

data.reshape(-1,j).mean(axis=1).reshape(data.shape[0],-1)

This works only if your j divides data.shape[1] evenly.
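Why the divisibility condition matters: `reshape(-1, j)` cuts the row-major array into consecutive runs of j values, and each run stays inside one original row only when j divides the row length. A minimal sketch that wraps the one-liner with an explicit check; `average_column_groups` is a hypothetical name:

```python
import numpy as np

def average_column_groups(data, j):
    """Hypothetical wrapper: average each run of j adjacent columns.
    Requires j to divide the number of columns."""
    data = np.asarray(data)
    if data.shape[1] % j != 0:
        raise ValueError("j must divide the number of columns")
    # Each row splits into data.shape[1] // j runs of length j, so every row of
    # the intermediate (-1, j) array is exactly one column group.
    return data.reshape(-1, j).mean(axis=1).reshape(data.shape[0], -1)

# The input from the question; the result matches the expected output there.
data = np.array([[1, 2, 3, 4],
                 [4, 3, 7, 1],
                 [6, 2, 3, 4],
                 [3, 4, 4, 1]])
print(average_column_groups(data, 2))
```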

Example, with j = 2:

In [40]: data
Out[40]: 
array([[7, 9, 7, 2],
       [7, 6, 1, 5],
       [8, 1, 0, 7],
       [8, 3, 3, 2]])

In [41]: data.reshape(-1,j).mean(axis=1).reshape(data.shape[0],-1)
Out[41]: 
array([[ 8. ,  4.5],
       [ 6.5,  3. ],
       [ 4.5,  3.5],
       [ 5.5,  2.5]])
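An equivalent formulation (an alternative, not what this answer uses) reshapes to three dimensions, rows × groups × j, and averages over the last axis. It keeps the row structure explicit and needs only one reshape, but it carries the same divisibility requirement:

```python
import numpy as np

data = np.array([[7, 9, 7, 2],
                 [7, 6, 1, 5],
                 [8, 1, 0, 7],
                 [8, 3, 3, 2]])
j = 2

# Axis 1 indexes the column group, axis 2 the j columns inside that group.
grouped = data.reshape(data.shape[0], -1, j)
result = grouped.mean(axis=2)  # shape (rows, groups)
print(result)
```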

answered Jan 18, 2013 at 14:40

tzelleke

acjay · Accepted Answer · 2013-01-18 15:04:37Z

2

First of all, it looks to me like you're not really averaging whole columns; you're averaging j adjacent values at a time within each row. Seems to me like you're best off reshaping the array so that you have an N×j data structure you can feed directly to mean. You may have to pad it with zeros first if the number of columns isn't a multiple of j. Then reshape back to the shape you want, and finally, if you padded, do a weighted average of the padded remainder column and the one before it.
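For comparison, a minimal sketch of a variant that skips the padding and the weighted average (a swapped-in technique, not the one this answer uses): reshape only the full-width groups, then average the last group, remainder included, with a single `mean` call. `average_with_remainder` is a hypothetical helper name:

```python
import numpy as np

def average_with_remainder(data, j):
    """Hypothetical helper: average runs of j adjacent columns in each row.
    Leftover columns (n_cols % j) are folded into the last group."""
    data = np.asarray(data, dtype=float)
    n_rows, n_cols = data.shape
    n_groups = n_cols // j
    if n_groups == 0:
        # Fewer than j columns: the whole row is a single group.
        return data.mean(axis=1, keepdims=True)
    # Every group except the last is exactly j columns wide.
    head_cols = (n_groups - 1) * j
    head = data[:, :head_cols].reshape(n_rows, n_groups - 1, j).mean(axis=2)
    # The last group is j columns plus any remainder, so average it directly.
    tail = data[:, head_cols:].mean(axis=1, keepdims=True)
    return np.hstack([head, tail])

data = np.array([[7, 9, 7, 2, 5],
                 [7, 6, 1, 5, 6],
                 [8, 1, 0, 7, 7],
                 [8, 3, 3, 2, 8]])
print(average_with_remainder(data, 2))  # last column averages the final three columns
```

Averaging the tail directly means no correction for zero padding is needed, at the cost of one extra slice.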

To play off of the example provided by TheodrosZelleke:

In [1]: data = np.concatenate((data, np.array([[5, 6, 7, 8]]).T), 1)

In [2]: data
Out[2]: 
array([[7, 9, 7, 2, 5],
       [7, 6, 1, 5, 6],
       [8, 1, 0, 7, 7],
       [8, 3, 3, 2, 8]])

In [3]: cols = data.shape[1]

In [4]: j = 2

In [5]: dataPadded = np.concatenate((data, np.zeros((data.shape[0], (-cols) % j))), 1)

In [6]: dataPadded
Out[6]: 
array([[ 7.,  9.,  7.,  2.,  5.,  0.],
       [ 7.,  6.,  1.,  5.,  6.,  0.],
       [ 8.,  1.,  0.,  7.,  7.,  0.],
       [ 8.,  3.,  3.,  2.,  8.,  0.]])

In [7]: dataAvg = dataPadded.reshape((-1,j)).mean(axis=1).reshape((data.shape[0], -1))

In [8]: dataAvg
Out[8]: 
array([[ 8. ,  4.5,  2.5],
       [ 6.5,  3. ,  3. ],
       [ 4.5,  3.5,  3.5],
       [ 5.5,  2.5,  4. ]])

In [9]: if cols % j:
   ...:     # The padded last group's mean is (remainder sum) / j, so multiply by j to recover the sum
   ...:     dataAvg[:, -2] = (dataAvg[:, -2] * j + dataAvg[:, -1] * j) / (j + cols % j)
   ...:     dataAvg = dataAvg[:, :-1]
   ...:

In [10]: dataAvg
Out[10]: 
array([[ 8.        ,  4.66666667],
       [ 6.5       ,  4.        ],
       [ 4.5       ,  4.66666667],
       [ 5.5       ,  4.33333333]])
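As an independent check, the grouping the question asks for (the first two columns, then the last three) can be computed without padding using `np.add.reduceat`, which sums over column ranges that start at the given indices. This is an alternative technique, not part of the answer above:

```python
import numpy as np

data = np.array([[7, 9, 7, 2, 5],
                 [7, 6, 1, 5, 6],
                 [8, 1, 0, 7, 7],
                 [8, 3, 3, 2, 8]])
j = 2
n_cols = data.shape[1]

# Start index of each column group; the remainder is folded into the last group.
starts = np.arange(0, max(n_cols // j, 1) * j, j)
widths = np.diff(np.append(starts, n_cols))

# reduceat sums data[:, starts[i]:starts[i + 1]]; the last range runs to the end.
sums = np.add.reduceat(data, starts, axis=1)
averages = sums / widths
print(averages)
```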

edited Jan 18, 2013 at 15:04

answered Jan 18, 2013 at 14:40

acjay

2 Comments

user1764386 Over a year ago

Would you mind elaborating on what is being accomplished by the statements in line 9? That part is tripping me up.

acjay Over a year ago

Yeah, that's the weighted average part. Since you specified above that you want remainder columns to be tacked on to the last column group, that block folds in the remainder if there is one (in that case cols % j is nonzero; look up the modulus operator if you're not familiar with it). But you can't just do a straight average of the two group means, because the penultimate column group has j columns and the remainder group has fewer than j. Does that clear it up at all?
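The weighted average can be written out explicitly. Write m for the penultimate group's mean over j columns, r = cols % j for the number of real remainder columns, S for their sum, and m' = S / j for the zero-padded last group's mean (the padding adds nothing to the sum but still counts in the divisor). The merged mean over all j + r real columns is:

```latex
\bar{m} = \frac{j\,m + S}{j + r} = \frac{j\,m + j\,m'}{j + r}
```

For the first row of the five-column example (m = 4.5, m' = 2.5, j = 2, r = 1) this gives (9 + 5) / 3 = 14/3 ≈ 4.67, which is the mean of 7, 2 and 5.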
