6

Right now I am doing this by iterating, but there has to be a way to accomplish this task using numpy functions. My goal is to take a 2D array and average J columns at a time, producing a new array with the same number of rows as the original, but with columns/J columns.

So I want to take this:

J = 2 // two columns averaged at a time

[[1 2 3 4]
 [4 3 7 1]
 [6 2 3 4]
 [3 4 4 1]]

and produce this:

[[1.5 3.5]
 [3.5 4.0]
 [4.0 3.5]
 [3.5 2.5]]

Is there a simple way to accomplish this task? I also need a way such that if I never end up with an unaveraged remainder column. So if, for example, I have an input array with 5 columns and J=2, I would average the first two columns, then the last three columns.

Any help you can provide would be great.

2 Answers 2

4
data.reshape(-1,j).mean(axis=1).reshape(data.shape[0],-1)

If your j divides data.shape[1], that is.

Example:

In [40]: data
Out[40]: 
array([[7, 9, 7, 2],
       [7, 6, 1, 5],
       [8, 1, 0, 7],
       [8, 3, 3, 2]])

In [41]: data.reshape(-1,j).mean(axis=1).reshape(data.shape[0],-1)
Out[41]: 
array([[ 8. ,  4.5],
       [ 6.5,  3. ],
       [ 4.5,  3.5],
       [ 5.5,  2.5]])
Sign up to request clarification or add additional context in comments.

Comments

2

First of all, it looks to me like you're not averaging the columns at all, you're just averaging two data points at a time. Seems to me like your best off reshaping the array, so your that you have a Nx2 data structure that you can feed directly to mean. You may have to pad it first if the number of columns isn't quite compatible. Then at the end, just do a weighted average of the padded remainder column and the one before it. Finally reshape back to the shape you want.

To play off of the example provided by TheodrosZelleke:

In [1]: data = np.concatenate((data, np.array([[5, 6, 7, 8]]).T), 1)

In [2]: data
Out[2]: 
array([[7, 9, 7, 2, 5],
       [7, 6, 1, 5, 6],
       [8, 1, 0, 7, 7],
       [8, 3, 3, 2, 8]])

In [3]: cols = data.shape[1]

In [4]: j = 2

In [5]: dataPadded = np.concatenate((data, np.zeros((data.shape[0], j - cols % j))), 1)

In [6]: dataPadded
Out[6]: 
array([[ 7.,  9.,  7.,  2.,  5.,  0.],
       [ 7.,  6.,  1.,  5.,  6.,  0.],
       [ 8.,  1.,  0.,  7.,  7.,  0.],
       [ 8.,  3.,  3.,  2.,  8.,  0.]])

In [7]: dataAvg = dataPadded.reshape((-1,j)).mean(axis=1).reshape((data.shape[0], -1))

In [8]: dataAvg
Out[8]: 
array([[ 8. ,  4.5,  2.5],
       [ 6.5,  3. ,  3. ],
       [ 4.5,  3.5,  3.5],
       [ 5.5,  2.5,  4. ]])

In [9]: if cols % j:
    dataAvg[:, -2] = (dataAvg[:, -2] * j + dataAvg[:, -1] * (cols % j)) / (j + cols % j)
    dataAvg = dataAvg[:, :-1]
   ....:     

In [10]: dataAvg
Out[10]: 
array([[ 8.        ,  3.83333333],
       [ 6.5       ,  3.        ],
       [ 4.5       ,  3.5       ],
       [ 5.5       ,  3.        ]])

2 Comments

Would you mind elaborating on what is being accomplished by the statements in line 9? That part is tripping me up.
Yeah, that's the weighted average part. Since you specified above that you want remainder columns to be tacked on to the last column, the last line folds in the remainder, if there is one (if so, cols % j is nonzero, look up modulus operator if you're not familiar). But you can't just do a straight average, because the penultimate column group has j columns, and the remainder group has less than j. Does that clear it up at all?

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.