
Say I have an arbitrary NumPy array that looks like this:

arr = [[  6.0   12.0   1.0]
       [  7.0   9.0   1.0]
       [  8.0   7.0   1.0]
       [  4.0   3.0   2.0]
       [  6.0   1.0   2.0]
       [  2.0   5.0   2.0]
       [  9.0   4.0   3.0]
       [  2.0   1.0   4.0]
       [  8.0   4.0   4.0]
       [  3.0   5.0   4.0]]

What would be an efficient way of averaging rows that are grouped by their third column number?

The expected output would be:

result = [[  7.0  9.33  1.0]
          [  4.0  3.0  2.0]
          [  9.0  4.0  3.0]
          [  4.33  3.33  4.0]]

4 Answers


A compact option is to use the numpy_indexed package (disclaimer: I am its author), which implements a fully vectorized solution:

import numpy_indexed as npi
npi.group_by(arr[:, 2]).mean(arr)
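
For reference, my reading of the numpy_indexed API is that mean returns a (keys, means) pair rather than only the averaged rows; a small sketch under that assumption:

import numpy_indexed as npi

# assumption: group_by(keys).mean(values) returns (unique keys, per-group means)
keys, means = npi.group_by(arr[:, 2]).mean(arr)
print(means)  # averaged rows; the third column carries the group id itself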

You can do (initializing results first and iterating over the unique group ids):

import numpy as np

results = []
for x in sorted(np.unique(arr[..., 2])):
    results.append([np.average(arr[np.where(arr[..., 2] == x)][..., 0]),
                    np.average(arr[np.where(arr[..., 2] == x)][..., 1]),
                    x])

Testing:

>>> arr
array([[  6.,  12.,   1.],
       [  7.,   9.,   1.],
       [  8.,   7.,   1.],
       [  4.,   3.,   2.],
       [  6.,   1.,   2.],
       [  2.,   5.,   2.],
       [  9.,   4.,   3.],
       [  2.,   1.,   4.],
       [  8.,   4.,   4.],
       [  3.,   5.,   4.]])
>>> results=[]
>>> for x in sorted(np.unique(arr[...,2])):
...     results.append([np.average(arr[np.where(arr[...,2]==x)][...,0]), 
...                     np.average(arr[np.where(arr[...,2]==x)][...,1]),
...                     x])
... 
>>> results
[[7.0, 9.3333333333333339, 1.0], [4.0, 3.0, 2.0], [9.0, 4.0, 3.0], [4.333333333333333, 3.3333333333333335, 4.0]]

The array arr does not need to be sorted. Note that the np.where-based fancy indexing produces new (copied) arrays rather than views, but the averages are computed directly from those selections without any manual bookkeeping.
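
If the result is needed as a NumPy array in the same shape as the expected output, the list of lists converts directly (a trivial follow-up, not part of the original answer):

result = np.array(results)
# array([[7.        , 9.33333333, 1.        ],
#        [4.        , 3.        , 2.        ],
#        [9.        , 4.        , 3.        ],
#        [4.33333333, 3.33333333, 4.        ]])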

Or, for a pure numpy solution:

# sort rows by the group id in column 2
groups = arr[:, 2].copy()
_ndx = np.argsort(groups)

# unique group ids, the index where each group starts in the sorted order,
# and the number of rows in each group
_id, _pos, grp_count = np.unique(groups[_ndx],
                                 return_index=True,
                                 return_counts=True)

# sum each contiguous block of rows, then divide by the group sizes
grp_sum = np.add.reduceat(arr[_ndx], _pos, axis=0)
grp_mean = grp_sum / grp_count[:, None]

>>> grp_mean
array([[7.        , 9.33333333, 1.        ],
       [4.        , 3.        , 2.        ],
       [9.        , 4.        , 3.        ],
       [4.33333333, 3.33333333, 4.        ]])
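
To make the reduceat step concrete, these are the intermediate values the snippet produces for the example data (worked out by hand here, so treat the exact printout as illustrative):

print(groups[_ndx])  # [1. 1. 1. 2. 2. 2. 3. 4. 4. 4.]  rows reordered by group id
print(_pos)          # [0 3 6 7]  start of each group in the sorted order
print(grp_count)     # [3 3 1 3]
print(grp_sum)       # [[21. 28.  3.]
                     #  [12.  9.  6.]
                     #  [ 9.  4.  3.]
                     #  [13. 10. 12.]]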


A pure-Python solution with itertools.groupby:

from itertools import groupby
from operator import itemgetter

arr = [[6.0, 12.0, 1.0],
       [7.0, 9.0, 1.0],
       [8.0, 7.0, 1.0],
       [4.0, 3.0, 2.0],
       [6.0, 1.0, 2.0],
       [2.0, 5.0, 2.0],
       [9.0, 4.0, 3.0],
       [2.0, 1.0, 4.0],
       [8.0, 4.0, 4.0],
       [3.0, 5.0, 4.0]]

result = []

for groupByID, rows in groupby(arr, key=itemgetter(2)):
    position1, position2, counter = 0, 0, 0
    for row in rows:
        position1+=row[0]
        position2+=row[1]
        counter+=1
    result.append([position1/counter, position2/counter, groupByID])

print(result)

would output:

[[7.0, 9.333333333333334, 1.0], [4.0, 3.0, 2.0], [9.0, 4.0, 3.0], [4.333333333333333, 3.3333333333333335, 4.0]]

Note that itertools.groupby only groups consecutive items, so this relies on the rows already being ordered by the third column.
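
A minimal sketch of handling input that is not already grouped, assuming the same arr as above, would be to sort by the key first:

from itertools import groupby
from operator import itemgetter

# sort by the third column first, since groupby only merges consecutive runs
result = []
for groupByID, rows in groupby(sorted(arr, key=itemgetter(2)), key=itemgetter(2)):
    rows = list(rows)
    result.append([sum(r[0] for r in rows) / len(rows),
                   sum(r[1] for r in rows) / len(rows),
                   groupByID])
print(result)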

A one-liner: split the already grouped rows wherever the third column changes, then average each chunk:

arr = np.array(
    [[6.0, 12.0, 1.0],
     [7.0, 9.0, 1.0],
     [8.0, 7.0, 1.0],
     [4.0, 3.0, 2.0],
     [6.0, 1.0, 2.0],
     [2.0, 5.0, 2.0],
     [9.0, 4.0, 3.0],
     [2.0, 1.0, 4.0],
     [8.0, 4.0, 4.0],
     [3.0, 5.0, 4.0]])

# split before every row where the group id changes, then average each chunk
np.array([a.mean(0) for a in np.split(arr, np.flatnonzero(np.diff(arr[:, 2])) + 1)])
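
Like the groupby answer above, this relies on equal group ids being adjacent; for unsorted input, a sketch would be to order the rows by the third column first (the result is unchanged for this particular data):

srt = arr[arr[:, 2].argsort(kind='stable')]
np.array([a.mean(0) for a in np.split(srt, np.flatnonzero(np.diff(srt[:, 2])) + 1)])
# array([[7.        , 9.33333333, 1.        ],
#        [4.        , 3.        , 2.        ],
#        [9.        , 4.        , 3.        ],
#        [4.33333333, 3.33333333, 4.        ]])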
