NumPy array sum reduce

Question

I have a numpy array with three columns of the form:

x1 y1 f1


x2 y2 f2


...

xn yn fn

The (x,y) pairs may repeat. I would need another array such that each (x,y) pair appears once and the corresponding third column is the sum of all the f values that appeared next to (x,y).

For example, the array

would give

The order of rows is not relevant. What is the fastest way to do this in Python?

Thank you!

Look at at: docs.scipy.org/doc/numpy/reference/generated/… — hpaulj
– hpaulj, Commented Apr 13, 2015 at 18:13

Divakar · Accepted Answer · 2015-04-13 18:42:41Z

This would be one approach to solve it -

import numpy as np

# Input array
A = np.array([[1,2,4.0],
             [1,1,5.0],
             [1,2,3.0],
             [0,1,9.0]])

# Extract xy columns            
xy = A[:,0:2]

# Perform lex sort and get the sorted indices and xy pairs
sorted_idx = np.lexsort(xy.T)
sorted_xy =  xy[sorted_idx,:]

# Differentiation along rows for sorted array
df1 = np.diff(sorted_xy,axis=0)
df2 = np.append([True],np.any(df1!=0,1),0)
# OR df2 = np.append([True],np.logical_or(df1[:,0]!=0,df1[:,1]!=0),0)
# OR df2 = np.append([True],np.dot(df1!=0,[True,True]),0)

# Get unique sorted labels
sorted_labels = df2.cumsum(0)-1

# Get labels
labels = np.zeros_like(sorted_idx)
labels[sorted_idx] = sorted_labels

# Get unique indices
unq_idx  = sorted_idx[df2]

# Get counts and unique rows and setup output array
counts = np.bincount(labels, weights=A[:,2])
unq_rows = xy[unq_idx,:]
out = np.append(unq_rows,counts.ravel()[:,None],1)

Input & Output -

In [169]: A
Out[169]: 
array([[ 1.,  2.,  4.],
       [ 1.,  1.,  5.],
       [ 1.,  2.,  3.],
       [ 0.,  1.,  9.]])

In [170]: out
Out[170]: 
array([[ 0.,  1.,  9.],
       [ 1.,  1.,  5.],
       [ 1.,  2.,  7.]])

Botond · Accepted Answer · 2015-04-14 16:54:54Z

1

Thanks to @hpaulj, finally found the simplest solution. If d contains the 3-column data:

ind =d[0:2].astype(int)
x = zeros(shape=(N,N))
add.at(x,list(ind),d[2])

This solution assumes that the (x,y) indices in the first two columns are integer and smaller than N. This is what I need and should have mentioned in the post.

Edit: Note that the above solution produces a sparse matrix with the sum values at position (x,y) within the matrix.

edited Apr 14, 2015 at 16:54

answered Apr 13, 2015 at 19:37

Botond

2,8427 gold badges33 silver badges54 bronze badges

4 Comments

Divakar Over a year ago

Would be interesting to see some runtime tests, maybe?

Divakar Over a year ago

Are you sure this works? I tried it with the input listed in the question and got some other unexpected values. I assumed N = 3 for that.

user688635 Over a year ago

This does not produce anything near the output in the question. It produces a sparse matrix with the sums at the x, y coordinates.

Botond Over a year ago

@Colt45, that is correct. From the sparse matrix however it is easy to recover the desired output form. Probably it is not optimal for large N though. I completely admit that my question was not clear.

dawg · Accepted Answer · 2015-04-13 19:15:39Z

0

Certainly easily done in Python:

arr = np.array([[1,2,4.0],
                [1,1,5.0],
                [1,2,3.0],
                [0,1,9.0]])
d={}                
for x, y, z in arr:
    d.setdefault((x,y), 0)
    d[x,y]+=z     

>>> d
{(1.0, 2.0): 7.0, (0.0, 1.0): 9.0, (1.0, 1.0): 5.0}

Then translate back to numpy:

>>> np.array([[x,y,d[(x,y)]] for x,y in d.keys()]) 
array([[ 1.,  2.,  7.],
       [ 0.,  1.,  9.],
       [ 1.,  1.,  5.]])

answered Apr 13, 2015 at 19:15

dawg

105k24 gold badges142 silver badges217 bronze badges

1 Comment

Botond Over a year ago

nice solution with dictionaries, however it involves a bunch of for loops

hpaulj · Accepted Answer · 2015-04-13 19:56:57Z

0

If you have scipy, the sparse module does this kind of addition - again for an array where the 1st 2 columns are integers - ie. indexes.

from scipy import sparse
M = sparse.csr_matrix((d[:,0], (d[:,1],d[:,2])))
M = M.tocoo() # there may be a short cut to this csr coo round trip
x = np.column_stack([M.row, M.col, M.data]) # needs testing

For convenience in constructing certain kinds of linear algebra matrices, the csr sparse array format sums values with duplicate indices. It's implemented in compiled code so should be fairly fast. But putting the data into M and taking it back out might slow it down.

(ps. I haven't tested this script since I'm writing this on a machine without scipy).

answered Apr 13, 2015 at 19:56

hpaulj

233k14 gold badges260 silver badges392 bronze badges

Collectives™ on Stack Overflow

NumPy array sum reduce

4 Answers 4

Comments

4 Comments

1 Comment

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

4 Answers 4

Comments

4 Comments

1 Comment

Comments

Your Answer

Sign up or log in

Post as a guest

Related