3

I have a numpy array with three columns of the form:

x1 y1 f1


x2 y2 f2


...

xn yn fn

The (x,y) pairs may repeat. I would need another array such that each (x,y) pair appears once and the corresponding third column is the sum of all the f values that appeared next to (x,y).

For example, the array

1 2 4.0

1 1 5.0

1 2 3.0

0 1 9.0

would give

0 1 9.0

1 1 5.0

1 2 7.0

The order of rows is not relevant. What is the fastest way to do this in Python?

Thank you!

3
  • 1
    Look at at: docs.scipy.org/doc/numpy/reference/generated/… Commented Apr 13, 2015 at 18:13
  • I would use a pandas dataframe Commented Apr 13, 2015 at 18:49
  • @hpaulj, this works nicely! Commented Apr 13, 2015 at 19:31

4 Answers 4

2

This would be one approach to solve it -

import numpy as np

# Input array
A = np.array([[1,2,4.0],
             [1,1,5.0],
             [1,2,3.0],
             [0,1,9.0]])

# Extract xy columns            
xy = A[:,0:2]

# Perform lex sort and get the sorted indices and xy pairs
sorted_idx = np.lexsort(xy.T)
sorted_xy =  xy[sorted_idx,:]

# Differentiation along rows for sorted array
df1 = np.diff(sorted_xy,axis=0)
df2 = np.append([True],np.any(df1!=0,1),0)
# OR df2 = np.append([True],np.logical_or(df1[:,0]!=0,df1[:,1]!=0),0)
# OR df2 = np.append([True],np.dot(df1!=0,[True,True]),0)

# Get unique sorted labels
sorted_labels = df2.cumsum(0)-1

# Get labels
labels = np.zeros_like(sorted_idx)
labels[sorted_idx] = sorted_labels

# Get unique indices
unq_idx  = sorted_idx[df2]

# Get counts and unique rows and setup output array
counts = np.bincount(labels, weights=A[:,2])
unq_rows = xy[unq_idx,:]
out = np.append(unq_rows,counts.ravel()[:,None],1)

Input & Output -

In [169]: A
Out[169]: 
array([[ 1.,  2.,  4.],
       [ 1.,  1.,  5.],
       [ 1.,  2.,  3.],
       [ 0.,  1.,  9.]])

In [170]: out
Out[170]: 
array([[ 0.,  1.,  9.],
       [ 1.,  1.,  5.],
       [ 1.,  2.,  7.]])
Sign up to request clarification or add additional context in comments.

Comments

1

Thanks to @hpaulj, finally found the simplest solution. If d contains the 3-column data:

ind =d[0:2].astype(int)
x = zeros(shape=(N,N))
add.at(x,list(ind),d[2])

This solution assumes that the (x,y) indices in the first two columns are integer and smaller than N. This is what I need and should have mentioned in the post.

Edit: Note that the above solution produces a sparse matrix with the sum values at position (x,y) within the matrix.

4 Comments

Would be interesting to see some runtime tests, maybe?
Are you sure this works? I tried it with the input listed in the question and got some other unexpected values. I assumed N = 3 for that.
This does not produce anything near the output in the question. It produces a sparse matrix with the sums at the x, y coordinates.
@Colt45, that is correct. From the sparse matrix however it is easy to recover the desired output form. Probably it is not optimal for large N though. I completely admit that my question was not clear.
0

Certainly easily done in Python:

arr = np.array([[1,2,4.0],
                [1,1,5.0],
                [1,2,3.0],
                [0,1,9.0]])
d={}                
for x, y, z in arr:
    d.setdefault((x,y), 0)
    d[x,y]+=z     

>>> d
{(1.0, 2.0): 7.0, (0.0, 1.0): 9.0, (1.0, 1.0): 5.0}

Then translate back to numpy:

>>> np.array([[x,y,d[(x,y)]] for x,y in d.keys()]) 
array([[ 1.,  2.,  7.],
       [ 0.,  1.,  9.],
       [ 1.,  1.,  5.]])

1 Comment

nice solution with dictionaries, however it involves a bunch of for loops
0

If you have scipy, the sparse module does this kind of addition - again for an array where the 1st 2 columns are integers - ie. indexes.

from scipy import sparse
M = sparse.csr_matrix((d[:,0], (d[:,1],d[:,2])))
M = M.tocoo() # there may be a short cut to this csr coo round trip
x = np.column_stack([M.row, M.col, M.data]) # needs testing

For convenience in constructing certain kinds of linear algebra matrices, the csr sparse array format sums values with duplicate indices. It's implemented in compiled code so should be fairly fast. But putting the data into M and taking it back out might slow it down.

(ps. I haven't tested this script since I'm writing this on a machine without scipy).

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.