2

I have to process a large number of arrays, each holding 512x256 pixel-like values. Most entries are 0, so I only want to store the non-zero values, i.e.:

import numpy as np
import time

xlist=[]
ylist=[]
zlist=[]

millis = time.time()*1000
ar = np.zeros((512,256),dtype=np.uint16)

for x in range(0,512):
    for y in range(0,256):
        if (0<ar[x][y]<1000):
            xlist.append(x)
            ylist.append(y)
            zlist.append(ar[x][y])

print(time.time()*1000 - millis)

This takes about 750 ms on my PC. Is there a faster way to do this? I have to process tens of thousands of these pixel arrays.

  • Looks like you're dealing with sparse matrices. Scipy gives you some class types to choose from: docs.scipy.org/doc/scipy/reference/sparse.html Commented Aug 15, 2013 at 14:31
  • In general, if you can avoid writing loops when dealing with numpy arrays you can get much faster performance. As a side note, if this is python 2.x, just changing range to xrange could get you a tiny performance increase. Commented Aug 15, 2013 at 16:27

2 Answers

3

You can try something like this:

import numpy as np

ar = np.zeros((512, 256), dtype=np.uint16)

# there should be something here to fill ar

# Boolean mask of the entries to keep
check = (0 < ar) & (ar < 1000)

# np.where returns the row and column indices of the True entries
xlist, ylist = np.where(check)

# Boolean indexing returns the matching values in the same order
zlist = ar[check]
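
As a sanity check, here is a minimal sketch that runs the question's loop and the mask-based approach on the same array and compares the results. The function names, the random fill (about 1% non-zero entries) and the seed are assumptions made up for this example, not part of the original question.

```python
import numpy as np

def extract_loop(ar):
    """Reference version: the nested loops from the question."""
    xlist, ylist, zlist = [], [], []
    for x in range(ar.shape[0]):
        for y in range(ar.shape[1]):
            if 0 < ar[x, y] < 1000:
                xlist.append(x)
                ylist.append(y)
                zlist.append(ar[x, y])
    return np.array(xlist), np.array(ylist), np.array(zlist)

def extract_vectorized(ar):
    """Mask-based version: no Python-level loop."""
    check = (0 < ar) & (ar < 1000)
    xlist, ylist = np.nonzero(check)
    return xlist, ylist, ar[check]

# Hypothetical test data: mostly zeros, with about 1% random entries.
rng = np.random.default_rng(0)
ar = np.zeros((512, 256), dtype=np.uint16)
fill = rng.random(ar.shape) < 0.01
ar[fill] = rng.integers(1, 2000, size=fill.sum(), dtype=np.uint16)

loop_result = extract_loop(ar)
vec_result = extract_vectorized(ar)

# Both versions walk the array in row-major order, so the outputs line up.
same = all(np.array_equal(a, b) for a, b in zip(loop_result, vec_result))
print(same)
```

Wrapping each call in time.time(), as in the question, should show how much the mask-based version gains on your machine.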

3

SciPy provides very good support for sparse matrices, which should be a good fit for your problem. See the documentation of the scipy.sparse module.

To convert your numpy array to a coordinate-based (COO) sparse matrix as you do with your code above, you can proceed as follows:

import numpy as np
from scipy import sparse

# Your original matrix
A = np.array([[1, 0, 3], [0, 5, 6], [0, 0, 9]])

# Let SciPy create a sparse matrix from your data
sA = sparse.coo_matrix(A)

# The row indices, column indices and values of the non-zero entries
xlist, ylist, zlist = sA.row, sA.col, sA.data

print((xlist, ylist, zlist))

# This will print: (array([0, 0, 1, 1, 2], dtype=int32), array([0, 2, 1, 2, 2], dtype=int32), array([1, 3, 5, 6, 9]))

Since SciPy code is usually highly optimized, this should run faster than your looping solution (I didn't check it, though).
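
Note that coo_matrix keeps every non-zero entry, whereas the loop in the question also drops values of 1000 or more. One possible way to get the same selection is to zero out the unwanted values before the conversion. The sketch below assumes a small made-up array for illustration:

```python
import numpy as np
from scipy import sparse

# Small hypothetical array; 1500 and 1000 fall outside 0 < value < 1000.
ar = np.array([[1, 0, 1500],
               [0, 5, 0],
               [1000, 0, 9]], dtype=np.uint16)

# Zero out everything that should not be stored, then let SciPy find the rest.
filtered = np.where(ar < 1000, ar, 0)
sA = sparse.coo_matrix(filtered)

xlist, ylist, zlist = sA.row, sA.col, sA.data
print(xlist.tolist(), ylist.tolist(), zlist.tolist())
```

The extra np.where pass touches the whole array once, so it is worth timing against the plain mask-based approach if speed is the main concern.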
