I have a numpy array

[0 0 0 0 0 0 0 1 1 2 2 2 2 2 1 1 0 0 0 0 0 0 0 0]

which I want to convert/dissolve into

[[0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0]]

My current approach first uses a while loop to split the array into layers of 1s, and then builds one output row per nonzero entry found with np.where(diss > 0). However, I suspect this is neither the most efficient nor the most elegant NumPy solution. Any ideas on how to improve it?

import numpy as np

# np.int was removed in NumPy 1.24; the builtin int is used instead.
source = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 2, 2, 2, 2,
                   2, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int)

# Peel off one layer of 1s per pass until source is all zeros.
# Note that this modifies source in place.
diss = None
while np.any(source):
    row = np.greater(source, 0).astype(int)
    if diss is None:
        diss = row
    else:
        diss = np.vstack([diss, row])
    source -= row

# Emit one output row per nonzero entry of the layered array.
idx = np.where(diss > 0)
result = np.zeros((0, source.shape[0]), dtype=int)
for x, y in zip(*idx):
    row = np.zeros(source.shape, dtype=int)
    row[y] = 1
    result = np.vstack([result, row])
3 Comments
  • Is the order of the rows critical? Commented Apr 7, 2014 at 11:26
  • No, the order is not critical. The important thing is that any value >1 gets broken down into ones. Commented Apr 7, 2014 at 11:33
  • What is the typical value for source.max()? Commented Apr 7, 2014 at 11:55
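
Since the row order does not matter, the requirement stated in these comments can be written down as a small order-independent check. This is only a sketch: `is_valid_dissolve` is a hypothetical helper name, and it assumes `source` is a 1-D array of non-negative integers and `result` is a 2-D array.

```python
import numpy as np

def is_valid_dissolve(source, result):
    """Check the stated requirement, ignoring row order (hypothetical helper)."""
    source = np.asarray(source)
    result = np.asarray(result)
    # Every entry must be 0 or 1.
    only_zeros_and_ones = np.isin(result, (0, 1)).all()
    # Every row must contain exactly one 1.
    one_per_row = (result.sum(axis=1) == 1).all()
    # Summing the rows must give back the original array.
    columns_match = np.array_equal(result.sum(axis=0), source)
    return bool(only_zeros_and_ones and one_per_row and columns_match)

source = np.array([0, 1, 2])
print(is_valid_dissolve(source, [[0, 1, 0], [0, 0, 1], [0, 0, 1]]))  # True
print(is_valid_dissolve(source, [[0, 1, 0], [0, 0, 2]]))             # False
```

A check like this can be run against any of the approaches in this thread, whatever order they produce the rows in.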

2 Answers

Here's one way:

In [38]: x = np.array([0,0,0,0,0,1,1,2,2,2,1,1,0,0])

In [39]: n = x.sum()

In [40]: rows = np.arange(n)

In [41]: positions = np.nonzero(x)[0]

In [42]: cols = np.repeat(positions, x[positions])

In [43]: result = np.zeros((n, len(x)), dtype=int)

In [44]: result[rows, cols] = 1

In [45]: result
Out[45]: 
array([[0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0]])
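
The session above can also be packaged as a function. The following is a sketch rather than a definitive implementation: `dissolve` is a hypothetical name, and the code assumes the input is a 1-D array of non-negative integers. It also skips the `np.nonzero` step, because `np.repeat` already emits nothing for positions whose count is 0.

```python
import numpy as np

def dissolve(x):
    """Return one row per unit in x, each row holding a single 1."""
    x = np.asarray(x)
    n = int(x.sum())
    # Repeat each column index x[j] times; zero counts contribute nothing.
    cols = np.repeat(np.arange(len(x)), x)
    result = np.zeros((n, len(x)), dtype=int)
    # Row k gets its 1 in column cols[k].
    result[np.arange(n), cols] = 1
    return result

x = np.array([0, 0, 0, 0, 0, 1, 1, 2, 2, 2, 1, 1, 0, 0])
result = dissolve(x)
print(result.shape)  # (10, 14)
```

The rows come out grouped by position, as in the session output, rather than layer by layer as in the question. The asker has stated that the order is not critical.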
1 Comment

I've timed both answers, and this one is faster: 21.8 µs per loop versus 36.2 µs per loop (each 10000 loops, best of 3).
2

For your example, this is about 5 times faster, and it doesn't damage source.

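import numpy as np

source = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 2, 2, 2, 2, 2, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0], int)
m, h, w = source.max(), source.sum(), len(source)
# For each level i, collect the positions whose value exceeds i.
# (xrange exists only in Python 2; range works in Python 3.)
cols = np.concatenate([np.nonzero(source > i)[0] for i in range(m)])
result = np.zeros((h, w), int)
result[np.arange(h), cols] = 1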

There is still a Python loop of length source.max(), so if that value is large (in the example it is only 2), something better can probably be done.
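
One possible way to drop that remaining loop is to compare `source` against all levels at once with broadcasting. This is a sketch under assumptions, not a timed or recommended replacement: it assumes a non-empty 1-D array of non-negative integers, and it builds a temporary boolean array of shape `(source.max(), len(source))`, so memory use grows with the maximum value. The names `mask` and `cols` are introduced here for illustration.

```python
import numpy as np

source = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 2, 2, 2, 2, 2,
                   1, 1, 0, 0, 0, 0, 0, 0, 0, 0])
m, h, w = source.max(), source.sum(), len(source)

# mask[k, j] is True when source[j] > k; the shape is (m, w).
mask = source > np.arange(m)[:, None]
# np.nonzero scans row by row, so the column indices come out level by
# level, in the same order as looping over the levels one at a time.
cols = np.nonzero(mask)[1]

result = np.zeros((h, w), dtype=int)
result[np.arange(h), cols] = 1
print(result.shape)  # (14, 24)
```

Whether this beats the loop depends on the data: for a small maximum the loop is short anyway, and for a large maximum the temporary mask may dominate.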

2 Comments

source-i>0 seems like it'd be better written as source>i.
Thanks @user2357112, it started as np.nonzero(source - i) before I realized it was counting negative values. That actually saves about 10% of the time :P
