
Sorry for the confusing title; I'm not sure how to make it more concise. Here are my requirements:

arr1 = np.array([3,5,9,1])
arr2 = ?(arr1)

arr2 would then be:

[
[0,1,2,0,0,0,0,0,0],
[0,1,2,3,4,0,0,0,0],
[0,1,2,3,4,5,6,7,8],
[0,0,0,0,0,0,0,0,0]
]

It doesn't need to vary based on the max; the shape is known in advance. To start, I've been able to create an array of zeros with that shape:

arr2 = np.zeros((len(arr1), max_len), dtype=int)

And then of course I could do a for loop over arr1 like this:

for i, element in enumerate(arr1):
    arr2[i,0:element] = np.arange(element)

but that would likely take a long time, since both dimensions here are rather large (arr1 has a few million rows, and max_len is around 500). Is there a clean, optimized way to do this in NumPy?

  • Can you create a 'mask' by doing some sort of 'outer' operation on np.arange(max_len) with arr1? Something that is True where the sequence is < arr1? Commented Jan 7, 2022 at 5:24
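The comment's suggestion can be sketched as a broadcast comparison: a row vector of column indices is compared against arr1 reshaped to a column, giving a boolean mask of shape (len(arr1), max_len), and np.where keeps the column index wherever the mask is True. This is a minimal sketch, assuming max_len is known and at least arr1.max(); the function name ragged_aranges is hypothetical. The strict < is what makes row i equal to range(arr1[i]) followed by zeros.

```python
import numpy as np

def ragged_aranges(arr1, max_len):
    # Hypothetical helper: column indices 0..max_len-1 as a 1-D array
    cols = np.arange(max_len)
    # Broadcast (1, max_len) against (n, 1): True where column index < row length
    mask = cols[None, :] < arr1[:, None]
    # Keep the column index where the mask is True, 0 elsewhere
    return np.where(mask, cols, 0)

arr1 = np.array([3, 5, 9, 1])
arr2 = ragged_aranges(arr1, 9)
print(arr2)
```

Everything happens in vectorized operations, so there is no Python-level loop over the millions of rows.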

3 Answers


Building on a 'padding' idea posted by @Divakar some years ago:

In [161]: res = np.arange(9)[None,:].repeat(4,0)
In [162]: res[res>=arr1[:,None]] = 0
In [163]: res
Out[163]: 
array([[0, 1, 2, 0, 0, 0, 0, 0, 0],
       [0, 1, 2, 3, 4, 0, 0, 0, 0],
       [0, 1, 2, 3, 4, 5, 6, 7, 8],
       [0, 0, 0, 0, 0, 0, 0, 0, 0]])
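The session above can be wrapped in a function for the sizes in the question. This is a sketch: the function name padded_aranges and the dtype argument are illustrative additions, not part of the answer. Since every value is below max_len (around 500), a 16-bit integer type is enough, assuming max_len stays below 32768. That cuts each entry from 8 bytes (the default int64 on most platforms) to 2; for 3 million rows and 500 columns, the result shrinks from roughly 12 GB to 3 GB.

```python
import numpy as np

def padded_aranges(arr1, max_len, dtype=np.int16):
    # Every row starts as 0, 1, ..., max_len - 1 (repeat returns a writable copy)
    res = np.arange(max_len, dtype=dtype)[None, :].repeat(len(arr1), axis=0)
    # Zero every column index that is >= that row's length
    res[res >= arr1[:, None]] = 0
    return res

arr1 = np.array([3, 5, 9, 1])
res = padded_aranges(arr1, 9)
print(res.dtype, res.nbytes)  # int16 72
```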

Try this with itertools.zip_longest:

import itertools
import numpy as np

arr1 = np.array([3, 5, 9, 1])

# Transpose the ranges, filling the shorter ones with 0.
# column_stack needs a sequence such as a list, not an iterator.
ranges = map(range, arr1)
arr2 = np.column_stack(list(itertools.zip_longest(*ranges, fillvalue=0)))
print(arr2)
[[0 1 2 0 0 0 0 0 0]
 [0 1 2 3 4 0 0 0 0]
 [0 1 2 3 4 5 6 7 8]
 [0 0 0 0 0 0 0 0 0]]
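One difference worth noting: zip_longest produces as many columns as the longest range, i.e. arr1.max(), not a fixed max_len. If the width must be fixed in advance, as the question says, the result can be padded on the right with zero columns. A sketch, assuming max_len >= arr1.max() and arr1.max() > 0; the helper name zip_longest_padded is hypothetical.

```python
import itertools
import numpy as np

def zip_longest_padded(arr1, max_len):
    # The k-th tuple holds element k of every range (0 once a range is exhausted)
    cols = list(itertools.zip_longest(*map(range, arr1), fillvalue=0))
    # Each tuple becomes one column: shape (len(arr1), arr1.max())
    arr2 = np.column_stack(cols)
    # Pad with zero columns on the right up to the fixed width
    return np.pad(arr2, ((0, 0), (0, max_len - arr2.shape[1])))

arr1 = np.array([3, 5, 4, 1])
arr2 = zip_longest_padded(arr1, 9)
print(arr2.shape)  # (4, 9)
```

Keep in mind that zip_longest still iterates in Python, so for millions of rows the vectorized approaches will likely be much faster.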


zip_longest is a good tool for 'padding' lists or arrays. While there's a clever way of padding with array 'masking', this itertools approach is easier to remember and apply.

I am adding a slight variation on @hpaulj's answer because you mentioned that max_len is around 500 and you have millions of rows. In this case, you can precompute a 500 by 500 matrix containing all possible rows and index into it using arr1:

import numpy as np
np.random.seed(0)

max_len = 500
arr = np.random.randint(0, max_len, size=10**5)

# generate all unique rows first, then index
# can be faster if max_len << len(arr)
# 53 ms
template = np.tril(np.arange(max_len)[None,:].repeat(max_len,0), k=-1)
res = template[arr,:]

# 173 ms
res1 = np.arange(max_len)[None,:].repeat(arr.size,0)
res1[res1>=arr[:,None]] = 0

assert (res == res1).all()
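Applied to the example in the question, the template needs one extra row. There, arr1 contains 9 with max_len = 9, and a 9-by-9 template only has rows 0 through 8, so indexing it with 9 would raise an IndexError. In the benchmark above, np.random.randint(0, max_len) never returns max_len, so max_len rows suffice there. A sketch with max_len + 1 rows (the helper name template_lookup is hypothetical):

```python
import numpy as np

def template_lookup(arr1, max_len):
    # max_len + 1 rows so that a length equal to max_len is also a valid row index
    base = np.arange(max_len)[None, :].repeat(max_len + 1, axis=0)
    # tril with k=-1 keeps entries with column < row, so row i is 0..i-1 then zeros
    template = np.tril(base, k=-1)
    # Fancy indexing copies one template row per element of arr1
    return template[arr1]

arr1 = np.array([3, 5, 9, 1])
arr2 = template_lookup(arr1, 9)
print(arr2)
```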
