5
import numpy as np

a = np.array([1,2,3])
b = np.array([4,5])

l = [a,b]

I want a function stack_padding such that:

assert np.array_equal(stack_padding(l), np.array([[1,2,3],[4,5,0]]))

Is there a standard way of achieving this in numpy?

EDIT: l could potentially have many more elements.

1
  • 1
    I got this far, but it depends on knowing which final shape you want. b.resize(a.shape, refcheck=False) will resize b to [4,5,0]. Commented Oct 29, 2018 at 18:27

3 Answers 3

6

I think itertools.zip_longest with fillvalue=0 can work for you:

import itertools
import numpy as np

a = np.array([1,2,3])
b = np.array([4,5])

l = [a,b]

def stack_padding(l):
    # Materialize the iterator: recent NumPy versions require a sequence here
    return np.column_stack(list(itertools.zip_longest(*l, fillvalue=0)))

>>> stack_padding(l)
array([[1, 2, 3],
       [4, 5, 0]])
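
To see why this works, it may help to look at what zip_longest produces before NumPy gets involved. A minimal sketch using plain lists: each tuple it yields is one column of the final matrix, and the shorter input is filled with fillvalue.

```python
import itertools

# Each tuple holds the i-th element of every input; missing entries become 0.
pairs = list(itertools.zip_longest([1, 2, 3], [4, 5], fillvalue=0))
print(pairs)  # [(1, 4), (2, 5), (3, 0)]
```

Stacking those tuples as columns is what puts the original arrays back into rows.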

5 Comments

This is cool! Though it is a bit awkward that we need to transpose the result
@user3483203, I thought of that too, but then you add a dimension, so you also need to call squeeze or something like that : np.dstack((itertools.zip_longest(*l, fillvalue=0))).squeeze()
column_stack is dstack without adding the dimension
Ahh! That makes sense! Thanks @user3483203!
Transposing an array is trivial. But np.stack(list(zip_longest(...)), axis=1) gives more control over the orientation.
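
The np.stack suggestion from the last comment could look roughly like the sketch below. The variable names (rows, columns, by_row, by_col) are made up for illustration; only the axis argument changes between the two orientations.

```python
import itertools
import numpy as np

rows = [np.array([1, 2, 3]), np.array([4, 5])]

# zip_longest yields one tuple per column position, padded with 0.
columns = list(itertools.zip_longest(*rows, fillvalue=0))

by_row = np.stack(columns, axis=1)  # shape (2, 3): one row per input array
by_col = np.stack(columns, axis=0)  # shape (3, 2): the transposed layout

print(by_row)
print(by_col.shape)
```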
4

With numpy.pad:

import numpy as np

a = np.array([1,2,3])
b = np.array([4,5])

l = [a,b]

# Pad each array on the right with zeros up to the longest length
max_len = max(len(arr) for arr in l)
padded = np.array([np.pad(arr, (0, max_len - len(arr)), 'constant', constant_values=0) for arr in l])
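
For the stack_padding function the question asks for, the same np.pad idea can be wrapped up as in the sketch below. The fill parameter is an addition for illustration and is not part of the original answer.

```python
import numpy as np

def stack_padding(arrays, fill=0):
    """Right-pad 1-D arrays to a common length and stack them as rows."""
    max_len = max(len(arr) for arr in arrays)
    return np.array([
        # (0, n) means: no padding on the left, n values on the right
        np.pad(arr, (0, max_len - len(arr)), mode='constant', constant_values=fill)
        for arr in arrays
    ])

result = stack_padding([np.array([1, 2, 3]), np.array([4, 5])])
print(result)
```

This works for any number of input arrays, since the target length is computed from the list itself.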


2

If you don't want to use itertools and column_stack, numpy.ndarray.resize will also do the job. As mentioned by jtweeder, you just need to know the resulting size of each row. The advantage of resize is that a numpy.ndarray is contiguous in memory, so resizing is faster when the rows differ a lot in size. The performance difference between the two approaches is measurable:

import numpy as np
import timeit
import itertools

def stack_padding(it):

    def resize(row, size):
        # Copy the row, then grow it in place; new entries are zero-filled
        new = np.array(row)
        new.resize(size, refcheck=False)
        return new

    # find longest row length
    row_length = max(len(row) for row in it)
    mat = np.array( [resize(row, row_length) for row in it] )

    return mat

def stack_padding1(l):
    return np.column_stack(list(itertools.zip_longest(*l, fillvalue=0)))


if __name__ == "__main__":
    n_rows = 200
    row_lengths = np.random.randint(30, 50, size=n_rows)
    mat = [np.random.randint(0, 100, size=s) for s in row_lengths]

    def test_stack_padding():
        global mat
        stack_padding(mat)

    def test_itertools():
        global mat
        stack_padding1(mat)

    t1 = timeit.timeit(test_stack_padding, number=1000)
    t2 = timeit.timeit(test_itertools, number=1000)
    print('With ndarray.resize: ', t1)
    print('With itertool and vstack: ', t2)

The resize method wins in the above comparison:

>>> With ndarray.resize:  0.30080295499647036
>>> With itertool and vstack:  1.0151802329928614

4 Comments

there is a bug in the code: max(it, key=len).__len__() always returns len(it)
@Jsevillamol It is supposed to return the length of the longest row in it. I ran several tests on my machine and it seemed to work correctly. What is it in your situation? A Python list or a numpy.ndarray? P.S. It should work with both types.
Oops, my bad! I made a typo when writing your code. Thanks for the huge help and thorough analysis!
In the resize function, I had to call resize(size, refcheck=False) for the in-place resize operation to work.
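
Regarding refcheck: the in-place method refuses to resize when it believes something else still references the array, which can happen in interactive sessions or debuggers. A small sketch, with padded_copy as a made-up helper name, that also contrasts the method with the np.resize function, since the two pad differently:

```python
import numpy as np

def padded_copy(row, size):
    new = np.array(row)               # fresh copy that owns its data
    new.resize(size, refcheck=False)  # grows in place, zero-filling the tail
    return new

short = np.array([4, 5])
print(padded_copy(short, 3))  # [4 5 0]
print(np.resize(short, 3))    # [4 5 4]  (the function repeats the data instead)
```

Skipping the reference check is reasonable here only because new is a private copy that nothing else can be viewing.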
