Stacking Numpy arrays of different length using padding

Question

a = np.array([1,2,3])
b = np.array([4,5])

l = [a,b]

I want a function stack_padding such that:

assert (stack_padding(l) == np.array([[1,2,3],[4,5,0]])).all()

Is there a standard way in numpy of achieving this?

EDIT: l could potentially have many more elements.

I got this far, but it depends on knowing the final shape you want: b.resize(a.shape, refcheck=False) will resize b to [4,5,0].
– jtweeder, Commented Oct 29, 2018 at 18:27
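A minimal sketch of what that comment describes, assuming the target length is already known. The in-place method ndarray.resize zero-fills the new slots, whereas the separate np.resize function repeats the input data instead, so the two are not interchangeable for padding.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5])

# In-place resize: b grows to a's shape and the new slot is filled with 0.
# refcheck=False skips the reference-count check that can otherwise raise
# a ValueError, for example in interactive sessions.
b.resize(a.shape, refcheck=False)
print(b)  # [4 5 0]

# The np.resize function does something different: it repeats the data.
c = np.resize(np.array([4, 5]), 3)
print(c)  # [4 5 4]
```

Note that this changes b itself; copy the array first if the original must be kept.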

sacuL · Accepted Answer · 2018-10-29 18:47:44Z

6

I think itertools.zip_longest with fillvalue=0 can work for you:

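import itertools
import numpy as np

a = np.array([1,2,3])
b = np.array([4,5])

l = [a,b]

def stack_padding(l):
    # zip_longest yields one tuple per column, padding short rows with 0;
    # list() hands column_stack a sequence, which newer NumPy versions expect
    return np.column_stack(list(itertools.zip_longest(*l, fillvalue=0)))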

>>> stack_padding(l)
array([[1, 2, 3],
       [4, 5, 0]])

edited Oct 29, 2018 at 18:47

answered Oct 29, 2018 at 18:37

sacuL


5 Comments

Jsevillamol Over a year ago

This is cool! Though it is a bit awkward that we need to transpose the result

sacuL Over a year ago

@user3483203, I thought of that too, but then you add a dimension, so you also need to call squeeze or something like that: np.dstack((itertools.zip_longest(*l, fillvalue=0))).squeeze()

user3483203 Over a year ago

column_stack is dstack without adding the dimension

sacuL Over a year ago

Ahh! That makes sense! Thanks @user3483203!

hpaulj Over a year ago

Transposing an array is trivial. But np.stack(list(zip_longest(...)), axis=1) gives more control over the orientation.
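The np.stack suggestion in the last comment can be sketched as follows. The variable names rows, columns, by_row and by_col are illustrative only; the point is that the axis argument picks the orientation, so no transpose is needed.

```python
import itertools
import numpy as np

rows = [np.array([1, 2, 3]), np.array([4, 5])]

# Materialise the iterator so np.stack receives a real sequence.
# Each tuple holds one position from every input, padded with 0.
columns = list(itertools.zip_longest(*rows, fillvalue=0))

# axis=1: each input array becomes a row, shape (2, 3)
by_row = np.stack(columns, axis=1)

# axis=0: each input array becomes a column, shape (3, 2)
by_col = np.stack(columns, axis=0)

print(by_row)
print(by_col)
```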

dobkind · Accepted Answer · 2018-10-29 19:02:33Z

4

With numpy.pad:

import numpy as np

a = np.array([1,2,3])
b = np.array([4,5])

l = [a,b]

# pad every array on the right with zeros up to the longest length
max_len = max(len(arr) for arr in l)
padded = np.array([np.pad(arr, (0, max_len - len(arr)), 'constant', constant_values=0) for arr in l])
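For a list with many arrays, the same idea can be wrapped in the stack_padding function the question asks for. This is a sketch rather than a definitive implementation: the fill_value parameter is an addition for illustration and is not part of the original answer, and the inputs are assumed to be 1-D.

```python
import numpy as np

def stack_padding(arrays, fill_value=0):
    """Right-pad 1-D arrays to a common length and stack them as rows."""
    max_len = max(len(arr) for arr in arrays)
    return np.array([
        # (0, n) means: pad nothing on the left, n entries on the right
        np.pad(arr, (0, max_len - len(arr)), mode="constant",
               constant_values=fill_value)
        for arr in arrays
    ])

l = [np.array([1, 2, 3]), np.array([4, 5]), np.array([6])]
padded = stack_padding(l)
padded_neg = stack_padding(l, fill_value=-1)
print(padded)
print(padded_neg)
```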

answered Oct 29, 2018 at 19:02

dobkind


Niko Z. · Accepted Answer · 2018-10-29 19:36:09Z

2

If you don't want to use itertools and column_stack, numpy.ndarray.resize will also do the job. As jtweeder mentioned, you just need to know the final size of each row. The advantage of resize is that a numpy.ndarray is contiguous in memory, so resizing is faster when the rows differ a lot in size. The performance difference between the two approaches is measurable:

import numpy as np
import timeit
import itertools

def stack_padding(it):

    def resize(row, size):
        # copy the row, then grow the copy in place; new slots are zero-filled
        new = np.array(row)
        # refcheck=False skips the reference-count check, which can raise ValueError
        new.resize(size, refcheck=False)
        return new

    # find longest row length
    row_length = max(len(row) for row in it)
    mat = np.array([resize(row, row_length) for row in it])

    return mat

def stack_padding1(l):
    # list() hands column_stack a sequence rather than an iterator
    return np.column_stack(list(itertools.zip_longest(*l, fillvalue=0)))


if __name__ == "__main__":
    n_rows = 200
    row_lengths = np.random.randint(30, 50, size=n_rows)
    mat = [np.random.randint(0, 100, size=s) for s in row_lengths]

    def test_stack_padding():
        stack_padding(mat)

    def test_itertools():
        stack_padding1(mat)

    t1 = timeit.timeit(test_stack_padding, number=1000)
    t2 = timeit.timeit(test_itertools, number=1000)
    print('With ndarray.resize: ', t1)
    print('With itertools and column_stack: ', t2)

The resize method wins in the above comparison:

With ndarray.resize:  0.30080295499647036
With itertools and column_stack:  1.0151802329928614
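A related variant, not taken from this thread and not timed here, is to allocate the padded matrix once and copy each row into it. The sketch below assumes 1-D inputs; stack_padding_prealloc and fill_value are hypothetical names chosen for illustration.

```python
import functools
import numpy as np

def stack_padding_prealloc(arrays, fill_value=0):
    """Allocate the padded result once, then copy each row into place."""
    lengths = [len(arr) for arr in arrays]
    # Pick a dtype able to hold every input row.
    dtype = functools.reduce(np.promote_types, (arr.dtype for arr in arrays))
    out = np.full((len(arrays), max(lengths)), fill_value, dtype=dtype)
    for i, (arr, n) in enumerate(zip(arrays, lengths)):
        out[i, :n] = arr  # the tail of the row keeps fill_value
    return out

result = stack_padding_prealloc([np.array([1, 2, 3]), np.array([4, 5])])
print(result)
```

This avoids building one temporary array per row, at the cost of an explicit Python loop.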

answered Oct 29, 2018 at 19:36

Niko Z.


4 Comments

Jsevillamol Over a year ago

there is a bug in the code: max(it, key=len).__len__() always returns len(it)

Niko Z. Over a year ago

@Jsevillamol It is supposed to return the length of the longest row in it. I ran several tests on my machine and it seemed to work correctly. What is it in your situation: a Python list or a numpy.ndarray? P.S. It should work with both types.

Jsevillamol Over a year ago

Oops, my bad! I made a typo when copying your code. Thanks for the huge help and thorough analysis!

Hansang Over a year ago

In the resize function, I had to call resize(size, refcheck=False) for the in-place resize operation to work.
