I have two 20x100x3 NumPy arrays that I want to combine into a single 40x100x3 array, i.e. just append more rows along the first axis. Which function do I want: is it vstack, hstack, column_stack, or maybe something else?
5 Answers
I believe it's vstack you want:
p = array_1
q = array_2
p = numpy.vstack([p, q])
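For arrays of the shape in the question, a minimal sketch (`array_1` and `array_2` are placeholders standing in for the OP's data):

```python
import numpy as np

# Two hypothetical 20x100x3 arrays standing in for the OP's data
array_1 = np.zeros((20, 100, 3))
array_2 = np.ones((20, 100, 3))

# vstack joins along the first axis, giving a 40x100x3 result
combined = np.vstack([array_1, array_2])
print(combined.shape)  # (40, 100, 3)
```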
2 Comments
JoshAdel
not sure why your answer didn't show up when I first visited the page. +1 for suggesting vstack first.
NOhs
Please note that the documentation nowadays suggests using stack or concatenate, and that vstack is only kept for backwards compatibility; see: docs.scipy.org/doc/numpy-1.13.0/reference/generated/…
One of the best ways of learning is experimenting, but I would say you want np.vstack, although there are other ways of doing the same thing:
a = np.ones((20,100,3))
b = np.vstack((a,a))
print(b.shape)  # (40, 100, 3)
or
b = np.concatenate((a,a),axis=0)
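Note that np.stack, mentioned in the comment above, behaves differently: it creates a brand-new axis, while concatenate joins along an existing one. A quick sketch of the difference:

```python
import numpy as np

a = np.ones((20, 100, 3))

# concatenate joins along an existing axis: result is (40, 100, 3)
joined = np.concatenate((a, a), axis=0)

# stack inserts a new leading axis: result is (2, 20, 100, 3)
stacked = np.stack((a, a), axis=0)

print(joined.shape, stacked.shape)
```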
EDIT
Just as a note, on my machine, for arrays of the size in the OP's question, I find that np.concatenate is about 2x faster than np.vstack
In [172]: a = np.random.normal(size=(20,100,3))
In [173]: c = np.random.normal(size=(20,100,3))
In [174]: %timeit b = np.concatenate((a,c),axis=0)
100000 loops, best of 3: 13.3 us per loop
In [175]: %timeit b = np.vstack((a,c))
10000 loops, best of 3: 26.1 us per loop
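The same comparison can be reproduced outside IPython with the standard timeit module (absolute timings will of course vary by machine):

```python
import timeit
import numpy as np

a = np.random.normal(size=(20, 100, 3))
c = np.random.normal(size=(20, 100, 3))

# Time each call 10000 times; lambdas wrap the expressions being benchmarked
t_concat = timeit.timeit(lambda: np.concatenate((a, c), axis=0), number=10000)
t_vstack = timeit.timeit(lambda: np.vstack((a, c)), number=10000)

print(f"concatenate: {t_concat:.4f}s  vstack: {t_vstack:.4f}s")
```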
4 Comments
Giltech
I may be being stupid here, as I've not used timeit much, but does concatenate not take 10x as many loops?
JoshAdel
@Giltech, while timeit uses 10x more loops to benchmark np.concatenate (it seems to choose this automatically), the important number here is the time per loop.
Enno Gröper
You should be careful about the factor of 2. Your test case uses small arrays of 6000 items, with timings in the microsecond range. Simply extending the input array to (20,10000,3) results in 6.62 ms per loop vs. 6.38 ms per loop, still in favor of calling concatenate directly. So for big arrays the difference shouldn't really matter.
JoshAdel
@EnnoGröper good point. One should always do their own benchmarking/profiling when performance matters. I was just suggesting a particular method given the OP's system size.
I tried a little benchmark between r_ and vstack and the result is very interesting:
import numpy as np

NCOLS = 10
NROWS = 2
NMATRICES = 10000

def mergeR(matrices):
    # Grow the result one matrix at a time with np.r_
    result = np.zeros([0, NCOLS])
    for m in matrices:
        result = np.r_[result, m]
    return result

def mergeVstack(matrices):
    # Stack everything in a single call
    return np.vstack(matrices)

def main():
    matrices = tuple(np.random.random([NROWS, NCOLS]) for i in range(NMATRICES))
    mergeR(matrices)
    mergeVstack(matrices)

if __name__ == '__main__':
    main()
Then I ran profiler:
python -m cProfile -s cumulative np_merge_benchmark.py
and the results:
ncalls tottime percall cumtime percall filename:lineno(function)
...
1 0.579 0.579 4.139 4.139 np_merge_benchmark.py:21(mergeR)
...
1 0.000 0.000 0.054 0.054 np_merge_benchmark.py:27(mergeVstack)
So the vstack way is 77x faster!
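The gap comes from repeated reallocation: each np.r_ call in the loop copies the entire accumulated result, while vstack allocates the output once and fills it in a single pass. Both strategies produce identical output, as a small sketch confirms:

```python
import numpy as np

matrices = [np.random.random((2, 10)) for _ in range(100)]

# Incremental merge: every np.r_ call copies everything accumulated so far
result_r = np.zeros((0, 10))
for m in matrices:
    result_r = np.r_[result_r, m]

# Single-call merge: one allocation, one pass over the inputs
result_v = np.vstack(matrices)

print(np.array_equal(result_r, result_v))  # True
```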