
I have two 20x100x3 NumPy arrays that I want to combine into a single 40x100x3 array, i.e. just append more rows along the first axis. I am confused about which function I want: is it vstack, hstack, column_stack, or maybe something else?

5 Answers


I believe it's vstack you want:

import numpy

# assuming array_1 and array_2 are the OP's two 20x100x3 arrays
p = array_1
q = array_2
p = numpy.vstack([p, q])  # p is now 40x100x3

2 Comments

Not sure why your answer didn't show up when I first visited the page. +1 for suggesting vstack first.
Please note that the documentation nowadays suggests using stack or concatenate, and that vstack is kept only for backwards compatibility; see docs.scipy.org/doc/numpy-1.13.0/reference/generated/…

One of the best ways of learning is experimenting, but I would say you want np.vstack, although there are other ways of doing the same thing:

import numpy as np

a = np.ones((20, 100, 3))
b = np.vstack((a, a))

print(b.shape)  # (40, 100, 3)

or

b = np.concatenate((a, a), axis=0)

EDIT

Just as a note: on my machine, for arrays of the size in the OP's question, I find that np.concatenate is about 2x faster than np.vstack:

In [172]: a = np.random.normal(size=(20,100,3))

In [173]: c = np.random.normal(size=(20,100,3))

In [174]: %timeit b = np.concatenate((a,c),axis=0)
100000 loops, best of 3: 13.3 us per loop

In [175]: %timeit b = np.vstack((a,c))
10000 loops, best of 3: 26.1 us per loop
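
For anyone who wants to reproduce this outside IPython, here is a minimal, self-contained sketch of the same comparison (the shapes and run count are just illustrative; absolute numbers will vary by machine):

import timeit
import numpy as np

a = np.random.normal(size=(20, 100, 3))
c = np.random.normal(size=(20, 100, 3))

# time 10000 runs of each and report the totals
t_cat = timeit.timeit(lambda: np.concatenate((a, c), axis=0), number=10000)
t_vst = timeit.timeit(lambda: np.vstack((a, c)), number=10000)
print("concatenate: %.3f s, vstack: %.3f s" % (t_cat, t_vst))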

4 Comments

I may be being stupid here, as I've not used timeit much, but does concatenate not take 10x as many loops?
@Giltech, while timeit uses 10x more loops to benchmark np.concatenate (it seems to choose this automatically), the important number here is the time per loop.
You should be careful about the factor of 2. Your test case uses small arrays of 6000 items, in the microsecond range. Simply extending the input arrays to (20, 10000, 3) results in 6.62 ms per loop vs. 6.38 ms per loop, still in favor of using concatenate directly. So for big arrays the difference shouldn't really matter.
@EnnoGröper good point. One should always do their own benchmarking/profiling when performance matters. I was just suggesting a particular method given the OP's array sizes.
11

Might be worth mentioning that

    np.concatenate((a1, a2, ...), axis=0) 

is the general form, and vstack and hstack are specific cases. I find it easiest to just know which dimension I want to stack over and provide that as the argument to np.concatenate.
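
For instance, a quick sketch of those equivalences with arrays shaped like the ones in the question:

import numpy as np

a = np.ones((20, 100, 3))
b = np.ones((20, 100, 3))

# vstack is concatenate along axis 0 (for arrays of 2 or more dimensions)
print(np.concatenate((a, b), axis=0).shape)  # (40, 100, 3)
print(np.vstack((a, b)).shape)               # (40, 100, 3)

# hstack is concatenate along axis 1 for such arrays
print(np.concatenate((a, b), axis=1).shape)  # (20, 200, 3)
print(np.hstack((a, b)).shape)               # (20, 200, 3)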



I tried a little benchmark between r_ and vstack and the result is very interesting:

import numpy as np

NCOLS = 10
NROWS = 2
NMATRICES = 10000

def mergeR(matrices):
    # grow the result one matrix at a time; r_ copies everything on each pass
    result = np.zeros([0, NCOLS])
    for m in matrices:
        result = np.r_[result, m]
    return result

def mergeVstack(matrices):
    # a single call that allocates the output once
    return np.vstack(matrices)

def main():
    matrices = tuple(np.random.random([NROWS, NCOLS]) for i in range(NMATRICES))
    mergeR(matrices)
    mergeVstack(matrices)
    return 0

if __name__ == '__main__':
    main()

Then I ran the profiler:

python -m cProfile -s cumulative np_merge_benchmark.py

and the results:

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
...
     1    0.579    0.579    4.139    4.139 np_merge_benchmark.py:21(mergeR)
...
     1    0.000    0.000    0.054    0.054 np_merge_benchmark.py:27(mergeVstack)

So the single vstack call is ~77x faster than growing the array with r_ in a loop!
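
The takeaway generalizes: growing an array by concatenating inside a loop copies the whole result on every iteration, so when building an array incrementally it is usually better to collect the pieces in a plain Python list and stack once at the end. A minimal sketch of that pattern (sizes match the benchmark above):

import numpy as np

pieces = [np.random.random([2, 10]) for _ in range(10000)]  # collect in a list
result = np.vstack(pieces)  # one allocation and copy at the end
print(result.shape)  # (20000, 10)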



By the way, there is also r_:

>>> import numpy as np
>>> a = np.random.rand(20, 100, 3)
>>> b = np.random.rand(20, 100, 3)
>>> a.shape
(20, 100, 3)
>>> b.shape
(20, 100, 3)
>>> np.r_[a, b].shape
(40, 100, 3)
>>> (np.r_[a, b] == np.vstack([a, b])).all()
True

