I have two 20x100x3 NumPy arrays that I want to combine into a single 40x100x3 array, i.e. just append more rows along the first axis. Which function do I want: is it vstack, hstack, column_stack, or maybe something else?
5 Answers
I believe it's vstack you want:
p = array_1
q = array_2
p = numpy.vstack([p, q])
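For arrays of the shape in the question, a minimal sketch (`array_1` and `array_2` are placeholders standing in for the OP's data):

```python
import numpy as np

# Two hypothetical 20x100x3 arrays standing in for the OP's data
array_1 = np.zeros((20, 100, 3))
array_2 = np.ones((20, 100, 3))

# vstack joins along the first axis, giving a 40x100x3 result
combined = np.vstack([array_1, array_2])
print(combined.shape)  # (40, 100, 3)
```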
2 Comments
JoshAdel
not sure why your answer didn't show up when I first visited the page. +1 for suggesting vstack first.
NOhs
Please note that the documentation nowadays suggests using stack or concatenate, and that vstack is only kept for backwards compatibility; see: docs.scipy.org/doc/numpy-1.13.0/reference/generated/…
One of the best ways of learning is experimenting, but I would say you want np.vstack, although there are other ways of doing the same thing:
a = np.ones((20,100,3))
b = np.vstack((a,a))
print(b.shape)  # (40, 100, 3)
or
b = np.concatenate((a,a),axis=0)
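Note that np.stack, mentioned in the comment above, behaves differently: it creates a brand-new axis, while concatenate joins along an existing one. A quick sketch of the difference:

```python
import numpy as np

a = np.ones((20, 100, 3))

# concatenate joins along an existing axis: result is (40, 100, 3)
joined = np.concatenate((a, a), axis=0)

# stack inserts a new leading axis: result is (2, 20, 100, 3)
stacked = np.stack((a, a), axis=0)

print(joined.shape, stacked.shape)
```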
EDIT
Just as a note, on my machine, for arrays of the size in the OP's question, I find that np.concatenate is about 2x faster than np.vstack
In [172]: a = np.random.normal(size=(20,100,3))
In [173]: c = np.random.normal(size=(20,100,3))
In [174]: %timeit b = np.concatenate((a,c),axis=0)
100000 loops, best of 3: 13.3 us per loop
In [175]: %timeit b = np.vstack((a,c))
10000 loops, best of 3: 26.1 us per loop
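The same comparison can be reproduced outside IPython with the standard timeit module (absolute timings will of course vary by machine):

```python
import timeit
import numpy as np

a = np.random.normal(size=(20, 100, 3))
c = np.random.normal(size=(20, 100, 3))

# Time each call 10000 times; lambdas wrap the expressions being benchmarked
t_concat = timeit.timeit(lambda: np.concatenate((a, c), axis=0), number=10000)
t_vstack = timeit.timeit(lambda: np.vstack((a, c)), number=10000)

print(f"concatenate: {t_concat:.4f}s  vstack: {t_vstack:.4f}s")
```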
4 Comments
Giltech
I may be being stupid here, as I've not used timeit much, but does concatenate not take 10x as many loops?
JoshAdel
@Giltech, while timeit uses 10x more loops to benchmark np.concatenate (it seems to choose this automatically), the important number here is the time per loop.
Enno Gröper
You should be careful about the factor of 2. Your test case uses small arrays of 6000 items, with timings in the microsecond range. Simply extending the input array to (20,10000,3) results in 6.62 ms per loop vs. 6.38 ms per loop, still in favor of calling concatenate directly. So for big arrays the difference shouldn't really matter.
JoshAdel
@EnnoGröper good point. One should always do their own benchmarking/profiling when performance matters. I was just suggesting a particular method given the OP's system size.
I tried a little benchmark between r_ and vstack and the result is very interesting:
import numpy as np

NCOLS = 10
NROWS = 2
NMATRICES = 10000

def mergeR(matrices):
    # Grow the result one matrix at a time with np.r_
    result = np.zeros([0, NCOLS])
    for m in matrices:
        result = np.r_[result, m]
    return result

def mergeVstack(matrices):
    # Stack everything in a single call
    return np.vstack(matrices)

def main():
    matrices = tuple(np.random.random([NROWS, NCOLS]) for i in range(NMATRICES))
    mergeR(matrices)
    mergeVstack(matrices)

if __name__ == '__main__':
    main()
Then I ran profiler:
python -m cProfile -s cumulative np_merge_benchmark.py
and the results:
ncalls tottime percall cumtime percall filename:lineno(function)
...
1 0.579 0.579 4.139 4.139 np_merge_benchmark.py:21(mergeR)
...
1 0.000 0.000 0.054 0.054 np_merge_benchmark.py:27(mergeVstack)
So the vstack way is 77x faster!
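The gap comes from repeated reallocation: each np.r_ call in the loop copies the entire accumulated result, while vstack allocates the output once and fills it in a single pass. Both strategies produce identical output, as a small sketch confirms:

```python
import numpy as np

matrices = [np.random.random((2, 10)) for _ in range(100)]

# Incremental merge: every np.r_ call copies everything accumulated so far
result_r = np.zeros((0, 10))
for m in matrices:
    result_r = np.r_[result_r, m]

# Single-call merge: one allocation, one pass over the inputs
result_v = np.vstack(matrices)

print(np.array_equal(result_r, result_v))  # True
```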