By default, numpy is row major. Therefore, the following results are accepted naturally to me.
a = np.random.rand(5000, 5000)
%timeit a[0,:].sum()
3.57 µs ± 13.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit a[:,0].sum()
38.8 µs ± 8.19 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Because it is a row major order, it is natural to calculate faster by a [0,:]. However, if use the numpy sum function, the result is different.
%timeit a.sum(axis=0)
16.9 ms ± 13.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit a.sum(axis=1)
29.5 ms ± 90.3 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
If use the numpy sum function, it is faster to compute it along the column.
So My point is why the speed along the axis = 0 (calculated along column) is faster than the along the axis = 1(along row).
For example
a = np.array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]], order='C')
In the row major order, [1,2,3] and [4,5,6], [7,8,9] are allocated to adjacent memory, respectively.
Therefore, the speed calculated along axis = 1 should be faster than axis = 0. However, when using numpy sum function, it is faster to calculate along the column (axis = 0).
How can you explain this?
Thanks
a[0,:]only sums the first row.a.sum(axis=0)sums the entire thing along the row. You're not computing the same thing at all.numpybuilt-in function" always! @COLDSPEED is exactly correct: you are computing different things.axis=1, not0. Also, the reason one is slower than the other is because of the cache misses and lack of locality of reference.