
By default, numpy arrays are stored in row-major (C) order. Therefore, the following results seem natural to me.

a = np.random.rand(5000, 5000)

%timeit a[0,:].sum()
3.57 µs ± 13.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit a[:,0].sum()
38.8 µs ± 8.19 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
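The row-major layout behind these timings can be confirmed directly from the array's flags and strides (a small sketch; the array `a` is the same shape and dtype as above):

```python
import numpy as np

a = np.random.rand(5000, 5000)

# In C (row-major) order the elements of each row sit next to each
# other in memory: moving along a row steps 8 bytes (one float64),
# while moving down a column steps a whole row (5000 * 8 bytes).
print(a.flags['C_CONTIGUOUS'])  # True
print(a.strides)                # (40000, 8)
```

That is why `a[0,:]` reads one contiguous 40 kB block, while `a[:,0]` touches one element every 40 kB.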

Since the order is row-major, it is natural that a[0,:] sums faster. However, if I use the numpy sum function along an axis, the result is different.

%timeit a.sum(axis=0)
16.9 ms ± 13.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit a.sum(axis=1)
29.5 ms ± 90.3 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

With the numpy sum function, it is faster to compute along the columns (axis=0).

So my question is: why is the sum along axis=0 (down each column) faster than the sum along axis=1 (across each row)?

For example

a = np.array([[0, 1, 2],
              [3, 4, 5],
              [6, 7, 8]], order='C')

In row-major order, the rows [0, 1, 2], [3, 4, 5], and [6, 7, 8] each occupy adjacent memory.

Therefore, summing along axis=1 (within each contiguous row) should be faster than along axis=0. However, when using the numpy sum function, it is faster to sum along the columns (axis=0).
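For reference, the layout of this small array can be inspected directly (the dtype is pinned to int64 here so the byte counts are predictable):

```python
import numpy as np

a = np.array([[0, 1, 2],
              [3, 4, 5],
              [6, 7, 8]], order='C', dtype=np.int64)

# The underlying buffer holds the rows back to back:
print(a.ravel(order='K'))  # [0 1 2 3 4 5 6 7 8]

# Stepping to the next row skips 3 * 8 = 24 bytes, while the next
# element within a row is only 8 bytes away.
print(a.strides)           # (24, 8)
```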

How can you explain this?

Thanks

  • a[0,:] only sums the first row. a.sum(axis=0) sums the entire thing along the row. You're not computing the same thing at all. Commented Jan 29, 2018 at 1:52
  • "However, if use the numpy built-in function, the result is different." Actually, you are always using numpy built-in functions! @COLDSPEED is exactly correct: you are computing different things. Commented Jan 29, 2018 at 9:08
  • a.sum(axis=0) is computed along the columns. The result of print(a[0,:].sum(), a[1,:].sum(), a[2,:].sum()), computed along the rows, is [3 12 21], whereas a.sum(axis=0) gives [9 12 15]. Commented Jan 29, 2018 at 11:34
  • @신대규 Yeah, if you want the same output use axis=1, not 0. Also, the reason one is slower than the other is cache misses and the lack of locality of reference. Commented Jan 29, 2018 at 12:34
  • @cᴏʟᴅsᴘᴇᴇᴅ Thank you for your reply. It is very confusing to me. I also tested on a Linux system with a = np.random.rand(5000, 5000): %timeit a.sum(axis=0) gives 17 ms per loop and %timeit a.sum(axis=1) gives 15.5 ms per loop. That is not a big difference, and it makes sense. But on Windows, %timeit a.sum(axis=0) gives 17 ms ± 122 µs per loop while %timeit a.sum(axis=1) gives 29.7 ms ± 153 µs per loop. It seems to be affected by the operating system or the numpy version. Commented Jan 29, 2018 at 13:00
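A hedged sketch of why axis=0 can be the fast direction on a C-ordered array: the reduction can proceed by adding whole contiguous rows into an accumulator, so every inner operation streams sequentially through memory. (This mirrors the locality argument in the comments above; it is not necessarily how numpy's internal loop is actually written.)

```python
import numpy as np

a = np.random.rand(5000, 5000)

# Sum along axis=0 by adding one contiguous row at a time into an
# accumulator: each `acc += row` reads memory sequentially.
acc = np.zeros(a.shape[1])
for row in a:
    acc += row

print(np.allclose(acc, a.sum(axis=0)))  # True
```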

1 Answer


You don't compute the same thing.

The first two commands only compute one row/column out of the entire array.

a[0, :].sum().shape   # sums only the first row; the result is a scalar
()

The second two commands sum the entire contents of the 2D array, but along a certain axis. That way, you don't get a single result (as in the first two commands), but a 1D array of sums.

a.sum(axis=0).shape   # sums over the rows, one total per column
(5000,)

In summary, the two sets of commands do different things.


a
array([[1, 6, 9, 1, 6],
       [5, 6, 9, 1, 3],
       [5, 0, 3, 5, 7],
       [2, 8, 3, 8, 6],
       [3, 4, 8, 5, 0]])

a[0, :]
array([1, 6, 9, 1, 6])

a[0, :].sum()
23

a.sum(axis=0)
array([16, 24, 32, 20, 22])

1 Comment

Also of note: if you do c = np.random.randn(5000, 5000) and f = np.asfortranarray(c), you see non-negligible, inverted timing differences for sums along axis 0 and axis 1.
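A sketch of that experiment (array size reduced; exact timings vary by machine, so only the layout and correctness are checked here):

```python
import numpy as np

c = np.random.randn(2000, 2000)   # C order: rows are contiguous
f = np.asfortranarray(c)          # same values, columns are contiguous

# The values and the axis sums are identical; only the memory layout
# differs, which is what swaps the fast and slow axes under %timeit.
print(f.flags['F_CONTIGUOUS'])                     # True
print(np.allclose(c.sum(axis=0), f.sum(axis=0)))  # True
```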
