
I have a question about sum in MATLAB.

For a vector (a 1xN matrix), sum seems to be parallelised. For example,

a=rand(1,100000000);

maxNumCompThreads(2);
tic;for ii=1:20;b=sum(a,2);end;toc

maxNumCompThreads(1);
tic;for ii=1:20;b=sum(a,2);end;toc

> Elapsed time is 1.219342 seconds.
> Elapsed time is 2.393047 seconds.

But if instead I consider a 2xN matrix,

a=rand(2,100000000);

maxNumCompThreads(2); 
tic;for ii=1:20;b=sum(a,2);end;toc

maxNumCompThreads(1); 
tic;for ii=1:20;b=sum(a,2);end;toc

> Elapsed time is 7.614303 seconds.
> Elapsed time is 7.432590 seconds.

In this case, sum doesn't seem to benefit from the extra core.

Has anyone come across this before? I'm wondering whether this could be due to indexing overhead, and whether it is possible to make sum faster for 2xN matrices.

Thanks a lot.

  • I don't see this behavior (MATLAB 2012a). For the 1D array, 2 threads give about a 10% improvement over a single thread. For the 2D array, the first loop (2 threads) is also slightly faster (2%) than the second one (single thread). It is not as efficient as for the 1D array, but it is not what you report. Commented Jul 13, 2013 at 5:17
  • @natan Thanks for your comment. I have checked this on a couple of machines (2 and 4 cores) with MATLAB 2011b and 2013a, and I always observe this behaviour. Commented Jul 13, 2013 at 22:00

1 Answer


This is something MATLAB is not very clear about. Many constructs (for example 1:N) produce a row vector, but behind the scenes MATLAB stores arrays in column-major order, so the elements of a column are contiguous in memory. As a result, summing along the 1st dimension (down each column) tends to be faster than summing along the 2nd dimension (across each row). For your case, if you store a as an N-by-2 matrix and sum along the 1st dimension, the benefit of the extra thread shows up. On my machine, I get the following:

a = rand(100000000, 2);
maxNumCompThreads(2); 
tic; for ii=1:20; b=sum(a,1); end; toc
maxNumCompThreads(1);
tic; for ii=1:20; b=sum(a,1); end; toc

> Elapsed time is 2.485628 seconds.
> Elapsed time is 4.381082 seconds.
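
To make the layout argument concrete, here is a minimal sketch (not a benchmark) that shows the column-major ordering and checks that the two layouts give the same sums. The variable names wide and tall and the smaller N are my own choices for illustration, not something from the question.

```matlab
% MATLAB stores arrays in column-major order: the elements of a column
% are contiguous in memory, so a(:) walks down each column in turn.
m = [1 2 3; 4 5 6];        % 2x3 matrix
order = m(:).';            % linear (memory) order: 1 4 2 5 3 6
disp(order);

% Assumption: N is much smaller than the question's 1e8, to keep memory modest.
N = 1e6;
wide = rand(2, N);         % layout from the question (2xN)
tall = wide.';             % N-by-2 copy; the transpose itself costs time and memory

s_wide = sum(wide, 2);     % 2x1, each sum strides across columns
s_tall = sum(tall, 1);     % 1x2, each sum reads one contiguous column

% The two results should agree up to floating-point rounding.
max_diff = max(abs(s_wide.' - s_tall));
disp(max_diff);
```

Note that the sketch only confirms the results match; whether sum(tall,1) is actually faster on a given machine and MATLAB release still has to be measured, as in the timings above.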

1 Comment

Thanks a lot. I observed the same behaviour. But the thing is, if a 2xN matrix is given, e.g., a=rand(2,N), then sum(a,2) is much faster than sum(a.'). Even though the latter clearly benefits from parallel computation, the transpose becomes the bottleneck...
