I have a matrix of percentage values in which every row represents an individual observation. I need to compute the cumulative product of these values over all rows that share the same subscript. I tried the accumarray function, which works as expected as long as I pass a column vector of values (rather than a matrix). What is the best way to solve this without looping over the individual columns of my value matrix?

Here's my sample code:

subs = [1;1;1;2;2;2;2;2;3;3;4;4;4];
vals1 = [0.1;0.05;0.2;0.02;0.09;0.3;0.01;0.21;0.12;0.06;0.08;0.12;0.05];

% This works as expected: one compounded value per group in subs
result1 = accumarray(subs, vals1, [], @(x) prod(1+x) - 1)


vals2 = [vals1,vals1];

% This does not work, because the second input argument of accumarray
% apparently must be a vector (rather than a matrix)
result2 = accumarray(subs, vals2, [], @(x) prod(1+x) - 1)

2 Answers

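Instead of passing the values themselves, pass the row indices 1:size(vals2,1) as the values argument and use them inside the function to extract the corresponding rows of vals2. The function must then return a cell, because each group produces a row vector rather than a scalar.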

result2 = accumarray(subs, 1:size(vals2,1), [], @(x) {prod(1+vals2(x,:),1)-1})

You can concatenate cell elements:

result3 = vertcat(result2{:})

Or all in one line:

result3 = cell2mat( accumarray(subs, 1:size(vals2,1), [], @(x) {prod(1+vals2(x,:),1)-1}))

result3 =

   0.38600   0.38600
   0.76635   0.76635
   0.18720   0.18720
   0.27008   0.27008
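
As a sanity check, the cell-based call can be compared against the column-by-column loop on the sample data from the question. This is only a minimal sketch: the names rowIdx, viaCell, viaLoop and maxDiff are made up for illustration, and the row indices are passed as a column vector purely as a precaution.

```matlab
subs  = [1;1;1;2;2;2;2;2;3;3;4;4;4];
vals1 = [0.1;0.05;0.2;0.02;0.09;0.3;0.01;0.21;0.12;0.06;0.08;0.12;0.05];
vals2 = [vals1, vals1];

% Accumulate row indices instead of values; each group then indexes
% whole rows of vals2 and returns one row vector wrapped in a cell.
rowIdx  = (1:size(vals2,1)).';   % hypothetical helper name
viaCell = cell2mat(accumarray(subs, rowIdx, [], ...
                   @(x) {prod(1 + vals2(x,:), 1) - 1}));

% Reference result: one accumarray call per column.
viaLoop = zeros(max(subs), size(vals2,2));
for k = 1:size(vals2,2)
    viaLoop(:,k) = accumarray(subs, vals2(:,k), [], @(x) prod(1 + x) - 1);
end

% Largest absolute difference between the two results.
maxDiff = max(abs(viaCell(:) - viaLoop(:)));
```

The two results should agree up to floating-point rounding, since the order in which the factors of each group are multiplied may differ between the two approaches.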

Result of a test in Octave comparing three proposed methods using a [10000 x 200] matrix as input:

subs = randi(1000,10000,1);
vals2 = rand(10000,200);

=========CELL2MAT========
Elapsed time is 0.130961 seconds.
=========NDGRID========
Elapsed time is 3.96383 seconds.
=========FOR LOOP========
Elapsed time is 6.16265 seconds.

6 Comments

That doesn't look like an easy solution. It works, but IMHO it makes the accumarray call less readable. In that case, I think I prefer a simple for-loop solution:
for i = 1 : size(vals2, 2)
    result2(:,i) = accumarray(subs, vals2(:,i), [], @(x) prod(1+x) -1);
end
A for-loop solution may be inefficient when the number of columns is high.
Hmm, in my case, the number of columns can be as high as 10,000.
@rahnema1 I think the for loop solution posted by OP is not efficient. I implemented an improved for loop without accumarray which is performing as fast as the cell2mat. I modified your script and added the improved for loop solution. Run it a few times and you will see the perf is similar to cell2mat. rextester.com/RLYKA32361 . Honestly, I am a bit surprised that the simple for loop is matching accumarray. Let me know if I am missing something.
@Turbo Here I edited the code and used a random data set for subs. There are 1500 unique categories, each repeated at most 500 times. I commented out ndgrid because it requires a lot of memory. The improved loop is better than the non-improved loop, but it cannot outperform cell2mat. It would be better to use repelem to create the data set, but that version of Octave doesn't contain repelem.

You need to add a second set of subscripts to subs (so that it is N-by-2) to handle your 2D data, which still has to be passed as an N-element vector (i.e. one element for each row in subs). You can generate the new set of 2D subscripts using ndgrid:

[subs1, subs2] = ndgrid(subs, 1:size(vals2, 2));
result2 = accumarray([subs1(:) subs2(:)], vals2(:), [], @(x) prod(1+x) -1)

And the result with your sample data:

result2 =

    0.3860    0.3860
    0.7664    0.7664
    0.1872    0.1872
    0.2701    0.2701
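
To see why this works, it may help to look at what ndgrid returns for a tiny input. The following is a sketch with made-up data (three rows, two columns); the names pairs and result are only illustrative.

```matlab
subs  = [1;1;2];
vals2 = [0.10 0.20;
         0.05 0.30;
         0.20 0.40];

[subs1, subs2] = ndgrid(subs, 1:size(vals2, 2));
% subs1 repeats the group labels across columns: [1 1; 1 1; 2 2]
% subs2 holds the column number of each element:  [1 2; 1 2; 1 2]

% Flattening with (:) walks all three matrices in the same column-major
% order, so each row of pairs is the (group, column) target of one value.
pairs  = [subs1(:) subs2(:)];
result = accumarray(pairs, vals2(:), [], @(x) prod(1 + x) - 1);
```

Here result(1,1) combines the first two entries of column 1 (1.10 * 1.05 - 1 = 0.155), while result(2,2) contains only the last entry of column 2. Note that subs1 and subs2 each have as many elements as vals2, which is consistent with the memory concern raised in the comments on the other answer.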

2 Comments

How would you define the output arguments of ndgrid if the number of columns in vals2 is variable?
@Andi: That's exactly what I do in the above code. Note that the second argument to ndgrid is a vector from 1 to size(vals2, 2) (i.e. the number of columns in vals2).
