Calculating a new MATLAB array column based on looking up values from two arrays

Question

I have a MATLAB double array that looks like this:

YEAR    QUARTER ID  VAR
2000    1       1   50
2000    1       2   20
2000    1       3   67
2000    2       1   43

It goes on for many years and many quarters, and the number of rows in each quarter and year varies unpredictably. The VAR values are estimates made by individual people.

I also have another double array that looks like this:

YEAR    QUARTER OUTCOME
2000    1       100
2000    2       0

It goes on for many years and many quarters. There is only one outcome in each quarter. I want to subtract each person's estimate from the outcome for the same year and quarter and add the result to the first array as a new column.

The result should look like this:

YEAR    QUARTER ID  VAR   RESULT
2000    1       1   50    50
2000    1       2   20    80
2000    1       3   67    33
2000    2       1   43    43

What's the best way to achieve this?

kmac · Accepted Answer · 2015-08-25 05:01:06Z


Here are three options; which one fits best depends on how you trade off speed, readability, and assumptions about the data.

%% Load data
estimate = [...
  2000    1       1   50; ...
  2000    1       2   20; ...
  2000    1       3   67; ...
  2000    2       1   43; ...
  2000    4       1   50];
outcome = [...
  2000    1       100; ...
  2000    2       0; ...
  2000    4       0; ...
  2001    1       10];
n_estimate = size(estimate,1);
n_outcome = size(outcome,1);

%% Loop version (easier to read, more flexible)

result = zeros(n_estimate,1);
for i = 1:n_estimate
  % Find matching year & quarter for this estimate
  j = all(bsxfun(@eq, outcome(:,1:2), estimate(i,1:2)),2);
  % Subtract estimate from outcome (seems like you want the absolute value)
  result(i) = abs(outcome(j,3) - estimate(i,4));
end

% Append the result to the estimate matrix, and display
estimated_result = [estimate result];
display(estimated_result);

%% Vectorized version (more efficient, forced assumptions)
% Note: this assumes that you have outcomes for every quarter
% (i.e. there are none missing), so we can just calculate an offset from
% the start year/quarter
% The sample outcomes skip 2000 Q3, which violates this assumption
% and makes the last estimate (2000 Q4) incorrect in this version

% Build an integer index from the combined year/quarter, offset from
% the first year/quarter that is available in the outcome list
begin = outcome(1,1)*4 + outcome(1,2);
j = estimate(:,1)*4 + estimate(:,2) - begin + 1;

% Subtract estimate from outcome (seems like you want the absolute value)
result = abs(outcome(j,3) - estimate(:,4));

% Append the result to the estimate matrix, and display
estimated_result = [estimate result];
display(estimated_result);

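%% Vectorized version 2 (more efficient, hardest to read)
% Note: this does not assume that you have an outcome for every quarter,
% but every estimate's quarter must still have one (otherwise the lookup
% below fails with an indexing error)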

% Build an inverted index to map year*4+quarter-begin to an outcome index.
begin = outcome(1,1)*4 + outcome(1,2);
i = outcome(:,1)*4+outcome(:,2)-begin+1; % outcome indices
j_inv = zeros(1, max(i)); % preallocate; quarters without an outcome stay 0
j_inv(i) = 1:n_outcome;

% Build the forward index from estimate into outcome
j = j_inv(estimate(:,1)*4 + estimate(:,2) - begin + 1);

% Subtract estimate from outcome (seems like you want the absolute value)
result = abs(outcome(j,3) - estimate(:,4));

% Append the result to the estimate matrix, and display
estimated_result = [estimate result];
display(estimated_result);
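To see why the first vectorized version goes wrong on the last estimate, here is the offset arithmetic it performs for the 2000 Q4 row, written as a standalone sketch that reuses the sample `outcome` data. Because 2000 Q3 has no outcome, the computed index lands one row too far. The variable names `est`, `wrong_row` and `wrong_result` are illustrative only.

```matlab
% Sample outcome data from the answer (note: 2000 Q3 is missing)
outcome = [2000 1 100; 2000 2 0; 2000 4 0; 2001 1 10];
est = [2000 4 1 50];   % the last estimate row (2000 Q4)

% Offset index used by the first vectorized version
begin = outcome(1,1)*4 + outcome(1,2);       % 2000*4 + 1 = 8001
j = est(1)*4 + est(2) - begin + 1;           % 8004 - 8001 + 1 = 4

% Row 4 of outcome is 2001 Q1, not 2000 Q4, because 2000 Q3 is missing
wrong_row = outcome(j,:);                    % [2001 1 10]
wrong_result = abs(wrong_row(3) - est(4));   % |10 - 50| = 40
```

This is the 40 that appears in the last row of the second output block, where the loop and inverted-index versions give 50.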

Output of the three versions, in order:

estimated_result =

    2000           1           1          50          50
    2000           1           2          20          80
    2000           1           3          67          33
    2000           2           1          43          43
    2000           4           1          50          50

estimated_result =

    2000           1           1          50          50
    2000           1           2          20          80
    2000           1           3          67          33
    2000           2           1          43          43
    2000           4           1          50          40

estimated_result =

    2000           1           1          50          50
    2000           1           2          20          80
    2000           1           3          67          33
    2000           2           1          43          43
    2000           4           1          50          50
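A fourth option, not part of the original answer and offered only as a sketch: MATLAB's `ismember` with the `'rows'` flag returns, for each `[YEAR QUARTER]` pair in `estimate`, whether it appears in `outcome` and the index of the matching row. Like the loop version, this does not assume that every quarter has an outcome, but it stays vectorized. Filling unmatched estimates with `NaN` is an assumption of this sketch, not something the question asks for.

```matlab
% Sample data from the answer
estimate = [ ...
  2000 1 1 50; ...
  2000 1 2 20; ...
  2000 1 3 67; ...
  2000 2 1 43; ...
  2000 4 1 50];
outcome = [ ...
  2000 1 100; ...
  2000 2   0; ...
  2000 4   0; ...
  2001 1  10];

% found(k) is true when estimate row k has a matching [YEAR QUARTER] in outcome;
% loc(k) is the index of that outcome row (0 when there is no match)
[found, loc] = ismember(estimate(:,1:2), outcome(:,1:2), 'rows');

% Default to NaN so estimates without an outcome are flagged instead of
% being silently matched to the wrong quarter
result = nan(size(estimate,1), 1);
result(found) = abs(outcome(loc(found),3) - estimate(found,4));

% Append the result to the estimate matrix, and display
estimated_result = [estimate result];
disp(estimated_result);
```

On the sample data this gives 50 for the 2000 Q4 estimate, matching the loop version. If an estimate's quarter has no outcome row, its RESULT is left as `NaN` rather than raising an indexing error.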

edited Aug 25, 2015 at 5:01

answered Aug 25, 2015 at 4:44

kmac



3 Comments

user1205901 - Слава Україні Over a year ago

This works excellently! The j = all(bsxfun(@eq, outcome(:,1:2), estimate(i,1:2)),2); part produces a vector like [1 0 0 0 ...]. What if I wanted to turn that into [0 1 0 0 0 ...]? The reason is that in some later data sets, the outcome for QTR1 2000 will sit next to QTR2 2000 in the other array, so everything will need to move down a line.

kmac Over a year ago

That j is a logical mask; you can convert it to an index with j = find(j, 1, 'first') and then add 1 to it. In the vectorized versions, j is already a vector of indices, so you can add 1 to it directly.
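A minimal sketch of that suggestion on the sample data, for the first estimate row (2000 Q1): build the mask, convert it to a row index with `find`, and shift one row down. The names `est_row`, `mask`, `k` and `shifted_outcome` are illustrative, and the bounds check is an addition of this sketch.

```matlab
% Sample outcome data from the answer
outcome = [2000 1 100; 2000 2 0; 2000 4 0; 2001 1 10];
est_row = [2000 1 1 50];   % one row of the estimate matrix

% Logical mask over outcome rows: true where YEAR and QUARTER both match
mask = all(bsxfun(@eq, outcome(:,1:2), est_row(1:2)), 2);   % [1; 0; 0; 0]

% Convert the mask to a row index, then shift one row down
k = find(mask, 1, 'first') + 1;                              % 2

% Guard against shifting past the last outcome row
if k > size(outcome, 1)
  error('No outcome row below the matched quarter.');
end
shifted_outcome = outcome(k, 3);                             % 0
```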

kmac Over a year ago

P.S. If this answer solved your problem, you can accept the solution by clicking the checkmark. ;)
