I have a MATLAB double array that looks like this:

YEAR    QUARTER ID  VAR
2000    1       1   50
2000    1       2   20
2000    1       3   67
2000    2       1   43

It goes on for many years and many quarters, and the number of rows in each quarter and year varies unpredictably. Each VAR value is an estimate made by an individual person, identified by ID.

I have another double array that looks like this:

YEAR    QUARTER OUTCOME
2000    1       100
2000    2       0

It also goes on for many years and many quarters, with exactly one outcome per quarter. I want to subtract each person's estimate from the outcome for the same year and quarter, and append the result to the first array as a new column.

The result should look like this:

YEAR    QUARTER ID  VAR   RESULT
2000    1       1   50    50
2000    1       2   20    80
2000    1       3   67    33
2000    2       1   43    43

What's the best way to achieve this?

1 Answer

Here are three options, depending on desired speed / readability / assumptions.

%% Load data
estimate = [...
  2000    1       1   50; ...
  2000    1       2   20; ...
  2000    1       3   67; ...
  2000    2       1   43; ...
  2000    4       1   50];
outcome = [...
  2000    1       100; ...
  2000    2       0; ...
  2000    4       0; ...
  2001    1       10];
n_estimate = size(estimate,1);
n_outcome = size(outcome,1);

%% Loop version (easier to read, more flexible)

result = zeros(n_estimate,1);
for i = 1:n_estimate
  % Find matching year & quarter for this estimate
  j = all(bsxfun(@eq, outcome(:,1:2), estimate(i,1:2)),2);
  % Subtract estimate from outcome (seems like you want the absolute value)
  result(i) = abs(outcome(j,3) - estimate(i,4));
end

% Append the result to the estimate matrix, and display
estimated_result = [estimate result];
display(estimated_result);

%% Vectorized version (more efficient, forced assumptions)
% Note: this assumes that outcome has a row for every quarter
% (i.e. none are missing), so the index is just an offset from
% the first year/quarter.
% In the sample data 2000 Q3 is missing from outcome, which violates
% this assumption and makes the last estimate's result incorrect (40, not 50).

% Build an integer index from the combined year/quarter, offset from
% the first year/quarter that is available in the outcome list
begin = outcome(1,1)*4 + outcome(1,2);
j = estimate(:,1)*4 + estimate(:,2) - begin + 1;

% Subtract estimate from outcome (seems like you want the absolute value)
result = abs(outcome(j,3) - estimate(:,4));

% Append the result to the estimate matrix, and display
estimated_result = [estimate result];
display(estimated_result);

%% Vectorized version 2 (more efficient, hardest to read)
% Note: this does not assume that you have data for every quarter

% Build an inverted index to map year*4+quarter-begin to an outcome index.
begin = outcome(1,1)*4 + outcome(1,2);
i = outcome(:,1)*4+outcome(:,2)-begin+1; % outcome indices
j_inv(i) = 1:n_outcome;

% Build the forward index from estimate into outcome
j = j_inv(estimate(:,1)*4 + estimate(:,2) - begin + 1);

% Subtract estimate from outcome (seems like you want the absolute value)
result = abs(outcome(j,3) - estimate(:,4));

% Append the result to the estimate matrix, and display
estimated_result = [estimate result];
display(estimated_result);

Output (one block per version, in the order above):

estimated_result =

    2000           1           1          50          50
    2000           1           2          20          80
    2000           1           3          67          33
    2000           2           1          43          43
    2000           4           1          50          50

estimated_result =

    2000           1           1          50          50
    2000           1           2          20          80
    2000           1           3          67          33
    2000           2           1          43          43
    2000           4           1          50          40

estimated_result =

    2000           1           1          50          50
    2000           1           2          20          80
    2000           1           3          67          33
    2000           2           1          43          43
    2000           4           1          50          50
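
As a further sketch (not one of the three versions above), the year/quarter lookup can also be done with ismember and its 'rows' option, which matches whole rows and makes no assumption about missing quarters. The names tf and loc are illustrative, and marking estimates with no matching outcome as NaN is an assumption about the desired behaviour:

```matlab
% Sample data copied from the answer above
estimate = [2000 1 1 50; 2000 1 2 20; 2000 1 3 67; 2000 2 1 43; 2000 4 1 50];
outcome  = [2000 1 100; 2000 2 0; 2000 4 0; 2001 1 10];

% tf(k) is true if estimate row k has a matching year/quarter in outcome;
% loc(k) is the row of that match in outcome (0 if there is none)
[tf, loc] = ismember(estimate(:,1:2), outcome(:,1:2), 'rows');

% Assumption: estimates without a matching outcome get NaN
result = nan(size(estimate,1), 1);
result(tf) = abs(outcome(loc(tf),3) - estimate(tf,4));

% Append the result to the estimate matrix, and display
estimated_result = [estimate result];
disp(estimated_result);
```

On the sample data this should give the same RESULT column as the loop version, since every estimate has a matching outcome row.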

3 Comments

This works excellently! The j = all(bsxfun(@eq, outcome(:,1:2), estimate(i,1:2)),2); part will produce a vector that is [1 0 0 0 ....]. What if I wanted to take that and make it [0 1 0 0 0 ...]? The reason why is that in some later data sets the outcome for QTR1 2000 will be next to QTR2 2000 in the other array, so everything will need to move down a line.
That j is a logical mask. You can convert it to an index using j = find(j, 1, 'first'), and then just add 1 to it. In the vectorized versions j is already a vector of indices, so you can just add 1 to them as-is.
P.S. If this answer solved your problem, you can accept the solution by clicking the checkmark. ;)
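
The mask-to-index conversion described in the comments above can be sketched as follows for a single estimate row. The one-row shift follows the asker's description; the names this_estimate and shifted_outcome are hypothetical, and the bounds check with a NaN fallback is an added assumption, since the last outcome row has no row after it:

```matlab
outcome = [2000 1 100; 2000 2 0; 2000 4 0; 2001 1 10];
this_estimate = [2000 1 1 50];   % hypothetical single estimate row

% Logical mask over the outcome rows, true where year and quarter match
mask = all(bsxfun(@eq, outcome(:,1:2), this_estimate(1:2)), 2);

% Convert the mask to a row index, then move down one row
j = find(mask, 1, 'first') + 1;

% Assumption: fall back to NaN if nothing matched or the shift runs past the end
if ~isempty(j) && j <= size(outcome,1)
  shifted_outcome = outcome(j,3);
else
  shifted_outcome = NaN;
end
```

For this sample row the mask selects the first outcome row, so j is 2 and the shifted outcome is taken from the 2000 Q2 row.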
