
I have an array of values, some of which have duplicates, for example:

a = [5;5;4;7;7;3;3;9;5;7]

and I would like to find which values are duplicated, number each duplicated value sequentially (in order of first appearance), and set non-duplicates to zero. For example:

b = [1;1;0;2;2;3;3;0;1;2]

Currently I have a very inefficient and incomplete approach that uses the unique function with various for loops and if statements, but I feel there should be a simpler answer.

What is the most efficient way to get to this answer?

  • Is an input like [5;5;4;7;7;3;3;9;5;5;4;7] possible? What would the result be? Commented Jun 15, 2017 at 16:45
  • @LuisMendo Yes, that input is also possible. I've modified the question to include non-consecutive duplicates. Commented Jun 16, 2017 at 8:12

4 Answers


Here's another approach:

a = [5;5;4;7;7;3;3;9;5;7];
[u, ~, w] = unique(a, 'stable');
s = find(sum(bsxfun(@eq, a, u.'), 1) > 1);
b = sum(bsxfun(@times, bsxfun(@eq, w, s), 1:numel(s)), 2);

From R2016b onwards, implicit expansion lets you drop bsxfun and simplify the syntax:

a = [5;5;4;7;7;3;3;9;5;7];
[u, ~, w] = unique(a, 'stable');
s = find(sum(a==u.', 1) > 1);
b = sum((w==s).*(1:numel(s)), 2);
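
To see what each line computes, the intermediate values for the example input can be traced as in the sketch below. It reuses the answer's variable names and adds one name, `counts`, purely for illustration; the values in the comments were worked out by hand for this input only.

```matlab
a = [5;5;4;7;7;3;3;9;5;7];

% u: distinct values in order of first appearance; w: index of each a(k) in u
[u, ~, w] = unique(a, 'stable');   % u = [5;4;7;3;9], w = [1;1;2;3;3;4;4;5;1;3]

% Compare every element with every distinct value (10-by-5 logical matrix),
% then count the matches per column. counts is a name added for illustration.
counts = sum(bsxfun(@eq, a, u.'), 1);   % counts = [3 1 3 2 1]

% Positions in u of the values that occur more than once
s = find(counts > 1);                   % s = [1 3 4]

% Column j of bsxfun(@eq, w, s) marks the rows holding the j-th repeated value;
% weighting column j by j and summing along each row yields the label.
b = sum(bsxfun(@times, bsxfun(@eq, w, s), 1:numel(s)), 2);
% b = [1;1;0;2;2;3;3;0;1;2]
```

Note that the comparison matrix has numel(a) rows and numel(u) columns, so memory use grows with the product of the two.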

You can use a combination of unique, accumarray, and ismember to make the necessary adjustments:

a = [5;5;4;7;7;3;3;9];

% Identify unique values and their counts
[uniquevals, ~, ia] = unique(a, 'stable');  % Stable keeps it in the same order
bincounts = accumarray(ia, 1);  % Count the frequency of each index in ia

% Zero out singles
singles = uniquevals(bincounts <= 1);
[~, singleidx] = intersect(a, singles);
a(singleidx) = 0;

% Overwrite repeats
repeats = uniquevals(bincounts > 1);
[~, a] = ismember(a, repeats);

Which returns a new a of (displayed here as a row; the actual result is a column, like the input):

a =

     1     1     0     2     2     3     3     0

Walkthrough

We use unique here to find all of the unique values in our input array, a. We also store the optional third output, which is a mapping of the values of a to their index in the array of unique values. Note that we're using the stable option to obtain the unique values in the order they're first encountered in a; the results of unique are sorted by default.

We then use accumarray to accumulate the subscripts we got from unique, which gives us a count of each index. Using logical indexing, we use these counts first to zero out the single instances. After these are zeroed out, we can use the second output of ismember, which gives the position of each element of a within the array of repeated values, to return the final answer.
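
The walkthrough can be followed with the intermediate values written out in comments. The sketch below covers the example input only. It also leans on one documented behaviour of `ismember`: elements not found in the second argument get index 0, so for this input the result is the same even without the zeroing step. `labels` is a name added for illustration.

```matlab
a = [5;5;4;7;7;3;3;9];

[uniquevals, ~, ia] = unique(a, 'stable');  % uniquevals = [5;4;7;3;9]
                                            % ia = [1;1;2;3;3;4;4;5]
bincounts = accumarray(ia, 1);              % bincounts = [2;1;2;2;1]

repeats = uniquevals(bincounts > 1);        % repeats = [5;7;3]

% Position of each element within repeats, or 0 when it is not a repeat
[~, labels] = ismember(a, repeats);         % labels = [1;1;0;2;2;3;3;0]
```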


Here is a solution for consecutive duplicates, based on indexing, logical operators and cumsum:

x = [false; a(2:end)==a(1:end-1)];   % true where an element equals its predecessor (every repeat except the first of each block)
y = [x(2:end)|x(1:end-1); x(end)];   % true for every element that belongs to a block of repeats
result = cumsum(~x&y).*y             % cumsum(...) numbers the blocks sequentially; .*y sets non-duplicates to zero
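
A hand trace on a short input with only consecutive duplicates may make the three masks easier to follow. The input below is the first eight elements of the question's example; `starts` is a name added here for illustration, and the commented values were worked out for this input only.

```matlab
a = [5;5;4;7;7;3;3;9];

% x(k) is true when a(k) equals its predecessor
x = [false; a(2:end) == a(1:end-1)];    % x = [0;1;0;0;1;0;1;0]

% y(k) is true when a(k) belongs to a run of length two or more
y = [x(2:end) | x(1:end-1); x(end)];    % y = [1;1;0;1;1;1;1;0]

% First element of each run: inside a run, but not equal to its predecessor
starts = ~x & y;                        % starts = [1;0;0;1;0;1;0;0]

% A running count of run starts labels each run; y zeroes the singles
result = cumsum(starts) .* y;           % result = [1;1;0;2;2;3;3;0]
```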

Edit:

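Since the question was edited, here is a version that also handles non-consecutive duplicates:

[s, ii] = sort(a);                   % sort is stable, so equal values keep their original order
x = [false; s(2:end)==s(1:end-1)];
y = [x(2:end)|x(1:end-1); x(end)];
first = ~x&y;                        % first element of each block of repeats in the sorted array
[~, ix] = sort(ii(first));           % order the blocks by where they first appear in a
un = zeros(numel(ix), 1);
un(ix) = 1:numel(ix);                % label of each block
c = cumsum(first);                   % block number of each sorted element
result = zeros(size(a));
result(ii(y)) = un(c(y));            % index only where y is true, so c is never 0 here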

4 Comments

I like this effective approach. This finds only consecutive duplicates though.
@Y.Chang Thanks! It seems that the OP wants consecutive duplicates, unless I receive new feedback.
Can't say where exactly the problem is, but your second approach fails if more than 2 identical elements exist, e.g. for a = [5;5;5;4;7;7;3;3;3;3;9];. Still very neat though.
@LeanderMoesinger Thanks, you are right; the second approach has been removed.

Here is a two-liner that also works for non-consecutive duplicates:

[c, ia, ic] = unique(a, 'stable');
[~, b] = ismember(a, a(ia(accumarray(ic,1)>1)));

I have used some ideas from @excaza's answer, with modifications.
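
One way to gain confidence in a compact vectorized solution is to compare it against a deliberately simple loop. The sketch below does that for the two-liner, using the inputs mentioned in the question and comments plus one small extra case. `tests`, `expected`, and `next_label` are names added for illustration, and the loop is meant only as a reference for checking, not as an efficient implementation.

```matlab
tests = {[5;5;4;7;7;3;3;9;5;7], ...
         [5;5;5;4;7;7;3;3;3;3;9], ...
         [1;2;2]};

for t = 1:numel(tests)
    a = tests{t};

    % Two-liner from the answer
    [c, ia, ic] = unique(a, 'stable');
    [~, b] = ismember(a, a(ia(accumarray(ic, 1) > 1)));

    % Reference: walk through a, labelling each repeated value on first sight
    expected = zeros(size(a));
    next_label = 0;
    for k = 1:numel(a)
        if expected(k) == 0 && sum(a == a(k)) > 1
            next_label = next_label + 1;
            expected(a == a(k)) = next_label;
        end
    end

    % Errors if the two results differ
    assert(isequal(b, expected));
end
```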
