MATLAB - Find and number duplicates within an array

Question

I have an array of values, some of which have duplicates, for example:

a = [5;5;4;7;7;3;3;9;5;7]

and I would like to find which are duplicates, and then number each of these sequentially, while making non-duplicates zero. For example:

b = [1;1;0;2;2;3;3;0;1;2]

Currently I have a very inefficient and incomplete approach, using the unique function and various for loops and if statements, but feel that there should be a simple answer.

What is the most efficient way to get to this answer?

Is an input like [5;5;4;7;7;3;3;9;5;5;4;7] possible? What would the result be?
– Luis Mendo, Commented Jun 15, 2017 at 16:45
@LuisMendo Yes, that input is also possible. I've modified the question to include non-consecutive duplicates.
– user3743235, Commented Jun 16, 2017 at 8:12

Luis Mendo · Accepted Answer · 2017-06-16 09:09:36Z

3

Here's another approach:

a = [5;5;4;7;7;3;3;9;5;7];
[u, ~, w] = unique(a, 'stable');
s = find(sum(bsxfun(@eq, a, u.'), 1) > 1);
b = sum(bsxfun(@times, bsxfun(@eq, w, s), 1:numel(s)), 2);

From R2016b onwards, implicit expansion lets you drop bsxfun and simplify the syntax:

a = [5;5;4;7;7;3;3;9;5;7];
[u, ~, w] = unique(a, 'stable');
s = find(sum(a==u.', 1) > 1);
b = sum((w==s).*(1:numel(s)), 2);
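To make the intermediate steps easier to follow, here is the same computation with the values for the example input written out as comments. This is only an annotated sketch, assuming R2016b or later for implicit expansion; the names `counts` and `onehot` are introduced here for illustration and are not part of the answer.

```matlab
a = [5;5;4;7;7;3;3;9;5;7];
[u, ~, w] = unique(a, 'stable');     % u = [5;4;7;3;9]; w(k) is the position of a(k) in u
counts = sum(a == u.', 1);           % occurrences of each entry of u: [3 1 3 2 1]
s = find(counts > 1);                % positions in u that occur more than once: [1 3 4]
onehot = (w == s);                   % 10-by-3 logical; column j marks elements equal to u(s(j))
b = sum(onehot .* (1:numel(s)), 2);  % weight column j by j, then collapse each row
% b = [1;1;0;2;2;3;3;0;1;2]
```

Each row of `onehot` contains at most one true entry, so the row sum is either the group number or 0 for a non-duplicate.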


sco1 · Accepted Answer · 2017-06-15 16:56:44Z

You can use a combination of unique, accumarray, and ismember to make the necessary adjustments:

a = [5;5;4;7;7;3;3;9];

% Identify unique values and their counts
[uniquevals, ~, ia] = unique(a, 'stable');  % Stable keeps it in the same order
bincounts = accumarray(ia, 1);  % Count the frequency of each index in ia

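% Zero out singles. ismember below already returns 0 for non-members, so this
% step is redundant; it also assumes 0 is not itself a repeated value in a.
singles = uniquevals(bincounts <= 1);
[~, singleidx] = intersect(a, singles);
a(singleidx) = 0;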

% Overwrite repeats
repeats = uniquevals(bincounts > 1);
[~, a] = ismember(a, repeats);

Which returns a new a of:

a =

     1
     1
     0
     2
     2
     3
     3
     0

Walkthrough

We use unique here to find all of the unique values in our input array, a. We also store the optional third output, which is a mapping of the values of a to their index in the array of unique values. Note that we're using the stable option to obtain the unique values in the order they're first encountered in a; the results of unique are sorted by default.
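As a small illustration of the difference between the default and 'stable' orderings, and of what the third output holds, the sketch below runs both variants on the same input. The variable names are chosen here for illustration only.

```matlab
a = [5;5;4;7;7;3;3;9];
[sortedvals, ~, sortedidx] = unique(a);            % default: sorted order
[stablevals, ~, stableidx] = unique(a, 'stable');  % order of first appearance
% sortedvals = [3;4;5;7;9], stablevals = [5;4;7;3;9]

% In both cases the index vector maps every element of a to its unique value,
% so indexing the unique values with it reconstructs a.
ok_sorted = isequal(sortedvals(sortedidx), a);
ok_stable = isequal(stablevals(stableidx), a);
```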

We then use accumarray to accumulate the subscripts we got from unique, which gives us a count of each index. Using logical indexing, we use these counts first to zero out the single instances. After these are zeroed out, we can use the second output of ismember to return the final answer.
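The two building blocks described above can be looked at in isolation. In this sketch the index vector and the list of repeated values are written out by hand for the example input rather than computed, purely to show what accumarray and the second output of ismember return.

```matlab
a  = [5;5;4;7;7;3;3;9];
ia = [1;1;2;3;3;4;4;5];            % third output of unique(a, 'stable')

% accumarray adds 1 into bin ia(k) for every k, giving one count per unique value.
bincounts = accumarray(ia, 1);     % [2;1;2;2;1]

% The second output of ismember gives, for each element of a, its position in
% the list of repeated values, or 0 if it is not in that list.
repeats = [5;7;3];
[tf, loc] = ismember(a, repeats);  % loc = [1;1;0;2;2;3;3;0]
```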

rahnema1 · Accepted Answer · 2017-06-16 20:53:50Z

2

Here is a solution based on indexing, logical operators and cumsum:

x = [false; a(2:end)==a(1:end-1)]; %logical indexes of repeated elements except the first element of each block 
y = [x(2:end)|x(1:end-1) ;x(end)]; %logical indexes of repeated elements
result = cumsum(~x&y).*y           %cumsum(...):number all elements sequentially and (... .* y): making non-duplicates zero
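For a concrete picture of what x and y hold, here is the same computation with intermediate values as comments, followed by a run on the question's full input. This is a sketch for illustration; the names `starts`, `a2`, `x2`, `y2` and `result2` are introduced here.

```matlab
a = [5;5;4;7;7;3;3;9];
x = [false; a(2:end) == a(1:end-1)];   % [0;1;0;0;1;0;1;0]: element equals its predecessor
y = [x(2:end) | x(1:end-1); x(end)];   % [1;1;0;1;1;1;1;0]: element belongs to a run of equal values
starts = ~x & y;                       % [1;0;0;1;0;1;0;0]: first element of each run
result = cumsum(starts) .* y;          % [1;1;0;2;2;3;3;0]

% With the question's full input, the trailing 5 and 7 are not adjacent to
% their earlier copies, so this version treats them as non-duplicates.
a2 = [5;5;4;7;7;3;3;9;5;7];
x2 = [false; a2(2:end) == a2(1:end-1)];
y2 = [x2(2:end) | x2(1:end-1); x2(end)];
result2 = cumsum(~x2 & y2) .* y2;      % [1;1;0;2;2;3;3;0;0;0]
```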

Edit:

Since the question was edited, non-consecutive duplicates can be handled like this:

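[s, ii] = sort(a);                    % sort so that equal values become adjacent
x = [false; s(2:end)==s(1:end-1)];
y = [x(2:end)|x(1:end-1); x(end)];
first = ~x&y;                         % first element of each group of duplicates
[~, ix] = sort(ii(first));            % order the groups by first appearance in a
un = zeros(nnz(first), 1);
un(ix) = 1:numel(ix);                 % number assigned to each group
c = cumsum(first);
result = zeros(size(a));
result(ii(y)) = un(c(y));             % index duplicates only, so c is never 0 here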

edited Jun 16, 2017 at 20:53

answered Jun 15, 2017 at 17:17


4 Comments

Y. Chang Over a year ago

I like this effective approach. This finds only consecutive duplicates though.

rahnema1 Over a year ago

@Y.Chang Thanks! It seems the OP wants consecutive duplicates, unless I receive feedback to the contrary.

Leander Moesinger Over a year ago

Can't say where exactly the problem is, but your second approach fails if more than 2 identical elements exist, e.g. for a = [5;5;5;4;7;7;3;3;3;3;9];. Still very neat though.

rahnema1 Over a year ago

@LeanderMoesinger Thanks, you are right, the second approach removed.

Vahe Tshitoyan · Accepted Answer · 2017-06-15 19:53:40Z

1

Here is a two-liner that also works for non-consecutive duplicates:

[c, ia, ic] = unique(a, 'stable');
[~, b] = ismember(a, a(ia(accumarray(ic,1)>1)));

I have used some ideas from @excaza's answer, with modifications.
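As a quick sanity check, the two-liner can be run against the inputs mentioned in this thread: the question's example and the case with more than two identical elements raised in the comments on another answer. The expected vectors below were worked out by hand from the numbering rule in the question, and the names `inputs`, `expected`, `repeats` and `ok` are introduced here for illustration.

```matlab
inputs   = {[5;5;4;7;7;3;3;9;5;7], [5;5;5;4;7;7;3;3;3;3;9]};
expected = {[1;1;0;2;2;3;3;0;1;2], [1;1;1;0;2;2;3;3;3;3;0]};
ok = false(size(inputs));
for k = 1:numel(inputs)
    a = inputs{k};
    [~, ia, ic] = unique(a, 'stable');
    repeats = a(ia(accumarray(ic, 1) > 1));  % repeated values, in order of first appearance
    [~, b] = ismember(a, repeats);           % position in repeats, or 0 for singles
    ok(k) = isequal(b, expected{k});
end
```

Splitting the second line into `repeats` plus the ismember call does not change the behavior; it only names the intermediate list of repeated values.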

