
I have this cell array of chars:

a={'1';'1';'1';'1';'1';'3';'3';'3';'3';'3';'3';'4';'4';'4';'4'};

and I want to transform it into this:

a={'1';'';'';'';'';'3';'';'';'';'';'';'4';'';'';''};
3 Comments

  • Can you explain why? What are you planning on doing with the result? It seems fairly unnecessary and non-trivial. Commented Feb 9, 2016 at 14:32
  • Are they always going to be numeric? Does the result have to be a cell array of characters? Commented Feb 9, 2016 at 14:40
  • Are duplicates always going to be grouped together? And, if you have a = {'1','1','2','2','1','1'}, do you delete three "1"s or just the followers in each group? Commented Feb 9, 2016 at 18:56

2 Answers


First, find the unique elements of a and their first indices. Then set all other entries of a to ''.

[~, ii] = unique(a);             % ii: index of the first occurrence of each unique value
ind = setdiff(1:numel(a), ii);   % indices of all remaining (duplicate) entries
[a{ind}] = deal('');             % set every duplicate entry to the empty string

As pointed out by CST-Link, both the computation of duplicate indices and assignment of empty strings can be sped up (in particular, setdiff is slow):

[~, ii] = unique(a);   % first-occurrence indices, as before
ind = 1:numel(a);
ind(ii) = [];          % drop the first occurrences, keeping only the duplicates
a(ind) = {''};         % scalar expansion: assign '' to every duplicate cell
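
For reference, a quick check on the original example (assuming a recent MATLAB release, where unique returns first-occurrence indices by default) shows that either snippet reproduces the desired output:

a = {'1';'1';'1';'1';'1';'3';'3';'3';'3';'3';'3';'4';'4';'4';'4'};
[~, ii] = unique(a);
ind = 1:numel(a);
ind(ii) = [];
a(ind) = {''};
isequal(a, {'1';'';'';'';'';'3';'';'';'';'';'';'4';'';'';''})   % ans = 1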

2 Comments

You could go faster with [~, ii] = unique(a); ind = 1:numel(a); ind(ii) = []; a(ind) = {''};
This is close to what rle would do, interestingly (or not :-) )
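
Picking up on the rle remark: if the goal were to blank only the followers within each run of consecutive duplicates, a minimal sketch could look like the following. Assuming duplicates are always grouped and each value forms a single run (as in the example), this gives the same result as the unique-based snippets above.

a = {'1';'1';'1';'1';'1';'3';'3';'3';'3';'3';'3';'4';'4';'4';'4'};
dup = [false; strcmp(a(2:end), a(1:end-1))];   % true where an element repeats its predecessor
a(dup) = {''};                                 % blank the followers within each run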

This could be fast for large arrays:

a = repmat({'1';'1';'1';'1';'1';'3';'3';'3';'3';'3';'3';'4';'4';'4';'4'}, 100000, 1);   % 100000 copies of the example, for timing

[u,n] = unique(flipud(a));    % unique values and their indices within the flipped array
b = repmat({''}, size(a));    % preallocate a cell array of empty strings
b(n) = u;                     % place each unique value exactly once
a = flipud(b);                % flip back to the original orientation
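
A note on the double flipud (my reading, not stated in the answer): on older MATLAB releases such as the R2012a mentioned in the comments, unique returned last-occurrence indices by default, so flipping the array before and after effectively turns those into first occurrences of the original. On a release where unique already returns first-occurrence indices, a sketch without the flips might look like:

[u, n] = unique(a);           % n: first occurrence of each unique value (newer default)
b = repmat({''}, size(a));    % preallocate empty strings
b(n) = u;                     % keep each value only at its first position
a = b;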

5 Comments

repmat is typically known for its lack of speed. Do you have any evidence that this pays off for large arrays? :)
@AndrasDeak repmat is O(1) though it has a large overhead, while unique is O(n*log(n)). I changed the code to build a large array a so you can profile it yourself. Try smaller or larger values for the number of duplicates in the definition of a; please let me know if you get different complexities.
@AndrasDeak I just checked the code I proposed against the original one that was chosen as the answer. For 100000 (one hundred thousand) copies of the original a, my code ran in 0.9 seconds, while the other ran in 1.4 seconds (MATLAB R2012a, i5, 3 GB RAM). I hope that is evidence enough. :-)
Of course it's enough, thank you:) I was only curious, and didn't know which one would come out on top.
Indeed, for large arrays (tested with 3e5 elements) this solution is faster than my originally proposed one. For smaller ones (tested with 30 elements), it is the slowest. Your improvement to my originally proposed solution is fastest, in all tested cases. Interestingly, for large arrays, deal is the bottleneck...
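
For anyone who wants to reproduce the comparison discussed in these comments, a rough single-run harness along these lines (plain tic/toc rather than a proper profiler; the 100000 factor is the one used above) could serve as a starting point:

a0 = repmat({'1';'1';'1';'1';'1';'3';'3';'3';'3';'3';'3';'4';'4';'4';'4'}, 100000, 1);

a = a0; tic;                                % unique + indexed assignment (first answer, improved)
[~, ii] = unique(a); ind = 1:numel(a); ind(ii) = []; a(ind) = {''};
toc

a = a0; tic;                                % flipud + repmat (second answer)
[u, n] = unique(flipud(a)); b = repmat({''}, size(a)); b(n) = u; a = flipud(b);
toc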
