
I have this cell array of chars:

a={'1';'1';'1';'1';'1';'3';'3';'3';'3';'3';'3';'4';'4';'4';'4'};

and I want to transform it into this:

a={'1';'';'';'';'';'3';'';'';'';'';'';'4';'';'';''};
3 Comments

  • Can you explain why? What are you planning on doing with the result? It seems fairly unnecessary and non-trivial. Commented Feb 9, 2016 at 14:32
  • Are they always going to be numeric? Does the result have to be a cell array of characters? Commented Feb 9, 2016 at 14:40
  • Are duplicates always going to be grouped together? And, if you have a = {'1','1','2','2','1','1'}, do you delete three "1"s or just the followers in each group? Commented Feb 9, 2016 at 18:56

2 Answers


First, find the unique elements of a and their first indices. Then set all other entries of a to ''.

[~, ii] = unique(a);             % ii: index of the first occurrence of each unique value
ind = setdiff(1:numel(a), ii);   % indices of all remaining (duplicate) entries
[a{ind}] = deal('');             % set every duplicate entry to the empty string

As pointed out by CST-Link, both the computation of duplicate indices and assignment of empty strings can be sped up (in particular, setdiff is slow):

[~, ii] = unique(a);   % first-occurrence indices, as before
ind = 1:numel(a);
ind(ii) = [];          % drop the first occurrences, keeping only the duplicates
a(ind) = {''};         % scalar expansion: assign '' to every duplicate cell
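
For reference, a quick check on the original example (assuming a recent MATLAB release, where unique returns first-occurrence indices by default) shows that either snippet reproduces the desired output:

a = {'1';'1';'1';'1';'1';'3';'3';'3';'3';'3';'3';'4';'4';'4';'4'};
[~, ii] = unique(a);
ind = 1:numel(a);
ind(ii) = [];
a(ind) = {''};
isequal(a, {'1';'';'';'';'';'3';'';'';'';'';'';'4';'';'';''})   % ans = 1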

2 Comments

You could go faster with [~, ii] = unique(a); ind = 1:numel(a); ind(ii) = []; a(ind) = {''};
This is close to what rle would do, interestingly (or not :-) )
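
Picking up on the rle remark: if the goal were to blank only the followers within each run of consecutive duplicates, a minimal sketch could look like the following. Assuming duplicates are always grouped and each value forms a single run (as in the example), this gives the same result as the unique-based snippets above.

a = {'1';'1';'1';'1';'1';'3';'3';'3';'3';'3';'3';'4';'4';'4';'4'};
dup = [false; strcmp(a(2:end), a(1:end-1))];   % true where an element repeats its predecessor
a(dup) = {''};                                 % blank the followers within each run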

This could be fast for large arrays:

a = repmat({'1';'1';'1';'1';'1';'3';'3';'3';'3';'3';'3';'4';'4';'4';'4'}, 100000, 1);   % 100000 copies of the example, for timing

[u,n] = unique(flipud(a));    % unique values and their indices within the flipped array
b = repmat({''}, size(a));    % preallocate a cell array of empty strings
b(n) = u;                     % place each unique value exactly once
a = flipud(b);                % flip back to the original orientation
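
A note on the double flipud (my reading, not stated in the answer): on older MATLAB releases such as the R2012a mentioned in the comments, unique returned last-occurrence indices by default, so flipping the array before and after effectively turns those into first occurrences of the original. On a release where unique already returns first-occurrence indices, a sketch without the flips might look like:

[u, n] = unique(a);           % n: first occurrence of each unique value (newer default)
b = repmat({''}, size(a));    % preallocate empty strings
b(n) = u;                     % keep each value only at its first position
a = b;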

5 Comments

repmat is typically known for its lack of speed. Do you have any evidence that this pays off for large arrays? :)
@AndrasDeak repmat is O(1) though it has a large overhead, while unique is O(n*log(n)). I changed the code to build a large array a so you can profile it yourself. Try smaller or larger values for the number of duplicates in the definition of a; please let me know if you get different complexities.
@AndrasDeak I just checked the code I proposed against the original one that was chosen as the answer. For 100000 (one hundred thousand) copies of the original a, my code ran in 0.9 seconds, while the other ran in 1.4 seconds (MATLAB R2012a, i5, 3 GB RAM). I hope that is evidence enough. :-)
Of course it's enough, thank you:) I was only curious, and didn't know which one would come out on top.
Indeed, for large arrays (tested with 3e5 elements) this solution is faster than my originally proposed one. For smaller ones (tested with 30 elements), it is the slowest. Your improvement to my originally proposed solution is fastest, in all tested cases. Interestingly, for large arrays, deal is the bottleneck...
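
For anyone who wants to reproduce the comparison discussed in these comments, a rough single-run harness along these lines (plain tic/toc rather than a proper profiler; the 100000 factor is the one used above) could serve as a starting point:

a0 = repmat({'1';'1';'1';'1';'1';'3';'3';'3';'3';'3';'3';'4';'4';'4';'4'}, 100000, 1);

a = a0; tic;                                % unique + indexed assignment (first answer, improved)
[~, ii] = unique(a); ind = 1:numel(a); ind(ii) = []; a(ind) = {''};
toc

a = a0; tic;                                % flipud + repmat (second answer)
[u, n] = unique(flipud(a)); b = repmat({''}, size(a)); b(n) = u; a = flipud(b);
toc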
