MATLAB: removing rows that contain duplicates in sequence

Question

I'm trying to remove the rows that contain duplicates in sequence. The matrix has only two possible values, 0 and 1. It has m rows and n columns, where n is the number of bits per row; m is not important for my question. My goal is a matrix with m-a rows and n columns, where a is the number of rows that contain duplicates in sequence. For example:

My matrix is:

A=[0 1 0 1 0 1;
   0 0 0 1 1 1;
   0 0 1 0 0 1;
   0 1 0 0 1 0;
   1 0 0 0 1 0]

I want to remove the rows that contain t consecutive 0s. In this question, let's assume t is 3. So I want the matrix:

B=[0 1 0 1 0 1;
   0 0 1 0 0 1; 
   0 1 0 0 1 0]

The 2nd and 5th rows are removed.

I probably need to use diff.

Luis Mendo · Accepted Answer · 2015-06-17 13:44:19Z

6

So you want to remove rows of A that contain at least t zeros in sequence.

How about a single line?

B = A(~any(conv2(1,ones(1,t),2*A-1,'valid')==-t, 2),:);

How this works:

Transform A to bipolar form (2*A-1)
Convolve each row with a sequence of t ones (conv2(...))
Keep only rows for which the convolution does not contain -t (~any(...)). The presence of -t indicates a sequence of t zeros in the corresponding row of A.

To remove rows that contain at least t ones, just change -t to t:

B = A(~any(conv2(1,ones(1,t),2*A-1,'valid')==t, 2),:);
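To see why comparing against -t (or t) works, it can help to look at the intermediate values. The following is a minimal sketch using the matrix from the question; the variable names bipolar, sums, hasZeroRun and hasOneRun are introduced here for illustration and are not part of the one-liner above.

```matlab
A = [0 1 0 1 0 1;
     0 0 0 1 1 1;
     0 0 1 0 0 1;
     0 1 0 0 1 0;
     1 0 0 0 1 0];
t = 3;

bipolar = 2*A - 1;                               % map 0 -> -1 and 1 -> +1
sums = conv2(1, ones(1,t), bipolar, 'valid');    % sum of every window of t neighbours in each row
hasZeroRun = any(sums == -t, 2);                 % a window sums to -t only if it is all zeros
hasOneRun  = any(sums ==  t, 2);                 % a window sums to +t only if it is all ones

B0 = A(~hasZeroRun, :);                          % rows without t consecutive zeros
B1 = A(~hasOneRun, :);                           % rows without t consecutive ones
```

With this A and t = 3, rows 2 and 5 each contain a window that sums to -3, so B0 keeps rows 1, 3 and 4. Only row 2 contains a window that sums to +3, so B1 keeps rows 1, 3, 4 and 5.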

edited Jun 17, 2015 at 13:44

answered Jun 17, 2015 at 11:55

Luis Mendo


4 Comments

Toygan Over a year ago

It is certain that your answer is very clever :) But I did not understand it. How can it for 1's?

Luis Mendo Over a year ago

@toygankılıç Sorry, what do you mean "How can it for 1's"?

Toygan Over a year ago

My question covers only consecutive 0s. How can it be written for 1s?

Santhan Salai Over a year ago

Very clever use of conv2 here +1. I used your logic to generalize this for any consecutive numbers instead of 0 with im2col

Santhan Salai · Accepted Answer · 2015-06-17 16:02:46Z

3

Here is a generalized approach that removes any row containing a given number of consecutive duplicates (not just zeros; it could be any number).

t = 3;
row_mask = ~any(all(~diff(reshape(im2col(A,[1 t],'sliding'),t,size(A,1),[]))),3);
out = A(row_mask,:)

Sample Run:

>> A

A =

 0     1     0     1     0     1
 0     0     1     5     5     5    %// consecutive 3 5's
 0     0     1     0     0     1
 0     1     0     0     1     0
 1     1     1     0     0     1    %// consecutive 3 1's

>> out

out =

 0     1     0     1     0     1
 0     0     1     0     0     1
 0     1     0     0     1     0
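im2col belongs to the Image Processing Toolbox. If that toolbox is not available, the same idea can be sketched with diff and conv2 from base MATLAB: a run of t equal values shows up as t-1 consecutive zeros in the row-wise differences. This is an alternative sketch rather than the method above, and it assumes t >= 2; the names same and run_len are illustrative.

```matlab
A = [0 1 0 1 0 1;
     0 0 1 5 5 5;
     0 0 1 0 0 1;
     0 1 0 0 1 0;
     1 1 1 0 0 1];
t = 3;                                           % assumed to be at least 2

same = double(diff(A, 1, 2) == 0);               % 1 where an element equals its right neighbour
run_len = conv2(same, ones(1, t-1), 'valid');    % number of equal neighbour pairs in each window of t elements
row_mask = ~any(run_len == t-1, 2);              % keep rows with no window of t equal values
out = A(row_mask, :);
```

On the sample matrix this keeps rows 1, 3 and 4, the same result as the sample run above.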

answered Jun 17, 2015 at 16:02

Santhan Salai


2 Comments

Luis Mendo Over a year ago

Nice use of im2col!

Santhan Salai Over a year ago

@LuisMendo Thanks I wouldn't have thought of it if you haven't used conv2 :P They kind of look similar to me with im2col giving intermediate results :D

rayryeng · Accepted Answer · 2015-06-17 16:17:43Z

3

How about an approach using strings? This is certainly not as fast as Luis Mendo's method where you work directly with the numerical array, but it's thinking a bit outside of the box. The basis of this approach is that I consider each row of A to be a unique string, and I can search each string for occurrences of a string of 0s by regular expressions.

A=[0 1 0 1 0 1;
   0 0 0 1 1 1;
   0 0 1 0 0 1;
   0 1 0 0 1 0;
   1 0 0 0 1 0];
t = 3;

B = sprintfc('%s', char('0' + A));   
ind = cellfun('isempty', regexp(B, repmat('0', [1 t])));
B(~ind) = [];
B = double(char(B) - '0');

We get:

B =
     0     1     0     1     0     1
     0     0     1     0     0     1
     0     1     0     0     1     0

Explanation

Line 1: Convert each line of the matrix A into a string consisting of 0s and 1s. Each line becomes a cell in a cell array. This uses the undocumented function sprintfc to facilitate this cell array conversion.
Line 2: I use regular expressions to find any occurrence of a run of 0s that is t long. I first use repmat to create a search string consisting of t 0s. Then I determine whether each string in the cell array contains this sequence of characters (i.e. 000....). The function regexp performs the regular-expression search and returns the locations of any matches for each cell in the cell array. Alternatively, you can use the function strfind for more recent versions of MATLAB to speed up the computation, but I chose regexp so that the solution is compatible with most MATLAB distributions out there.

Continuing on, the output of regexp/strfind is a cell array in which each cell reports the locations where the search string was found. A row with a match has at least one location reported, so an empty cell marks a row that we want to keep. To turn this into a logical array for removing rows, the result is wrapped in a cellfun call that determines which cells are empty. Therefore, this line returns a logical array where 0 means remove this row and 1 means keep it.
Line 3: I invert the logical array from Line 2 so that 1 marks the rows to remove, then use it to index into the cell array and delete those strings.
Line 4: The output is still a cell array, so I convert it back into a character array, and finally back into a numerical array.
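For readers who prefer to avoid the undocumented sprintfc, the same steps can be sketched with the documented functions cellstr and strfind. This is a variant under the assumption that every entry of A is a single digit (0 to 9), so that each element maps to exactly one character; the names rows and has_run are illustrative.

```matlab
A = [0 1 0 1 0 1;
     0 0 0 1 1 1;
     0 0 1 0 0 1;
     0 1 0 0 1 0;
     1 0 0 0 1 0];
t = 3;

rows = cellstr(char('0' + A));                   % one character vector per row of A
hits = strfind(rows, repmat('0', 1, t));         % start positions of each run of t zeros, per row
has_run = ~cellfun('isempty', hits);             % true where at least one run was found
B = A(~has_run, :);                              % index the numeric matrix directly
```

Because the logical mask is applied to A itself, there is no need to convert the strings back to numbers at the end.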

edited Jun 17, 2015 at 16:17

answered Jun 17, 2015 at 13:28

rayryeng


6 Comments

Luis Mendo Over a year ago

Nice approach! You could also use ind = cellfun('isempty', strfind(B, repmat('0', [1 t])))

rayryeng Over a year ago

@LuisMendo - Cool! I always forget about that. Thanks!

Luis Mendo Over a year ago

Hehehe. My point was, regular expressions have a reputation for being slow (I haven't tested that). Maybe strfind could speed things up

rayryeng Over a year ago

@LuisMendo some versions of MATLAB don't have strfind which is why I opted for regular expressions. I actually used strfind at first!

Luis Mendo Over a year ago

You deserved the vote, it's a great answer! Except for going undocumented, that is :-P
