
I'm trying to remove the rows that contain runs of consecutive duplicate values. There are only 2 possible values, 0 and 1. My matrix is n x m, where n is the number of bits in each row and m, the number of rows, is not important for my question. My goal is an n x (m-a) matrix, where a is the number of rows that contain such a run. For example:

My matrix is:

A=[0 1 0 1 0 1;
   0 0 0 1 1 1;
   0 0 1 0 0 1;
   0 1 0 0 1 0;
   1 0 0 0 1 0]

I want to remove the rows that contain t consecutive 0s. For this question, let's assume t is 3. So I want the matrix:

B=[0 1 0 1 0 1;
   0 0 1 0 0 1; 
   0 1 0 0 1 0]

The 2nd and 5th rows are removed.

I probably need to use diff.

3 Answers

6

So you want to remove rows of A that contain at least t zeros in sequence.

How about a single line?

B = A(~any(conv2(1,ones(1,t),2*A-1,'valid')==-t, 2),:);

How this works:

  1. Transform A to bipolar form (2*A-1)
  2. Convolve each row with a sequence of t ones (conv2(...))
  3. Keep only rows for which the convolution does not contain -t (~any(...)). The presence of -t indicates a sequence of t zeros in the corresponding row of A.
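To see what each step produces, here is a minimal sketch that splits the one-liner into named intermediate results, using the matrix from the question. The variable names bipolar, sums and has_run are chosen here for illustration and are not part of the original answer.

```matlab
% Step-by-step version of the one-liner, using the matrix from the question.
A = [0 1 0 1 0 1;
     0 0 0 1 1 1;
     0 0 1 0 0 1;
     0 1 0 0 1 0;
     1 0 0 0 1 0];
t = 3;

bipolar = 2*A - 1;                             % step 1: 0 -> -1, 1 -> +1
sums    = conv2(1, ones(1,t), bipolar, 'valid'); % step 2: sum of every window of t consecutive entries per row
has_run = any(sums == -t, 2);                  % step 3: a window sums to -t only if all t entries were 0
B       = A(~has_run, :);                      % keep the rows without such a window
```

For this A, sums is 5-by-4 (one column per window position), and only rows 2 and 5 contain a window that sums to -3, so B consists of rows 1, 3 and 4.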

To remove rows that contain at least t ones, just change -t to t:

B = A(~any(conv2(1,ones(1,t),2*A-1,'valid')==t, 2),:);
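Both cases can also be folded into a single helper. The following is a sketch, not part of the original answer: remove_runs is a hypothetical name, and instead of the bipolar transform it convolves the indicator A == v, so a window sum of t means t consecutive copies of v.

```matlab
% Hypothetical helper (an assumption, not from the answer):
% drop every row of A that contains at least t consecutive copies of v.
remove_runs = @(A, v, t) A(~any(conv2(double(A == v), ones(1, t), 'valid') == t, 2), :);

A = [0 1 0 1 0 1;
     0 0 0 1 1 1;
     0 0 1 0 0 1;
     0 1 0 0 1 0;
     1 0 0 0 1 0];

B0 = remove_runs(A, 0, 3);   % rows with three consecutive 0s removed
B1 = remove_runs(A, 1, 3);   % rows with three consecutive 1s removed
```

With the matrix from the question, B0 keeps rows 1, 3 and 4, while B1 drops only row 2.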

4 Comments

It is certain that your answer is very clever :) But I did not understand it. How can it for 1's?
@toygankılıç Sorry, what do you mean "How can it for 1's"?
My question only covers duplicates of 0. How can it be written for 1's?
Very clever use of conv2 here +1. I used your logic to generalize this for any consecutive numbers instead of 0 with im2col
3

Here is a generalized approach that removes any row containing a given number of consecutive duplicates (not just zeros; it could be any number).

t = 3;
row_mask = ~any(all(~diff(reshape(im2col(A,[1 t],'sliding'),t,size(A,1),[]))),3);
out = A(row_mask,:)

Sample Run:

>> A

A =

 0     1     0     1     0     1
 0     0     1     5     5     5    %// consecutive 3 5's
 0     0     1     0     0     1
 0     1     0     0     1     0
 1     1     1     0     0     1    %// consecutive 3 1's

>> out

out =

 0     1     0     1     0     1
 0     0     1     0     0     1
 0     1     0     0     1     0
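im2col ships with the Image Processing Toolbox. If that toolbox is not available, the same row mask can probably be obtained from diff and conv2 alone. This is a sketch of a swapped-in technique, not the answer's method, and it assumes t >= 2; the names same_as_next and run_hits are made up for illustration.

```matlab
A = [0 1 0 1 0 1;
     0 0 1 5 5 5;
     0 0 1 0 0 1;
     0 1 0 0 1 0;
     1 1 1 0 0 1];
t = 3;   % assumption: t >= 2

same_as_next = double(diff(A, 1, 2) == 0);   % 1 where an entry equals its right-hand neighbour
% t-1 consecutive "equal to neighbour" flags correspond to a run of t equal values
run_hits = conv2(same_as_next, ones(1, t-1), 'valid') == t-1;
row_mask = ~any(run_hits, 2);
out = A(row_mask, :);
```

On the sample matrix this keeps rows 1, 3 and 4, the same rows as the im2col version.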

2 Comments

Nice use of im2col!
@LuisMendo Thanks I wouldn't have thought of it if you haven't used conv2 :P They kind of look similar to me with im2col giving intermediate results :D
3

How about an approach using strings? This is certainly not as fast as Luis Mendo's method, which works directly with the numerical array, but it's thinking a bit outside of the box. The basis of this approach is that I treat each row of A as a string and search each string for a run of 0s with regular expressions.

A=[0 1 0 1 0 1;
   0 0 0 1 1 1;
   0 0 1 0 0 1;
   0 1 0 0 1 0;
   1 0 0 0 1 0];
t = 3;

B = sprintfc('%s', char('0' + A));   
ind = cellfun('isempty', regexp(B, repmat('0', [1 t])));
B(~ind) = [];
B = double(char(B) - '0');

We get:

B =
     0     1     0     1     0     1
     0     0     1     0     0     1
     0     1     0     0     1     0

Explanation

  • Line 1: Convert each row of the matrix A into a string of 0s and 1s. Each row becomes a cell in a cell array. This uses the undocumented function sprintfc to perform the cell array conversion.

  • Line 2: I use regular expressions to find any occurrence of a run of t 0s. I first use repmat to build a search string of t 0s, then determine whether each string in the cell array contains this sequence of characters (i.e. 000...). The function regexp performs the matching and returns the locations of any matches for each cell in the cell array. Alternatively, you can use the function strfind in more recent versions of MATLAB to speed up the computation, but I chose regexp so that the solution is compatible with most MATLAB distributions out there.

    Continuing on, the output of regexp/strfind is a cell array where each cell lists the locations at which the search string was found. A row that contains a match reports at least one location, so an empty cell marks a row we don't want to remove. To turn this into a logical array for removing rows, the result is wrapped in a cellfun call that tests which cells are empty. This line therefore returns a logical array where 0 means remove the row and 1 means keep it.

  • Line 3: I invert the logical array from Line 2 so that 1 marks the rows to remove, then use it to index into the cell array and delete those strings.

  • Line 4: The output is still a cell array, so I convert it back into a character array, and finally back into a numerical array.
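For readers who prefer to avoid the undocumented sprintfc, the same idea can be sketched with documented functions only. This variant is an assumption on my part rather than the answer's code: it uses cellstr to split the character matrix into one string per row and the strfind form suggested in the comments, and it indexes A directly so no conversion back to numbers is needed. It assumes every entry of A is a single digit.

```matlab
A = [0 1 0 1 0 1;
     0 0 0 1 1 1;
     0 0 1 0 0 1;
     0 1 0 0 1 0;
     1 0 0 0 1 0];
t = 3;

rows = cellstr(char('0' + A));   % one string per row, e.g. '010101'
% true where the row contains no run of t zeros
keep = cellfun('isempty', strfind(rows, repmat('0', 1, t)));
B = A(keep, :);                  % index the numeric matrix directly
```

Here keep marks rows 1, 3 and 4, so B matches the result shown above.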

6 Comments

Nice approach! You could also use ind = cellfun('isempty', strfind(B, repmat('0', [1 t])))
@LuisMendo - Cool! I always forget about that. Thanks!
Hehehe. My point was, regular expressions have a reputation for being slow (I haven't tested that). Maybe strfind could speed things up
@LuisMendo some versions of MATLAB don't have strfind which is why I opted for regular expressions. I actually used strfind at first!
You deserved the vote, it's a great answer! Except for going undocumented, that is :-P