I have a MATLAB cell array of strings and a second array with partial strings:

base = {'a','b','c','d'}
all2 = {'a1','b1','c1','d1','a2','b2','c2','d2','q8','r15'}

The output is:

base = 

    'a'    'b'    'c'    'd'


all2 = 

    'a1'    'b1'    'c1'    'd1'    'a2'    'b2'    'c2'    'd2'    'q8'    'r15'

Problem/Requirement

If any of 'a1','b1','c1','d1' AND any of 'a2','b2','c2','d2' are present in the all2 array, then return a variable numb=2.

If any of 'a1','b1','c1','d1' AND any of 'a2','b2','c2','d2' AND any of 'a3','b3','c3','d3' are present in the all2 array, then return a variable numb=3.

Attempts

1.

Based on strfind (this approach), I tried matches = strfind(all2,base); but I got this error:

`Error using strfind`

`Input strings must have one row.`
....

2.

This other approach using strfind seemed better, but it just gave me a result of 0:

fun = @(s)~cellfun('isempty',strfind(all2,s));
out = cellfun(fun,base,'UniformOutput',false)
idx = all(horzcat(out{:}));
idx(1,1) 

out = 

[1x10 logical]    [1x10 logical]    [1x10 logical]    [1x10 logical]


ans =

     0

Neither of these attempts has worked. I think my logic is incorrect.

3.

This answer finds all indices of an array of partial strings in an array of strings. The code is:

base = regexptranslate('escape', base);
matches = false(size(all2));
for k = 1:numel(all2)
    matches(k) = any(~cellfun('isempty', regexp(all2{k}, base)));
end
matches

Output:

matches =

     1     1     1     1     1     1     1     1     0     0

My problem with this approach: How do I use the output matches to calculate numb=2? I am not sure if this is the most relevant logic for my specific question since it only gives matching indices.

Question

Is there a way to do this in MATLAB?

EDIT

Additional Information:

The array all2 WILL always be contiguous. A scenario of all2 = {'a1','b1','c1','d1','a3','b3','c3','d3','q8','r15'} is not possible.

  • What should happen when numbers aren't contiguous? Like all2 = {'a1' 'a3' 'a4'}; Should that return numb = 3? Commented Apr 19, 2017 at 20:17
  • @gnovice numb should be 1 for that case Commented Apr 19, 2017 at 20:18
  • @gnovice I assume you meant all2 = {'a1', 'a3', 'a4'};. If so, then you are correct. If, all2 = {'a1', 'a3' ,'a4'} then the return should be numb=3. Using my example in the OP: If any of 'a1',... AND any of 'a2',... AND any of 'a3',... are present in the all2 array, then return a variable numb=3. Commented Apr 19, 2017 at 20:23
  • There's no 'a2, ...' in his example... Commented Apr 19, 2017 at 20:24
  • Yes, there is no a2. However, there are still a3 and a4. So both contiguous and non-contiguous are required. Commented Apr 19, 2017 at 20:25

1 Answer

Using a regex to find the unique suffixes to the base elements:

base = {'a','b','c','d'};
all2 = {'a1','b1','c1','d1','a2','b2','c2','d2', 'a4', 'q8','r15'};

% Use sprintf to build the expression so we can concatenate all the values
% of base into a single string; this is the [c1c2c3] metacharacter.
% Assumes the values of base are going to be one character
%
% This regex looks for one or more digits preceded by a character from
% base and returns only the digits that match this criteria.
regexstr = sprintf('(?<=[%s])(\\d+)', [base{:}]);

% Use once to eliminate a cell array level
test = regexp(all2, regexstr, 'match', 'once');

% Convert the digits to a double array
digits = str2double(test);

% Return the number of unique digits. With isnan() we can use logical indexing
% to ignore the NaN values
num = numel(unique(digits(~isnan(digits))));

Which returns:

>> num

num =

     3

If you need the digits to be contiguous, then something like this should work:

base = {'a','b','c','d'};
all2 = {'a1','b1','c1','d1','a2','b2','c2','d2', 'a4', 'q8','r15'};

regexstr = sprintf('(?<=[%s])(\\d+)', [base{:}]);
test = regexp(all2, regexstr, 'match', 'once');
digits = str2double(test);

% Find the unique digits, with isnan() we can use logical indexing to ignore the
% NaN values
unique_digits = unique(digits(~isnan(digits)));

% Because unique returns sorted values, we can use this to find where the
% first difference between digits is greater than 1. Append Inf at the end to
% handle the case where all values are continuous.
num = find(diff([unique_digits Inf]) > 1, 1);  % Thanks @gnovice :)

Which returns:

>> num

num =

     2
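One caveat with the contiguous version: it measures the run starting from the smallest suffix present, so suffixes that begin at 2 or higher still give a nonzero count, and input with no matching suffix gives an empty result. If the count should always start from 1, a possible variation (a sketch, not part of the original answer) prepends 0 to the sorted suffixes and subtracts 1. The name suffixes is illustrative, and dropping suffixes below 1 is an assumption about the intended behaviour.

```matlab
base = {'a','b','c','d'};
all2 = {'a1','b1','c1','d1','a2','b2','c2','d2','a4','q8','r15'};

regexstr = sprintf('(?<=[%s])(\\d+)', [base{:}]);
digits = str2double(regexp(all2, regexstr, 'match', 'once'));

% Keep sorted unique suffixes >= 1 as a row vector.
% NaN >= 1 is false, so non-matching entries are dropped as well.
suffixes = reshape(unique(digits(digits >= 1)), 1, []);

% Prepend 0 so the run has to start at 1; append Inf so a gap always exists.
% The index of the first gap, minus 1, is the length of the run 1, 2, ..., num.
num = find(diff([0 suffixes Inf]) > 1, 1) - 1;
```

With the example data this still gives num = 2. With suffixes such as 3 and 4 only, or with no matches at all, it gives 0 instead of 2 or an empty array.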

Breaking down the regexp and sprintf lines: Because we know that base only consists of single characters, we can use the [c1c2c3] metacharacter, which will match any character inside the brackets. So if we have '[rp]ain' we'll match 'rain' or 'pain', but not 'gain'.

base{:} returns what MATLAB calls a comma-separated list. Adding the brackets concatenates the result into a single character array.

Without brackets:

>> base{:}

ans =

    'a'


ans =

    'b'


ans =

    'c'


ans =

    'd'

With brackets:

>> [base{:}]

ans =

    'abcd'

Which we can insert into our expression string with sprintf. This gives us (?<=[abcd])(\d+), which matches one or more digits preceded by any one of a, b, c, or d.
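To see what the lookbehind does on individual strings, the expression can be tried by hand on a couple of inputs. This is a small sketch; the names m1, m2 and d and the input 'a12' are illustrative.

```matlab
base = {'a','b','c','d'};
regexstr = sprintf('(?<=[%s])(\\d+)', [base{:}]);   % '(?<=[abcd])(\d+)'

% 'a12': the digits follow 'a', so they are returned without the letter
m1 = regexp('a12', regexstr, 'match', 'once');      % '12'

% 'q8': 'q' is not in base, so nothing matches and the result is empty
m2 = regexp('q8', regexstr, 'match', 'once');       % empty char array

% str2double turns the empty match into NaN, which isnan() later filters out
d = str2double({m1, m2});                           % [12 NaN]
```

Note that the expression is not anchored to the start of the string, so an input such as 'xa3' would also yield '3'. Whether that matters depends on the data.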

6 Comments

I think this solution works. I am still thinking about the contiguous/non-contiguous part. I updated the OP but I may need to delete that. Still thinking....
I've added a solution with a continuous restriction
Yes, it needs to be contiguous. Ok, thanks for separating these out. This is very specific and I had not initially thought of the 2 cases. This works for me.
Ok, except for the first 3 lines, I seem to understand the other lines....they seem clear. Just the first 3 are a little confusing. Mainly line 1...could you explain how you used sprintf to assemble the expression required by regexp?
I've added a breakdown of the regex, hopefully it is helpful.