MATLAB find cell array substrings in a cell array of strings

Question

Let's say we have a cell array of substrings, arrayOfSubstrings = {substr1;substr2}, and a cell array of strings, arrayOfStrings = {string1;string2;string3;string4}. How can I get a logical map into the cell array of strings that marks where at least one of the substrings is found? I have tried

cellfun('isempty',regexp(arrayOfSubstrings ,arrayOfStrings ))

and

cellfun('isempty', strfind(arrayOfSubstrings , arrayOfStrings ))

and some other permutations of functions, but am not getting anywhere.

Suever · Accepted Answer · 2017-03-29 15:11:44Z


The issue with both strfind and regexp is that you can't provide two cell arrays and have them automatically apply all patterns to all strings. You will need to loop through one or the other to make it work.

You can do this with an explicit loop:

strings = {'ab', 'bc', 'de', 'fa'};
substrs = {'a', 'b', 'c'};

% First you'll want to escape the regular expressions
substrs = regexptranslate('escape', substrs);

matches = false(size(strings));

for k = 1:numel(strings)
    matches(k) = any(~cellfun('isempty', regexp(strings{k}, substrs)));
end

% 1  1  0  1

Or, if you are averse to for loops, you can use cellfun:

cellfun(@(s)any(~cellfun('isempty', regexp(s, substrs))), strings)
% 1  1  0  1
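Since the loop can run over either cell array, a minimal sketch of the other direction is shown here: it loops over the substrings and uses strfind, which takes the substring literally, so no regex escaping is needed. This is an illustrative variant rather than part of the original answer; it reuses the same example data.

```matlab
strings = {'ab', 'bc', 'de', 'fa'};
substrs = {'a', 'b', 'c'};

% Start with "no match" for every string
matches = false(size(strings));

for k = 1:numel(substrs)
    % strfind accepts a cell array of strings and a single char pattern,
    % returning one cell of start indices per string
    found = ~cellfun('isempty', strfind(strings, substrs{k}));
    matches = matches | found;
end

% 1  1  0  1
```

This variant loops once per substring rather than once per string, which may be preferable when there are few substrings and many strings.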

A Different Approach

Alternatively, you could combine your substrings into a single regular expression:

pattern = ['(', strjoin(regexptranslate('escape', substrs), '|'), ')'];
%   (a|b|c)

output = ~cellfun('isempty', regexp(strings, pattern));
%   1  1  0  1
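To illustrate why the substrings are escaped before being joined, here is a small sketch with a substring that contains a regex metacharacter. The example data and the variable names rawPattern, escPattern, rawMatches and escMatches are made up for illustration; the 'once' option is an optional addition that stops at the first match per string.

```matlab
strings = {'a.b', 'axb', 'cd'};
substrs = {'a.b', 'd'};

% Without escaping, '.' matches any character
rawPattern = strjoin(substrs, '|');                              % a.b|d
% With escaping, '.' only matches a literal dot
escPattern = strjoin(regexptranslate('escape', substrs), '|');   % a\.b|d

rawMatches = ~cellfun('isempty', regexp(strings, rawPattern, 'once'));
%   1  1  1   ('axb' is matched by the unescaped 'a.b')

escMatches = ~cellfun('isempty', regexp(strings, escPattern, 'once'));
%   1  0  1
```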

edited Mar 29, 2017 at 15:11

answered Mar 29, 2017 at 15:00


6 Comments

Confounded Over a year ago

Thank you. I prefer the loop as it makes the code clearer, IMHO.

Suever Over a year ago

@Confounded It's likely going to be significantly slower than the custom regex

Confounded Over a year ago

"Custom regex"?

Suever Over a year ago

@Confounded The last part of my answer using the alternative approach

Confounded Over a year ago

The last approach gives me an output array of different size compared to the loop.


matlabbit · Accepted Answer · 2017-03-29 22:35:55Z

If you are using R2016b or later, you can just use contains:

>> strings = {'ab', 'bc', 'de', 'fa'};
>> substrs = {'a', 'b', 'c'};
>> contains(strings, substrs)

ans =

  1×4 logical array

   1   1   0   1
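As a small extension of the example above, contains also accepts an 'IgnoreCase' name-value option. The sketch below uses slightly modified example data (an upper-case 'BC') to show the difference; the variable names caseSensitive and caseInsensitive are only illustrative.

```matlab
strings = {'ab', 'BC', 'de', 'fa'};
substrs = {'a', 'b', 'c'};

% Default matching is case-sensitive, so 'BC' is not matched
caseSensitive = contains(strings, substrs);
%   1   0   0   1

% With 'IgnoreCase', 'BC' matches the substrings 'b' and 'c'
caseInsensitive = contains(strings, substrs, 'IgnoreCase', true);
%   1   1   0   1
```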

contains is also the fastest option, especially if you use the new string datatype.

function profFunc()

    strings = {'ab', 'bc', 'de', 'fa'};
    substrs = {'a', 'b', 'c'};

    n = 10000;

    tic;
    for i = 1:n
        substrs_translated = regexptranslate('escape', substrs);

        matches = false(size(strings));

        for k = 1:numel(strings)
            matches(k) = any(~cellfun('isempty', regexp(strings{k}, substrs_translated)));
        end
    end
    toc

    tic;
    for i = 1:n
        cellfun(@(s)any(~cellfun('isempty', regexp(s, substrs))), strings);
    end
    toc

    tic;
    for i = 1:n
        pattern = ['(', strjoin(regexptranslate('escape', substrs), '|'), ')'];
        output = ~cellfun('isempty', regexp(strings, pattern)); %#ok<NASGU>
    end
    toc

    tic;
    for i = 1:n
        contains(strings,substrs);
    end
    toc

    %Imagine you were using strings for all your text!
    strings = string(strings);

    tic;
    for i = 1:n
        contains(strings,substrs);
    end
    toc
end

Timing results:

>> profFunc
Elapsed time is 0.643176 seconds.
Elapsed time is 1.007309 seconds.
Elapsed time is 0.683643 seconds.
Elapsed time is 0.050663 seconds.
Elapsed time is 0.008177 seconds.
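If you want to repeat the comparison on your own machine, one possible alternative to hand-written tic/toc loops is timeit, which runs a function handle several times and returns a typical time per call. This is only a sketch comparing two of the approaches; the variable names tContains and tRegexp are illustrative, and no timings are shown because they depend on the machine and release.

```matlab
strings = {'ab', 'bc', 'de', 'fa'};
substrs = {'a', 'b', 'c'};

% Build the combined, escaped regular expression once, outside the timing
pattern = strjoin(regexptranslate('escape', substrs), '|');

% timeit calls each handle repeatedly and returns a time in seconds
tContains = timeit(@() contains(strings, substrs));
tRegexp   = timeit(@() ~cellfun('isempty', regexp(strings, pattern, 'once')));

fprintf('contains: %g s per call\n', tContains);
fprintf('regexp:   %g s per call\n', tRegexp);
```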
