Let's say we have a cell array of substrings, arrayOfSubstrings = {substr1;substr2}, and a cell array of strings, arrayOfStrings = {string1;string2;string3;string4}. How can I get a logical map into the cell array of strings that is true wherever at least one of the substrings is found? I have tried

cellfun('isempty',regexp(arrayOfSubstrings ,arrayOfStrings ))

and

cellfun('isempty', strfind(arrayOfSubstrings , arrayOfStrings ))

and some other permutations of functions, but am not getting anywhere.

2 Answers

The issue with both strfind and regexp is that neither will take two cell arrays and automatically apply every pattern to every string. You will need to loop through one or the other to make it work.

You can do this with an explicit loop

strings = {'ab', 'bc', 'de', 'fa'};
substrs = {'a', 'b', 'c'};

% First you'll want to escape the regular expressions
substrs = regexptranslate('escape', substrs);

matches = false(size(strings));

for k = 1:numel(strings)
    matches(k) = any(~cellfun('isempty', regexp(strings{k}, substrs)));
end

% 1  1  0  1
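
To see why the escaping step matters, here is a minimal sketch. The sample strings and the '.' substring are made up for illustration and are not part of the original question. An unescaped '.' is a regular-expression wildcard, so it matches every non-empty string, while the escaped version matches only a literal dot.

```matlab
% Hypothetical data: only the first string contains a literal dot
strings = {'a.b', 'acb'};
raw     = '.';

% regexptranslate('escape', ...) prefixes regex metacharacters with a backslash
escaped = regexptranslate('escape', raw);

% Unescaped: '.' matches any character, so both strings are flagged
unescapedHits = ~cellfun('isempty', regexp(strings, raw));

% Escaped: only the string with an actual '.' is flagged
escapedHits = ~cellfun('isempty', regexp(strings, escaped));
```

If the substrings are known to contain only letters and digits, the escape step changes nothing, but it is cheap insurance when they come from user input.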

Or, if you are averse to for loops, you can use cellfun

cellfun(@(s)any(~cellfun('isempty', regexp(s, substrs))), strings)
% 1  1  0  1

A Different Approach

Alternatively, you could combine your substrings into a single regular expression

pattern = ['(', strjoin(regexptranslate('escape', substrs), '|'), ')'];
%   (a|b|c)

output = ~cellfun('isempty', regexp(strings, pattern));
%   1  1  0  1
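
If the substrings are plain text rather than patterns, one further option is to skip regular expressions and loop over the substrings instead of the strings. The sketch below is a variation on the approaches above rather than something from the original answer. It relies on strfind accepting a cell array of strings together with a single pattern, and it needs no escaping.

```matlab
strings = {'ab', 'bc', 'de', 'fa'};
substrs = {'a', 'b', 'c'};

% Start with "no match" for every string, same shape as the input
matches = false(size(strings));

% Loop over the (usually shorter) list of substrings
for k = 1:numel(substrs)
    % strfind returns one cell per string; non-empty means the substring occurs
    found   = ~cellfun('isempty', strfind(strings, substrs{k}));
    matches = matches | found;
end
```

Because matches is preallocated with size(strings), the result has the same shape as the input cell array, whether it is a row or a column.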

6 Comments

Thank you. I prefer the loop as it makes the code clearer, IMHO.
@Confounded It's likely going to be significantly slower than the custom regex
"Custom regex"?
@Confounded The last part of my answer using the alternative approach
The last approach gives me an output array of different size compared to the loop.
If you are using R2016b or newer, you can just use contains:

>> strings = {'ab', 'bc', 'de', 'fa'};
>> substrs = {'a', 'b', 'c'};
>> contains(strings, substrs)

ans =

  1×4 logical array

   1   1   0   1
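
As a small extension that is not part of the original answer, contains also accepts an 'IgnoreCase' name-value option. The sketch below uses made-up mixed-case strings to show the difference.

```matlab
% Hypothetical mixed-case variants of the example strings
strings = {'Ab', 'BC', 'de', 'fA'};
substrs = {'a', 'b', 'c'};

% Default behaviour is case-sensitive: only 'Ab' contains a lowercase 'b'
caseSensitive = contains(strings, substrs);

% With 'IgnoreCase', 'BC' and 'fA' match as well
caseInsensitive = contains(strings, substrs, 'IgnoreCase', true);
```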

contains is also the fastest of these approaches, especially if you use the new string datatype.

function profFunc()

    strings = {'ab', 'bc', 'de', 'fa'};
    substrs = {'a', 'b', 'c'};

    n = 10000;

    tic;
    for i = 1:n
        substrs_translated = regexptranslate('escape', substrs);

        matches = false(size(strings));

        for k = 1:numel(strings)
            matches(k) = any(~cellfun('isempty', regexp(strings{k}, substrs_translated)));
        end
    end
    toc

    tic;
    for i = 1:n
        cellfun(@(s)any(~cellfun('isempty', regexp(s, substrs))), strings);
    end
    toc

    tic;
    for i = 1:n
        pattern = ['(', strjoin(regexptranslate('escape', substrs), '|'), ')'];
        output = ~cellfun('isempty', regexp(strings, pattern)); %#ok<NASGU>
    end
    toc

    tic;
    for i = 1:n
        contains(strings,substrs);
    end
    toc

    %Imagine you were using strings for all your text!
    strings = string(strings);

    tic;
    for i = 1:n
        contains(strings,substrs);
    end
    toc
end

Timing results:

>> profFunc
Elapsed time is 0.643176 seconds.
Elapsed time is 1.007309 seconds.
Elapsed time is 0.683643 seconds.
Elapsed time is 0.050663 seconds.
Elapsed time is 0.008177 seconds.
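
Hand-rolled tic/toc loops like the ones above can be sensitive to warm-up effects. As a rough alternative sketch (not what was used to produce the numbers above), timeit runs a function handle several times and returns a typical time per call. Only two of the approaches are compared here, and absolute timings will differ by machine and release.

```matlab
strings = {'ab', 'bc', 'de', 'fa'};
substrs = {'a', 'b', 'c'};

% Build the combined pattern once, outside the timed code
pattern = strjoin(regexptranslate('escape', substrs), '|');

% timeit expects a zero-argument function handle that returns a value
tRegexp   = timeit(@() ~cellfun('isempty', regexp(strings, pattern)));
tContains = timeit(@() contains(strings, substrs));

fprintf('regexp:   %.3g s per call\n', tRegexp);
fprintf('contains: %.3g s per call\n', tContains);
```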
