Find substring in cell array of numbers and strings

Question

I have a cell array consisting of numbers, strings, and empty arrays. I want to find the position (linear or indexed) of all cells containing a string in which a certain substring of interest appears.

mixedCellArray = {
   'adpo' 2134  []
   0 [] 'daesad'
   'xxxxx' 'dp' 'dpdpd'
}

If the substring of interest is 'dp', then I should get the indices for three cells.

The only solutions I can find work only when the cell array contains nothing but strings.

One work-around is to find all cells that do not contain strings and fill them with '', as hinted by this posting. Unfortunately, my approach requires a variation of that solution, probably something like cellfun('ischar',mixedCellArray), which causes the error:

Error using cellfun
Unknown option.

Thanks for any suggestions on what causes this error.

I've also posted this to Usenet.

EDUCATIONAL AFTERNOTE: This is for those who don't have Matlab at home and end up bouncing back and forth between Matlab and Octave. I asked above why cellfun doesn't accept 'ischar' as its first argument. The answer is that in Matlab the first argument must be a function handle, so you need to pass @ischar. For backward compatibility, a few function names (such as 'isempty' and 'length') can still be passed as strings, but ischar is not one of them.
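A minimal sketch of the fill-with-'' work-around, using the function handle @ischar instead of the string 'ischar'. The variable names isStr, tmp and hits are illustrative and not from the original post; the sketch relies on strfind accepting a cell array of character vectors and returning a cell array of the same size.

```matlab
mixedCellArray = {'adpo' 2134 []; 0 [] 'daesad'; 'xxxxx' 'dp' 'dpdpd'};

% Pass a function handle, not the string 'ischar'
isStr = cellfun(@ischar, mixedCellArray);   % 3x3 logical, true for char cells

% Work-around: replace every non-string cell (numbers and []) with ''
tmp = mixedCellArray;
tmp(~isStr) = {''};                         % tmp is now a cell array of char vectors

% strfind on a cellstr returns a cell array of match positions
hits = ~cellfun(@isempty, strfind(tmp, 'dp'))
```

Because the numeric cells are blanked out before strfind ever sees them, they cannot produce spurious matches.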

See the timing results. Note the performance disadvantage of tweet-style one-line coding.
– sco1, Commented Jan 25, 2017 at 17:59
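To check the performance claim on your own installation, a rough sketch like the following compares a cellfun one-liner with an explicit loop on a tiled copy of the example. The tiling factor and the names bigC, idxFun and idxLoop are arbitrary assumptions; a single tic/toc measurement is noisy, so treat the printed numbers as indicative only (timeit gives more robust figures).

```matlab
% Tile the example to get a larger input (the 100-by-100 factor is arbitrary)
mixedCellArray = {'adpo' 2134 []; 0 [] 'daesad'; 'xxxxx' 'dp' 'dpdpd'};
bigC = repmat(mixedCellArray, 100, 100);   % 300-by-300 cell array
querystr = 'dp';

% One-liner: cellfun with an anonymous function
tic;
idxFun = cellfun(@(c) ischar(c) && ~isempty(strfind(c, querystr)), bigC);
tFun = toc;

% Explicit loop over linear indices
tic;
idxLoop = false(size(bigC));
for ii = 1:numel(bigC)
    if ischar(bigC{ii})
        idxLoop(ii) = ~isempty(strfind(bigC{ii}, querystr));
    end
end
tLoop = toc;

fprintf('cellfun: %.4f s   loop: %.4f s\n', tFun, tLoop);
```

Both versions should return the same logical array; only the elapsed times differ.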

gnovice · Accepted Answer · 2017-01-25 18:04:49Z

4

How about this one-liner:

>> mixedCellArray = {'adpo' 2134  []; 0 [] 'daesad'; 'xxxxx' 'dp' 'dpdpd'};
>> index = cellfun(@(c) ischar(c) && ~isempty(strfind(c, 'dp')), mixedCellArray)

index =

  3×3 logical array

   1   0   0
   0   0   0
   0   1   1

You could get by without the ischar(c) && ..., but you will likely want to keep it there since strfind will implicitly convert any numeric values/arrays into their equivalent ASCII characters to do the comparison. That means you could get false positives, as in this example:

>> C = {65, 'A'; 'BAD' [66 65 68]}  % Note there's a vector in there

C =

  2×2 cell array

    [ 65]    'A'         
    'BAD'    [1×3 double]

>> index = cellfun(@(c) ~isempty(strfind(c, 'A')), C)  % Removed ischar(c) &&

index =

  2×2 logical array

   1   1                % They all match!
   1   1
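For comparison, putting the ischar(c) && guard back in removes the false positives on the same C. Because && short-circuits, strfind is never called on the numeric cells, so only genuine character data can match.

```matlab
C = {65, 'A'; 'BAD' [66 65 68]};

% Numeric cells fail ischar, so strfind is skipped for them
index = cellfun(@(c) ischar(c) && ~isempty(strfind(c, 'A')), C)
% index = [0 1; 1 0]: only 'A' and 'BAD' match
```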

edited Jan 25, 2017 at 18:04

answered Jan 25, 2017 at 17:55

gnovice

1 Comment

user36800 Over a year ago

Wow. That's educational. I need to go back into my code and build in that error trapping. Thanks.

sco1 · Accepted Answer · 2017-01-25 01:09:36Z

4

Just use a loop, testing with ischar and contains (added in R2016b). The various *fun functions (cellfun, arrayfun, etc.) are basically loops and, in general, offer no performance advantage over an explicit loop.

mixedCellArray = {'adpo' 2134  []; 0 [] 'daesad'; 'xxxxx' 'dp' 'dpdpd'};
querystr = 'dp';

test = false(size(mixedCellArray));
for ii = 1:numel(mixedCellArray)
    if ischar(mixedCellArray{ii})
        test(ii) = contains(mixedCellArray{ii}, querystr);
    end
end

Which returns:

test =

  3×3 logical array

   1   0   0
   0   0   0
   0   1   1
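Since the question asks for positions, the logical result can be turned into linear indices or row/column subscripts with find. In this sketch the matrix is hard-coded to the loop's output shown above, and linIdx, row and col are illustrative names. Note that MATLAB linear indices run down the columns.

```matlab
% Same values as the loop's output above
test = logical([1 0 0; 0 0 0; 0 1 1]);

linIdx = find(test)        % column-major linear indices: [1; 6; 9]
[row, col] = find(test)    % subscripts: row = [1; 3; 3], col = [1; 2; 3]
```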

Edit:

If you don't have a MATLAB version with contains, you can substitute a regex:

test(ii) = ~isempty(regexp(mixedCellArray{ii}, querystr, 'once'));
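One caveat with the regex substitution: regexp treats querystr as a pattern, so characters such as '.' or '*' take on special meaning. A sketch that escapes the query first with regexptranslate, so it is matched literally, might look like this (pattern, unescaped and escaped are illustrative names):

```matlab
mixedCellArray = {'adpo' 2134 []; 0 [] 'daesad'; 'xxxxx' 'dp' 'dpdpd'};
querystr = 'dp';

% Escape regex metacharacters so querystr is matched literally
pattern = regexptranslate('escape', querystr);

test = false(size(mixedCellArray));
for ii = 1:numel(mixedCellArray)
    if ischar(mixedCellArray{ii})
        test(ii) = ~isempty(regexp(mixedCellArray{ii}, pattern, 'once'));
    end
end

% Why escaping matters: an unescaped '.' matches any character
unescaped = ~isempty(regexp('axb', 'a.b', 'once'));                            % true
escaped   = ~isempty(regexp('axb', regexptranslate('escape', 'a.b'), 'once')); % false
```

For a plain query like 'dp' the escaping is a no-op, so it only adds safety for arbitrary user input.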

edited Jan 25, 2017 at 1:09

answered Jan 25, 2017 at 1:01

sco1

8 Comments

buzjwa Over a year ago

strfind instead of regexp is simpler. Don't encourage using regexp where simpler specializations exist!

sco1 Over a year ago

@Naveh Do you have a real reason to recommend strfind over regexp? "is simpler" is a really dumb reason not to recommend a more powerful function; certainly not a profound, multiple-upvote-worthy one. The syntax isn't even any different: regexp(mixedCellArray{ii}, querystr) vs. strfind(mixedCellArray{ii}, querystr). Yippee... What exactly is the gain?

buzjwa Over a year ago

Perhaps my comment was too simple :) The reasoning behind using regexp specializations (strfind, strtrim, strrep etc) instead of regexp itself is readability. My rule of thumb is to only use regexp where no fitting specialization exists. What we want to do is find a substring, so we use strfind. It is simpler to understand, which is why I advocate this.

user36800 Over a year ago

Naveh, excaza, I want to thank you both for sharing your knowledge of Matlab features for solving this problem. However, I exhort both of you to respect the different merits of your answers. I find them both very educational. Simplicity has a great deal of merit, and for that, I thank Naveh. However, I know that all too soon I will need to move to regular expressions, and I appreciate the gateway code for me to do that, excaza.

Suever Over a year ago

@user36800 You should really check out the timing results mentioned in this comment. One liners are typically less legible, more error prone, and slower than explicitly using a loop which MATLAB is able to accelerate naturally.

PURUSHOTTAMA · Accepted Answer · 2017-01-25 10:05:17Z

2

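z = cellfun(@(x) strfind(x, 'dp'), mixedCellArray, 'un', 0);  % match positions per cell ([] if none)
idx = cellfun(@(x) x > 0, z, 'un', 0);                          % positions -> logicals; [] stays []
find(~cellfun(@isempty, idx))                                    % linear indices of matching cells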

edited Jan 25, 2017 at 10:05

answered Jan 25, 2017 at 9:50

PURUSHOTTAMA

1 Comment

user36800 Over a year ago

Thanks, Purushottama. That seems to be similar to the approach proposed on usenet.

user36800 · Accepted Answer · 2017-01-25 17:37:40Z

Here is a solution from the usenet link in my original post:

>> mixedCellArray = {
         'adpo' 2134  []
         0 [] 'daesad'
         'xxxxx' 'dp' 'dpdpd'
      }

mixedCellArray =
    'adpo'     [2134]          []
    [    0]        []    'daesad'
    'xxxxx'    'dp'      'dpdpd'

>> ~cellfun( @isempty , ...
             cellfun( @(x)strfind(x,'dp') , ...
                      mixedCellArray , ...
                      'uniform',0) ...
           )

ans =
     1     0     0
     0     0     0
     0     1     1

The inner cellfun is able to apply strfind even to numerical cells because, I presume, Matlab treats numerical arrays and strings the same way here: a string is just an array of numbers representing the character codes. The outer cellfun flags every cell for which the inner cellfun found NO match (an empty result), and the prefix tilde inverts that into all cells for which there was a match.
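The character-code presumption can be checked directly. In this sketch, C2 is a hypothetical cell array whose second cell holds [100 112], which is double('dp'). Without an ischar check the unguarded version flags that numeric cell as a match, while the guarded version does not.

```matlab
% double('dp') is [100 112]; a numeric cell holding those codes
C2 = {'dp', [100 112], 7};

% Usenet-style: no ischar check, so numeric cells are searched by value
noGuard = ~cellfun(@isempty, cellfun(@(x) strfind(x, 'dp'), C2, 'UniformOutput', false))

% Same search, but only character cells can match
withGuard = cellfun(@(x) ischar(x) && ~isempty(strfind(x, 'dp')), C2)
```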

Thanks to dpb.
