Consider the following cell array of strings containing filenames:

A = { 'abcout.txt';
      'outabcd.txt';
      'outabcef.png';
      'outout.txt' }

I'd like to find all .txt-files starting with "out".

I could do it as follows:

filenames = regexp( A ,'out\w*.txt');
filenames = A( cellfun(@(x) ~isempty(x) && x == 1,filenames) )

returning the desired output:

filenames = 

    'outabcd.txt'
    'outout.txt'

But I wonder how I could use regexp to skip the cellfun step?

The following almost works:

filenames = regexp( A ,'out\w*.txt','match');
filenames = [filenames{:}]'

but it also returns a match from the first string, which is invalid (and truncated, so it is not even displayed correctly):

filenames = 

    'out.txt'
    'outabcd.txt'
    'outout.txt'

How do I need to modify 'out\w*.txt'?

3 Answers

2

Personally, I would modify that to

'^out.*\.txt$'

because \w* excludes things like

'out file.txt'

which I think should be included. Moreover, the original pattern is incorrect, in that it also matches

'outFileWtxt'

because you've not escaped the . metacharacter :)
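A quick way to check these claims is to run both patterns against a few names. This is only a sketch: the names in `tests` are made up for illustration (they are not from the question), and the variable names `orig` and `fixed` are arbitrary.

```matlab
% Hypothetical test names, chosen to expose the differences between the patterns
tests = { 'outabcd.txt'; 'out file.txt'; 'outFileWtxt'; 'abcout.txt' };

% Original pattern: unanchored, dot not escaped, \w* rejects spaces
orig  = ~cellfun('isempty', regexp(tests, 'out\w*.txt', 'once'));

% Suggested pattern: anchored at both ends, literal dot, any characters in between
fixed = ~cellfun('isempty', regexp(tests, '^out.*\.txt$', 'once'));

% Show each name next to the verdict of both patterns
disp([tests, num2cell(orig), num2cell(fixed)])
```

The original pattern accepts 'outFileWtxt' and 'abcout.txt' but rejects 'out file.txt'; the anchored, escaped pattern does the opposite on all three.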

Anyway, from a performance standpoint, getting rid of cellfun is actually not what you want; you should just use it properly:

%// dummy data
A = { 'abcout.txt';
      'outabcd.txt';
      'outabcef.png';
      'outout.txt' };

%// Make sure we have something substantial to do
A = repmat(A, 1e5,1);

%// New way   
tic
    F = regexp(A, '^out.*\.txt$', 'match');
    F = [F{:}];
toc

%// Old way with optimized cellfun() call
tic
    F = regexp(A, '^out.*\.txt$');
    F = A(~cellfun('isempty', F));
toc

Results:

Elapsed time is 0.928403 seconds. %// without cellfun
Elapsed time is 0.471774 seconds. %// with optimized cellfun

This call to cellfun is faster because string arguments such as 'isempty' refer to specific, hard-coded functions in the cellfun binary. That is a lot faster than any anonymous function, which has to be evaluated back in the MATLAB environment.
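To see the difference between the calling styles on your own machine, a comparison along these lines should do. It is a rough sketch rather than a proper benchmark: the variable names are arbitrary, and no timings are quoted because they depend on the MATLAB version and hardware.

```matlab
% Same dummy data as above, enlarged so the timings are measurable
A = repmat({ 'abcout.txt'; 'outabcd.txt'; 'outabcef.png'; 'outout.txt' }, 1e5, 1);
idx = regexp(A, '^out.*\.txt$');

tic; m1 = ~cellfun('isempty', idx);        t1 = toc;  % string form
tic; m2 = ~cellfun(@isempty, idx);         t2 = toc;  % function handle
tic; m3 = ~cellfun(@(x) isempty(x), idx);  t3 = toc;  % anonymous function

% All three variants must select exactly the same entries
assert(isequal(m1, m2, m3));
fprintf('string: %.3f s, handle: %.3f s, anonymous: %.3f s\n', t1, t2, t3);
```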

1 Comment

Thanks for the additional advice; I had not considered so many exceptions before.
2

Use ^ to anchor at the beginning of the string and $ at the end of the string.

filenames = regexp( A ,'^out\w*\.txt$');

Right now you get out.txt from the string abcout.txt because you didn't use an anchor.
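The effect of the anchors can be seen by running both variants on the data from the question with the 'match' option. This is a small sketch; the dot is escaped in both patterns so that only the anchoring differs, and the variable names are arbitrary.

```matlab
A = { 'abcout.txt'; 'outabcd.txt'; 'outabcef.png'; 'outout.txt' };

% Without anchors the pattern may match in the middle of a string
unanchored = regexp(A, 'out\w*\.txt', 'match');
unanchored = [unanchored{:}]'   % includes 'out.txt', taken from 'abcout.txt'

% With ^ and $ the whole string has to match
anchored = regexp(A, '^out\w*\.txt$', 'match');
anchored = [anchored{:}]'       % only 'outabcd.txt' and 'outout.txt'
```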

1

Try this regex, where ^ marks the beginning of the line:

^out\w*\.txt
