I have a 3x1 cell array whose contents look like this:

'ASDF_LE_NEWYORK Fixedafdfgd_ML'
'Majo_LE_WASHINGTON FixedMonuts_ML'
'Array_LE_dfgrt_fdhyuj_BERLIN Potato Price'

I want to elegantly extract the names into another 3x1 cell array with these contents:

'NEWYORK'
'WASHINGTON'
'BERLIN'

Notice that the names come after the last underscore and before the first space or '_ML'. How can I write such code concisely?

Thanks

Edit:

Sorry guys, I should have used a better example. I have corrected it now.

  • The names aren't after the last underscore, at least not in the first two entries. Commented Sep 23, 2013 at 23:09
  • I updated my answer to get the output in the format you requested. Commented Sep 23, 2013 at 23:43

2 Answers


You can use lookbehind for _ and lookahead for space:

names = regexp(A, '(?<=_)[^\s_]*(?=\s)', 'match', 'once');

Where A is the cell array containing the strings:

A = {...
'ASDF_LE_NEWYORK Fixedafdfgd_ML'
'Majo_LE_WASHINGTON FixedMonuts_ML'
'Array_LE_dfgrt_fdhyuj_BERLIN Potato Price'};

>> names = regexp(A, '(?<=_)[^\s_]*(?=\s)', 'match', 'once')
names = 
    'NEWYORK'
    'WASHINGTON'
    'BERLIN'

2 Comments

It works. But can you please explain how to read this: '(?<=_)[^\s_]*(?=\s)' ?
(?<=_) is a lookbehind: it requires a _ immediately before the match but doesn't include it in the match. (?=\s) is a lookahead: it requires a whitespace character immediately after the match, again without including it. The match itself is [^\s_]*, meaning a sequence of non-whitespace, non-underscore characters. See regular-expressions.info/lookaround.html for more info.
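
To make the reading above concrete, here is a minimal sketch that builds the same pattern from its three pieces. The variable names (lookbehind, body, lookahead, pattern) are only illustrative and are not part of the original answer; A is the cell array from the question.

```matlab
% Sample data from the question (3x1 cell array of strings).
A = {...
    'ASDF_LE_NEWYORK Fixedafdfgd_ML'
    'Majo_LE_WASHINGTON FixedMonuts_ML'
    'Array_LE_dfgrt_fdhyuj_BERLIN Potato Price'};

% Illustrative names for the three parts of the pattern.
lookbehind = '(?<=_)';    % an underscore must precede the match (not captured)
body       = '[^\s_]*';   % run of characters that are neither whitespace nor underscore
lookahead  = '(?=\s)';    % a whitespace character must follow the match (not captured)

% Concatenating the parts gives the pattern used in the answer.
pattern = [lookbehind body lookahead];

% 'once' returns only the first match for each string, as a plain string.
names = regexp(A, pattern, 'match', 'once');
```

With the sample data this should reproduce the 'NEWYORK', 'WASHINGTON', 'BERLIN' output shown in the answer, since in each string only one underscore-delimited run is directly followed by a space.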

NOTE: The question was changed, so the answer is no longer complete, but hopefully the regexp example is still useful.

Try regexp like this:

names = regexp(fullNamesCell,'_(NAME\d?)\s','tokens');
names = cellfun(@(x)(x{1}),names)

In the pattern _(NAME\d?)\s, the parentheses define a subexpression, which is returned as a token (a portion of the matched text). The \d? matches zero or one digit, but you could use \d{1} for exactly one digit or \d{1,3} if you expect between 1 and 3 digits. The \s matches a whitespace character.

The reorganization of names is a little convoluted, but when you use regexp with a cell input and tokens you get a cell of cells that needs some reformatting for your purposes.
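
As a rough sketch of the same token approach applied to the updated example in the question: the pattern '_([^\s_]+)\s' below is an adaptation, not the pattern from this answer, and the variable names A and tok are assumptions. Adding 'once' means each element of the result holds a single token set, which leaves one layer of cells to unwrap.

```matlab
% Sample data from the updated question (3x1 cell array of strings).
A = {...
    'ASDF_LE_NEWYORK Fixedafdfgd_ML'
    'Majo_LE_WASHINGTON FixedMonuts_ML'
    'Array_LE_dfgrt_fdhyuj_BERLIN Potato Price'};

% Adapted pattern (an assumption, not from the original answer):
% an underscore, then a captured run of non-whitespace, non-underscore
% characters, then a whitespace character.
tok = regexp(A, '_([^\s_]+)\s', 'tokens', 'once');

% Each tok{k} is a 1x1 cell holding the captured string, so unwrap it.
% This assumes every string matches; an unmatched string would leave an
% empty cell and make c{1} fail.
names = cellfun(@(c) c{1}, tok, 'UniformOutput', false);
```

Here names comes out as a 3x1 cell array of strings. Whether tokens or lookaround reads better is a matter of taste; the token form needs the extra unwrapping step, while 'match' with lookaround returns the strings directly.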
