Parsing Data Using REGEX

Question

I'm having a problem extracting data using regex. Usually I can do it easily, but I find myself stuck here. I'm trying to extract the part that comes after "n" and before "end".

The data I can have is:

 jack.   n n klln kjj kll end
 jane.      n    n kien wsdn end
 jone.      n losn djs end
 jord.   n      sdjn sdkln end

Now "n" can occur one or two times only.

I've used this to extract $3

\(.+?\.) .*n.* (n|\s) (.*) end\

It works for every instance except line 3, where it also includes "losn". In all cases, either one or two "n" can occur. If only one "n" is present, it can be either a single space or many spaces away from the data that I want.

tenub · Accepted Answer · 2014-02-11 16:11:14Z

1

OK, never mind, I think I got it.

I changed:

  \(.+?\.) .*n.* (n|\s) (.*) end\

to (added a "?" to make a secondary "n" optional):

  \(.+?\.) .*n?.* (n|\s) (.*) end\
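To check the modified pattern against the sample lines from the question, here is a minimal Python sketch. It assumes the leading and trailing backslashes shown above are regex delimiters rather than part of the expression, and that the leading space on each sample line is only indentation. `SAMPLES` and `extract_third_group` are hypothetical names used for illustration.

```python
import re

# Modified pattern from this answer, without the surrounding delimiters
# (assumption: the leading/trailing backslashes are not part of the regex).
PATTERN = re.compile(r"(.+?\.) .*n?.* (n|\s) (.*) end")

# Sample lines from the question, with the leading indentation removed.
SAMPLES = [
    "jack.   n n klln kjj kll end",
    "jane.      n    n kien wsdn end",
    "jone.      n losn djs end",
    "jord.   n      sdjn sdkln end",
]


def extract_third_group(line):
    """Return capture group 3 ($3), or None if the line does not match."""
    match = PATTERN.search(line)
    return match.group(3) if match else None


if __name__ == "__main__":
    for line in SAMPLES:
        print(repr(extract_third_group(line)))
```

Note that `n?` placed between two `.*` can always match nothing, so under these assumptions the pattern would also accept a line that contains no "n" marker at all.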

answered Feb 11, 2014 at 15:55 by Le Ray; edited Feb 11, 2014 at 16:11 by tenub


Jerry · Accepted Answer · 2014-02-11 16:12:37Z

1

I think it'll be safer if you use something like this instead:

^[^.]+\.\s*n(?:\s*n)?\s* (.*) end

Using . to match 'any character' can lead to efficiency issues. As such, I recommend using [^.]+ (or in case the first part can contain periods as well, .+?) for the first part.

Then use \s* instead of .*, and use the optional group (?:\s*n)? for the possible second n.

regex101 demo
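As a quick check of the pattern above, here is a minimal Python sketch that runs it over the four sample lines from the question. It assumes the leading space on each sample line is only indentation; `SAMPLES` and `extract` are hypothetical names used for illustration.

```python
import re

# Pattern from this answer; group 1 holds the text between the last
# "n" marker and the trailing " end".
PATTERN = re.compile(r"^[^.]+\.\s*n(?:\s*n)?\s* (.*) end")

# Sample lines from the question, with the leading indentation removed.
SAMPLES = [
    "jack.   n n klln kjj kll end",
    "jane.      n    n kien wsdn end",
    "jone.      n losn djs end",
    "jord.   n      sdjn sdkln end",
]


def extract(line):
    """Return the captured data, or None if the line does not match."""
    match = PATTERN.match(line)
    return match.group(1) if match else None


if __name__ == "__main__":
    for line in SAMPLES:
        print(repr(extract(line)))
```

Because the pattern requires a literal "n" right after the optional whitespace that follows the period, a line without any marker is rejected rather than matched by accident.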

answered Feb 11, 2014 at 16:12 by Jerry
