I'm having trouble extracting data with a regex. Usually I can do it easily, but I find myself stuck here. I'm trying to extract the part that comes after "n" and before "end".

The data I can have is:

 jack.   n n klln kjj kll end
 jane.      n    n kien wsdn end
 jone.      n losn djs end
 jord.   n      sdjn sdkln end

Now "n" can occur one or two times only.

I've used this to extract $3

\(.+?\.) .*n.* (n|\s) (.*) end\

It works for every line except line 3, where it also includes "losn". In every case there are either one or two "n"s. If only one "n" is present, it can be a single space away from the data I want, or many spaces away.

2 Answers

OK, never mind, I think I got it.

I changed:

  \(.+?\.) .*n.* (n|\s) (.*) end\

to (added a "?" to make a secondary "n" optional):

  \(.+?\.) .*n?.* (n|\s) (.*) end\
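
A quick way to check the revised pattern against the four sample lines is a small Python harness. This is only a sketch: it assumes the leading and trailing backslashes above are pattern delimiters rather than part of the regex, and the names REVISED, SAMPLES and third_group are made up for illustration.

```python
import re

# Assumption: the surrounding backslashes in the post are delimiters,
# so only the part between them is compiled here.
REVISED = re.compile(r"(.+?\.) .*n?.* (n|\s) (.*) end")

# The four sample lines from the question.
SAMPLES = [
    "jack.   n n klln kjj kll end",
    "jane.      n    n kien wsdn end",
    "jone.      n losn djs end",
    "jord.   n      sdjn sdkln end",
]

def third_group(line):
    """Return capture group 3 ($3 in the post), or None if there is no match."""
    match = REVISED.search(line)
    return match.group(3) if match else None

for line in SAMPLES:
    print(repr(third_group(line)))
```

Since n? can match the empty string, .*n?.* accepts the same text as .* alone, so this version no longer requires an "n" before the data. It relies on the " (n|\s) " part to find the boundary.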

I think it'll be safer if you use something like this instead:

^[^.]+\.\s*n(?:\s*n)?\s* (.*) end

Using . to match 'any character' can lead to efficiency issues, because the engine may have to backtrack through everything it consumed. As such, I recommend [^.]+ for the first part (or .+? if the first part can itself contain periods).

Then I use \s* instead of .* around the n, and the optional group (?:\s*n)? for the possible second n.
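
To see the anchored pattern in action, here is a minimal Python sketch that runs it over the four sample lines from the question. The names PATTERN, SAMPLES and extract_payload are hypothetical, and the sketch assumes each record is matched on its own, one line at a time.

```python
import re

# Pattern from this answer: the name up to the first period, one "n",
# an optional second "n", then the wanted text (group 1) before " end".
PATTERN = re.compile(r"^[^.]+\.\s*n(?:\s*n)?\s* (.*) end")

# The four sample lines from the question.
SAMPLES = [
    "jack.   n n klln kjj kll end",
    "jane.      n    n kien wsdn end",
    "jone.      n losn djs end",
    "jord.   n      sdjn sdkln end",
]

def extract_payload(line):
    """Return the text between the n marker(s) and ' end', or None."""
    match = PATTERN.match(line)
    return match.group(1) if match else None

for line in SAMPLES:
    print(extract_payload(line))
# klln kjj kll
# kien wsdn
# losn djs
# sdjn sdkln
```

Note that the wanted text is now group 1 rather than $3, because this pattern has only one capturing group.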

regex101 demo
