(1[0-9]{2})\s+(\w+(?:-\w+)+)\s+(\w+)\s+(\w+(?:-\w+)+)\s+(\w+)

This pattern is used to match the string:

123    FEX-1-80  Online  N2K-C2248TP-1GE    SSDFDFWFw23r23

How come this works on regexr.com, but Python 3.5.1 can't find a match?

r'(1[0-9]{2})\s+(\w+(?:-\w+)+)\s+(\w+)\s+(\w+(?:-\w+))'

matches only up to

123    FEX-1-80  Online  N2K-C2248TP

but the second hyphen (-) in group(4) is not matched.

From what I understand, the pattern in a non-capturing group can match more than once within the group. What went wrong here?

2 Answers

This is more of a comment than an answer, but I am posting it as an answer for the sake of clarity.
If you are relatively new to regular expressions, consider using verbose mode (re.VERBOSE in Python). With it, your expression becomes much more readable:

(1[0-9]{2})\s+     # three digits, the first of which must be 1
(\w+(?:-\w+)+)\s+  # word characters, then one or more "-word" parts
(\w+)\s+           # another word
(\w+(?:-\w+)+)\s+  # same expression as above
(\w+)              # another word
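As a minimal sketch of how the verbose form above could be used from Python, the following compiles it with re.VERBOSE and applies it to the sample line from the question. The variable names are illustrative only.

```python
import re

# Verbose mode ignores whitespace in the pattern and allows "#" comments.
pattern = re.compile(r"""
    (1[0-9]{2})\s+      # three digits, the first of which must be 1
    (\w+(?:-\w+)+)\s+   # hyphenated token
    (\w+)\s+            # single word
    (\w+(?:-\w+)+)\s+   # hyphenated token
    (\w+)               # single word
""", re.VERBOSE)

line = "123    FEX-1-80  Online  N2K-C2248TP-1GE    SSDFDFWFw23r23"

match = pattern.match(line)
fields = match.groups() if match else ()
print(fields)
```

With the sample line, all five groups are filled, and group 4 contains the full N2K-C2248TP-1GE token.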

Also, check whether your second and fourth groups could be rewritten as [\w-]+. It is not equivalent to your expression and will match other substrings as well, but in general try to avoid nested parentheses.
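To make the difference between the two forms concrete, here is a small sketch that runs both against a few strings. The sample strings other than the one from the question are made up for illustration.

```python
import re

strict = re.compile(r"\w+(?:-\w+)+")  # word runs joined by at least one hyphen
loose = re.compile(r"[\w-]+")         # any mix of word characters and hyphens

# Only the first sample comes from the question; the rest are hypothetical.
samples = ["N2K-C2248TP-1GE", "Online", "foo-", "---"]

results = {
    s: (bool(strict.fullmatch(s)), bool(loose.fullmatch(s)))
    for s in samples
}

for s in samples:
    is_strict, is_loose = results[s]
    print("{!r}: strict={} loose={}".format(s, is_strict, is_loose))
```

The looser character class accepts every sample, including a plain word, a trailing hyphen, and a run of hyphens, while the original form accepts only the properly hyphenated token.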

Concerning your question: the second string cannot be matched by the full expression, because all of its groups are mandatory. The second example has nothing for group 5 to match, so the match fails.

See a demo on regex101.com.

2 Comments

Thanks for the link; it is even more powerful than regexr.com. [\w-]+ can probably satisfy my need. How is it different from \w+(?:-\w+)+?
It is pretty different: first and second

This regular expression matches the full input string:

(1[0-9]{2})\s+(\w+(?:-\w+)+)\s+(\w+)\s+(\w+(?:-\w+)+)\s+(\w+)

This one doesn't:

(1[0-9]{2})\s+(\w+(?:-\w+)+)\s+(\w+)\s+(\w+(?:-\w+))

The latter is missing a + after the last non-capturing group, and it's missing the \s+(\w+) at the end that matches the SSDFDFWFw23r23 at the end of the input string.
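A quick way to see both differences is to run the two expressions against the input string in Python. This is only a sketch; the variable names are mine.

```python
import re

line = "123    FEX-1-80  Online  N2K-C2248TP-1GE    SSDFDFWFw23r23"

full_pattern = re.compile(
    r"(1[0-9]{2})\s+(\w+(?:-\w+)+)\s+(\w+)\s+(\w+(?:-\w+)+)\s+(\w+)"
)
short_pattern = re.compile(
    r"(1[0-9]{2})\s+(\w+(?:-\w+)+)\s+(\w+)\s+(\w+(?:-\w+))"
)

full_match = full_pattern.fullmatch(line)   # must cover the whole line
short_match = short_pattern.match(line)     # only needs to match a prefix

print(full_match.group(4))            # N2K-C2248TP-1GE
print(short_match.group(4))           # N2K-C2248TP
print(short_pattern.fullmatch(line))  # None
```

The shorter expression still matches a prefix of the line, but its fourth group stops after the first hyphenated part, and it cannot cover the whole line.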

From what I understand, the pattern in a non-capturing group can match more than once within the group. What went wrong here?

I'm not sure I follow. A non-capturing group is really just there to group a part of a regular expression.

(?:-\w+) or just -\w+ will both match a hyphen (-) followed by one or more "word" characters (\w+). It doesn't matter whether that regular expression is in a non-capturing group or not. If you want to match repetitions of that pattern, you can put the + quantifier after the non-capturing group, e.g. (?:-\w+)+. That pattern will match a string like -foo-bar-baz.

So the reason your second regular expression doesn't match the repeated pattern is that it lacks the + quantifier.
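Here is a short sketch of that behaviour on the -foo-bar-baz example, with a capturing variant added for comparison (the capturing variant is my addition, not part of the question).

```python
import re

text = "-foo-bar-baz"

once = re.match(r"(?:-\w+)", text)       # no quantifier: a single "-word" part
repeated = re.match(r"(?:-\w+)+", text)  # quantifier: one or more parts
captured = re.match(r"(-\w+)+", text)    # capturing variant, for comparison

print(once.group(0))      # -foo
print(repeated.group(0))  # -foo-bar-baz
print(repeated.groups())  # ()
print(captured.group(1))  # -baz
```

The non-capturing group only groups the sub-pattern so that the quantifier applies to all of it; it adds nothing to groups(). In the capturing variant, group 1 holds only the last repetition.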
