I'm trying to write a regex pattern that matches either a number alone or a number followed by a trailing string. The output should be:

Matching "string100":      [('100', '')]
Matching "string900_TYPE": [('900', 'TYPE')]

But instead, I get:

Matching "string100":      [('100', '')]
Matching "string900_TYPE": [('900', ''), ('', 'TYPE')]

The idea is to have the number as the first item in each tuple and the "TYPE" as the second, so I can easily check whether "TYPE" is present (if it is not, the second item is the empty string '').

Code:

import re


stringList = ["string100", "string900_TYPE"]
pattern = r"(\d{3})|\w(TYPE)"

for string in stringList:
    match = re.findall(pattern, string)
    print('Matching "' + string + '":\t', match)

Thanks in advance.

  • Like that: (\d{3})(?:_(\w+))?. Don't use an alternation, describe the full string. Commented Oct 9, 2017 at 21:05
  • @CasimiretHippolyte: That did it. Thank you! Commented Oct 9, 2017 at 21:10
  • The pattern can actually be simplified to +(\d{3})_?(\w+)? which requires fewer steps for the regex engine. Is there any reason for using a non-capturing group around the subgroup (\w+)? Commented Oct 9, 2017 at 22:45
  • If you use (\d{3})_?(\w+)? (note there must be no + at the start) then you may also match 123_. (\d{3})(?:_(\w+))? is best here since (?:_(\w+))? makes the whole sequence of patterns optional. Commented Oct 9, 2017 at 22:46
  • The + must have slipped in somehow... 123_ does not seem to be captured, though. Commented Oct 9, 2017 at 23:02

1 Answer

(\d{3})(?:_(\w+))? will do the trick. (Thanks to @Casimir et Hippolyte)
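
As a minimal sketch, here is the loop from the question with only the pattern swapped for (\d{3})(?:_(\w+))?. The variable name string_list is just a renamed stringList. Because the second group sits inside an optional non-capturing group, re.findall returns one tuple per number and fills the second item with '' when no "_TYPE" suffix follows.

```python
import re

string_list = ["string100", "string900_TYPE"]

# Three digits, optionally followed by an underscore and a captured word.
# The (?:...)? wrapper makes "_" and the word optional as a single unit.
pattern = r"(\d{3})(?:_(\w+))?"

for string in string_list:
    match = re.findall(pattern, string)
    print('Matching "' + string + '":\t', match)

# Prints:
# Matching "string100":      [('100', '')]
# Matching "string900_TYPE": [('900', 'TYPE')]
```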

It is also more robust than the simpler pattern (\d{3})_?(\w+)?, which also matches a trailing underscore with nothing after it, as in 123_ (thanks to @Wiktor Stribiżew).
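
To make the difference concrete, here is a small comparison of the two patterns. The names robust and simple and the input "123abc" are illustrative assumptions, not part of the original question. With re.findall both patterns return ('123', '') for "123_", so the sketch uses re.search and group(0) to show what each pattern actually consumes.

```python
import re

robust = r"(\d{3})(?:_(\w+))?"   # "_" and the word are optional together
simple = r"(\d{3})_?(\w+)?"      # "_" and the word are optional independently

# The simple pattern swallows a dangling underscore; the robust one does not.
print(re.search(robust, "123_").group(0))   # 123
print(re.search(simple, "123_").group(0))   # 123_

# The simple pattern also captures a word that is not preceded by "_".
print(re.findall(robust, "123abc"))         # [('123', '')]
print(re.findall(simple, "123abc"))         # [('123', 'abc')]
```

Whether the second behaviour matters depends on the input: if a suffix can only ever appear after an underscore, both patterns give the same tuples for the strings in the question.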
