
I am trying to parse the numeral content embedded in a string. The string has three possible forms:

  1. 'avenue\d+', where \d+ is a number with one or more digits or
  2. 'road\d+' or
  3. 'lane\d+'

I tried:

re.sub(r'(?:avenue(\d+)|road(\d+)|lane(\d*))',r'\1','road12')

This code works for the first call below, but returns an empty string for the second:

re.sub(r'(?:avenue(\d+)|road(\d+)|lane(\d*))',r'\1','avenue12')
Out[81]: '12'
re.sub(r'(?:avenue(\d+)|road(\d+)|lane(\d*))',r'\1','road12')
Out[82]: ''

What am I doing incorrectly? Thanks.

  • So, if you have strings "avenue", "lane", or "road" followed by any number of digits, you want to extract the digits? Commented Oct 27, 2022 at 18:23
  • The capturing group that participated in the match was different, so use r'\1\2\3'. Also, the non-capturing group is superfluous, remove it. Commented Oct 27, 2022 at 18:24

2 Answers

The capturing group that participated in the match was different. In the first case, it was Group 1, in the second case, it was Group 2.

Also, note that the non-capturing group is superfluous, so you can remove it.

To fix the immediate issue, you can use r'\1\2\3' as replacement:

re.sub(r'avenue(\d+)|road(\d+)|lane(\d+)',r'\1\2\3','road12')
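
As a quick check of the replacement above, here is a minimal sketch that runs it against all three forms. It relies on groups that did not participate in the match expanding to an empty string in the replacement, which is the documented behavior of re.sub since Python 3.5. The names PATTERN and digits_only and the sample inputs are illustrative, not from the question.

```python
import re

# Each alternative has its own capturing group; only one of them
# participates in any given match.
PATTERN = re.compile(r'avenue(\d+)|road(\d+)|lane(\d+)')

def digits_only(text):
    # Groups that did not participate expand to '' (Python 3.5+),
    # so concatenating all three yields the one that matched.
    return PATTERN.sub(r'\1\2\3', text)

results = [digits_only(s) for s in ('avenue12', 'road12', 'lane7')]
print(results)  # ['12', '12', '7']
```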

However, it seems extracting is much simpler here:

import re

m = re.search(r'(?:avenue|road|lane)(\d+)','road12')
if m:  # re.search returns None when nothing matches
    print(m.group(1))

See the regex demo.

Details:

  • (?:avenue|road|lane) - either avenue, road, or lane
  • (\d+) - Group 1: one or more digits.
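
If the extraction is needed in more than one place, it could be wrapped in a small helper along these lines. This is only a sketch: the names STREET_NUMBER and extract_digits are hypothetical, and returning None for non-matching input is an assumption about what the caller wants.

```python
import re

# Same pattern as above: a shared prefix alternation, then one capturing group.
STREET_NUMBER = re.compile(r'(?:avenue|road|lane)(\d+)')

def extract_digits(text):
    """Return the digits following avenue/road/lane, or None if nothing matches."""
    m = STREET_NUMBER.search(text)
    return m.group(1) if m else None

print(extract_digits('road12'))   # 12
print(extract_digits('avenue'))   # None, because \d+ requires at least one digit
```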

2 Comments

I see that he has \d* next to lane - probably he wants to match lane without number too. Now it must have a number.
@AndrejKesely There is no way OP needs an empty result when extracting data.

Would this work? The part that changes (avenue, road, or lane) can go in a non-capturing group, followed by a single capturing group for the number:

re.sub(r'(?:avenue|road|lane)(\d+)',r'\1','road12')
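
One behavior worth keeping in mind with this approach: re.sub only rewrites the part of the string that matched, so input that does not match at all comes back unchanged rather than empty. A small sketch, with 'street12' as a made-up non-matching input:

```python
import re

pattern = r'(?:avenue|road|lane)(\d+)'

# The whole string matches, so only the digits remain.
matched = re.sub(pattern, r'\1', 'road12')

# No match anywhere, so re.sub returns the input as-is.
unmatched = re.sub(pattern, r'\1', 'street12')

print(matched)    # 12
print(unmatched)  # street12
```

If a non-match needs to be detected, re.search (which returns None in that case) may be the more convenient choice.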

2 Comments

This answer by mjsqu works, and so does the answer by Wiktor. Thank you both. However, in my real code the numeral pattern is different for each prefix (road, avenue, ...), something like this: avenue is numbered 12-24 while road is numbered 23-44_34, etc. So I would really like a way to have the text and numeral decoded together. Wiktor's solution will do this. The only question in my mind is: why are all three groups captured even if they do not match?
All three groups are not captured. Refer to your original code in the Regex 101 demo here: regex101.com/r/jCeYxs/1 - For the first example with road only \2 is captured, for an example with avenue only \1 is captured.
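
The same thing can be checked directly in Python: groups() reports None for every group that did not participate in the match. The second half of this sketch applies the idea to per-prefix numeral formats; the patterns for "12-24" and "23-44_34" are guesses based on the examples in the comment above, not the asker's real formats.

```python
import re

# Pattern from the question: one capturing group per alternative.
original = re.compile(r'(?:avenue(\d+)|road(\d+)|lane(\d*))')

road_groups = original.search('road12').groups()
avenue_groups = original.search('avenue12').groups()
print(road_groups)    # (None, '12', None)
print(avenue_groups)  # ('12', None, None)

# Hypothetical per-prefix formats, inferred from the examples in the comment.
per_prefix = re.compile(r'avenue(\d+-\d+)|road(\d+-\d+_\d+)|lane(\d+)')

m = per_prefix.search('road23-44_34')
# lastindex is the index of the last group that matched; since exactly one
# group participates here, it identifies the alternative that was used.
value = m.group(m.lastindex)
print(m.lastindex, value)  # 2 23-44_34
```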
