
I'm trying to write a Python regex that matches two patterns: the first is scratch_alpha and the second is scratch_alpha1*12 (where 12 can be any decimal number). I'd like to store the value after * in a variable, and if scratch_alpha is detected without a * after it, just store 1 in that variable.

I wrote this regex: ([a-zA-Z0-9\_*]+)(\*\d+)?

I expected to get two groups: the first would be the name "scratch_alpha" and the second would be the number after *, or None (in which case I initialize the variable to 1).

But with my regex, the first group seems to contain everything (scratch_alpha*12), instead of scratch_alpha in the first group and the value in the second group.

  • You should remove * from [a-zA-Z...] Commented May 30, 2016 at 16:09
  • There's an asterisk in your first group Commented May 30, 2016 at 16:09

3 Answers


Try this regex: ([^*]+)\*(\d+)

  • Group one: all characters until *
  • Group two: all numbers after *
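As a quick check of this pattern, here is a minimal Python sketch. The variable names are only illustrative, and it assumes the whole input should match, so it uses fullmatch. Note that this pattern requires the *, so a bare scratch_alpha does not match at all.

```python
import re

# Group 1: everything up to the asterisk; group 2: the digits after it.
pattern = re.compile(r"([^*]+)\*(\d+)")

m = pattern.fullmatch("scratch_alpha*12")
name = m.group(1)          # text before the asterisk
count = int(m.group(2))    # digits after the asterisk, as an int

# Without an asterisk there is no match, because \* is mandatory here.
no_star = pattern.fullmatch("scratch_alpha")
```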



UPDATE

To meet your requirements for patterns

  • scratch_alpha
  • scratch_alpha1*12
    • capture number after *
    • Number after * is optional

You can try the regex below:

scratch_alpha(?:(?:\d+)?\*(\d+)?)?

If the capture group did not take part in the match (in Python, group(1) is then None), there is no number after * and you can initialize your variable with 1.
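One way to apply the default of 1 in Python is sketched below. The helper name parse_multiplier is hypothetical, and the sketch assumes the entire string should match the pattern, which is why it uses fullmatch rather than search.

```python
import re

# Regex from above: optional digits, then "*", then an optional captured number.
PATTERN = re.compile(r"scratch_alpha(?:(?:\d+)?\*(\d+)?)?")

def parse_multiplier(text):
    """Return the number after '*', 1 if it is absent, or None if text does not match."""
    m = PATTERN.fullmatch(text)
    if m is None:
        return None
    # group(1) is None when the capture group did not take part in the match.
    return int(m.group(1)) if m.group(1) is not None else 1
```

With this sketch, parse_multiplier("scratch_alpha") gives 1 and parse_multiplier("scratch_alpha1*12") gives 12.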




There is no need for * in the first group:

([a-zA-Z0-9\_]+)(\*\d+)?

You may also change (\*\d+)? to (\*(\d+))? if you want the digits after * captured separately from the * itself.
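To show how the groups are numbered with the nested variant, here is a small Python sketch. The variable names are illustrative, and fullmatch is an assumption about how the input is checked.

```python
import re

# Group 1: the name; group 2: "*" plus digits; group 3: the digits only.
pattern = re.compile(r"([a-zA-Z0-9\_]+)(\*(\d+))?")

with_star = pattern.fullmatch("scratch_alpha*12").groups()
# ('scratch_alpha', '*12', '12')

m = pattern.fullmatch("scratch_alpha")
without_star = m.groups()
# ('scratch_alpha', None, None)

# Fall back to 1 when the optional part is missing.
value = int(m.group(3) or 1)
```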



This happens because the character class in your first parentheses includes *, and the + after it is greedy: it matches one or more characters and takes as many as it can. Since letters, digits, _ and * are all in the class, the first group consumes the whole string. Your second parentheses are followed by ?, so that group is optional, and the regex succeeds without it matching anything.

You can fix this by removing the * from inside the [] so that your first parentheses can no longer match it. Your regex then becomes ([a-zA-Z0-9\_]+)(\*\d+)?.
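The difference can be seen by running both patterns against the same input. This is a minimal Python sketch with illustrative variable names:

```python
import re

# Original pattern: "*" is inside the character class.
original = re.compile(r"([a-zA-Z0-9\_*]+)(\*\d+)?")
# Corrected pattern: "*" removed from the character class.
corrected = re.compile(r"([a-zA-Z0-9\_]+)(\*\d+)?")

text = "scratch_alpha*12"

original_groups = original.match(text).groups()
# ('scratch_alpha*12', None): the greedy first group took everything

corrected_groups = corrected.match(text).groups()
# ('scratch_alpha', '*12'): the first group stops at the asterisk
```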

Hope this helps.

3 Comments

This is a repetition of the answer by Noobscripter.
@AniMenon It's an explanation of why his original regex didn't match and why the corrected one does. Please explain how it is a repetition, when the easiest solution for him to understand would be the one that modifies his regex the least and explains the difference?
Explanation is good, you may just add the same to the other answer.
