I'm having trouble implementing a regex pattern in Python. My expression works on https://regexr.com/ but I can't get it to work in Python.

Here is my expression: [abcd][(]\d+[)]|(ab)[(]\d+[)]|(abcd)[(]\d+[)]. I want to find and return instances of a(\d+), b(\d+), c(\d+), d(\d+), ab(\d+), or abcd(\d+).

expressions = re.findall(r"[abcd][(]\d+[)]|(ab)[(]\d+[)]|(abcd)[(]\d+[)]",line)
print(expressions)

I think it might be working, because when the string contains something that should match the pattern, I get [('', '')] as my output instead of [].

Any thoughts?

Edit the question to show one or more examples with assignment to "line", output and expected output. Commented Apr 29, 2020 at 17:10

2 Answers

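I think the [('', '')] comes from the capturing groups (ab) and (abcd): when a pattern contains groups, re.findall returns the groups instead of the whole match. [(] and [)] do match literal parentheses, but \( and \) are the usual way to write them. The \(\d+\) part is also repeated in every alternative, so you can factor it out: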

import re

line = 'a(123) b(11) ab(35) bc(45) abcd(1234)'

expressions = re.findall(
    r'(?:[abcd]|ab|abcd)\(\d+\)',
    line)
print(expressions)

output:

['a(123)', 'b(11)', 'ab(35)', 'c(45)', 'abcd(1234)']

explanation:

  • (?:...) is a non-capturing group. It only groups the alternatives; it does not capture them.
  • \( and \): \ is the escape character for special characters like ( or ). \( matches a literal (.
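To make the difference concrete, here is a small sketch comparing the pattern from the question with the non-capturing version. The sample string is made up for illustration; re.findall returns the group contents when the pattern has capturing groups, and an empty string for each group that did not take part in a match.

```python
import re

line = 'a(123) ab(35)'  # hypothetical sample input

# Pattern from the question: (ab) and (abcd) are capturing groups,
# so findall returns one tuple of group contents per match.
with_groups = re.findall(r"[abcd][(]\d+[)]|(ab)[(]\d+[)]|(abcd)[(]\d+[)]", line)

# Same alternatives inside (?:...): nothing is captured,
# so findall returns the whole matched text.
without_groups = re.findall(r"(?:[abcd]|ab|abcd)\(\d+\)", line)

print(with_groups)     # [('', ''), ('ab', '')]
print(without_groups)  # ['a(123)', 'ab(35)']
```

So the [('', '')] in the question means the pattern did match; the matched text was simply not part of any group.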

7 Comments

Thanks! What does the ?: do?
@usr2564301 I revised my answer.
@BoseongChoi That's a great simplification!
@JadonErwin I appended short explanation :)
@BoseongChoi Thanks, one always learns sth new - forgot the non-capturing group (?:...). Will be useful in future!
expressions = re.findall(r"[abcd](\d+)|(ab)(\d+)|(abcd)(\d+)",line)
print(expressions)

This should work if the parentheses are only meant for grouping. The problem was: in Python, you don't need to put ( in brackets. If you mean the parentheses in (d+) literally, you have to use escaped parentheses, \( and \).

Be aware that if you put parentheses around ab or abcd, they become capturing groups, and re.findall then returns the group contents instead of the whole match. I would not add parentheses unless they are necessary.
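If the parentheses around ab and abcd have to stay for some reason, one possible workaround is to iterate with re.finditer and ask each match object for group(0), which is always the whole match. This is only a sketch; the sample string is an assumption, not taken from the question.

```python
import re

line = 'a(123) ab(35) abcd(1234)'  # hypothetical sample input

# Pattern that still contains the capturing groups (ab) and (abcd).
pattern = re.compile(r"[abcd]\(\d+\)|(ab)\(\d+\)|(abcd)\(\d+\)")

# group(0) is the full matched text, regardless of any capturing groups.
full_matches = [m.group(0) for m in pattern.finditer(line)]

print(full_matches)  # ['a(123)', 'ab(35)', 'abcd(1234)']
```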

expressions = re.findall(r"[abcd]\(\d+\)|ab\(\d+\)|abcd\(\d+\)",line)
print(expressions)

If you just want to match strings without parentheses, such as a1236, ab12, or abcd12342, then use

expressions = re.findall(r"[abcd]\d+|ab\d+|abcd\d+",line)
print(expressions)

However, if you want to capture certain parts with their repetitions, put parentheses around them.
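As an illustration of capturing on purpose, the sketch below puts one group around the letters and one around the digits, so re.findall returns (letters, digits) tuples. The sample string and the variable names are assumptions made for this example.

```python
import re

line = 'a(123) ab(35) abcd(1234)'  # hypothetical sample input

# Group 1 captures the letters, group 2 captures the digits.
# The parentheses around the digits stay escaped because they are literal.
pairs = re.findall(r"(abcd|ab|[abcd])\((\d+)\)", line)

print(pairs)  # [('a', '123'), ('ab', '35'), ('abcd', '1234')]

# The captured digits are strings; convert them if numbers are needed.
numbers = [int(digits) for _, digits in pairs]
print(numbers)  # [123, 35, 1234]
```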

2 Comments

I meant any integer by d+
yeah I realized. :)
