I am new to regular expressions and would appreciate it if someone could guide me through this.

import re
some = "I cannot take this B01234-56-K-9870 to the house of cards"

I have the above string and am trying to extract the substring with dashes (B01234-56-K-9870) using a Python regular expression. I have the following code so far:

regex = r'\w+-\w+-\w+-\w+'
match = re.search(regex, some)
print(match.group())  # prints B01234-56-K-9870

Is there a simpler way to extract the dashed pattern using a regular expression? For now, I do not care about the order or anything else. I just want to extract the string with dashes.

3 Comments

  • \w+(?:-\w+)+ would do it. If you expect exactly 3 dashes, then \w+(?:-\w+){3}. Commented Dec 9, 2020 at 14:51
  • It looks like your data is in a specific format. It is okay to express that properly in the regex. Commented Dec 9, 2020 at 14:53
  • There is nothing particularly wrong with your regex. Just know that \w matches the _ as well. Personally, I would find out if there is a more identifiable pattern and use that, such as (?:[A-Z\d]+-){3}\d+ if the final group is always digits and the first three groups are all caps and digits. Commented Dec 9, 2020 at 15:03
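
The patterns suggested in these comments can be compared side by side. This is a minimal sketch that reuses the question's `some` string. The variable names and the two extra sample inputs are illustrative assumptions, not part of the original post.

```python
import re

some = "I cannot take this B01234-56-K-9870 to the house of cards"

# Word groups joined by one or more dashes.
any_dashes = re.search(r'\w+(?:-\w+)+', some)

# A first group followed by exactly three "-group" repetitions.
three_dashes = re.search(r'\w+(?:-\w+){3}', some)

# Stricter: three groups of capitals/digits, then a digits-only group.
strict = re.search(r'(?:[A-Z\d]+-){3}\d+', some)

for m in (any_dashes, three_dashes, strict):
    print(m.group())  # each prints B01234-56-K-9870

# \w also matches "_", so underscores are kept in the match (made-up input).
underscore = re.search(r'\w+(?:-\w+)+', "id a_b-c_d here").group()
print(underscore)  # a_b-c_d

# {3} is not anchored: on a longer token it matches only the first four groups.
partial = re.search(r'\w+(?:-\w+){3}', "A-B-C-D-E").group()
print(partial)  # A-B-C-D
```

If a token with more than three dashes should be rejected rather than partially matched, the pattern would need boundaries around it. Whether that matters depends on the real data.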

1 Answer

Try the following regex (as shortened by The fourth bird):

\w+-\S+

Original regex: (?=\w+-)\S+


Explanation:

  • \w+- matches one or more word characters (letters, digits, or _) followed by a -
  • \S+ matches one or more non-whitespace characters
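
One consequence of the explanation above: because \S+ takes every non-whitespace character, it also picks up punctuation attached to the token. A small sketch of that behaviour, assuming a made-up sentence (`punctuated`) alongside the question's string:

```python
import re

some = "I cannot take this B01234-56-K-9870 to the house of cards"
print(re.search(r'\w+-\S+', some).group())  # B01234-56-K-9870

# Made-up sentence where punctuation directly follows the token.
punctuated = "Please ship B01234-56-K-9870, thanks"

# \S+ runs up to the next whitespace, so the comma is included.
greedy = re.search(r'\w+-\S+', punctuated).group()

# The \w-only pattern from the comments stops at the comma.
tight = re.search(r'\w+(?:-\w+)+', punctuated).group()

print(greedy)  # B01234-56-K-9870,
print(tight)   # B01234-56-K-9870
```

For text where the token is always surrounded by spaces, as in the question, both patterns return the same result.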

4 Comments

Thank you for the explanation. If I wanted to capture only 3 dashes, how would I achieve that?
Then use \w+(?:-\w+){3}, as suggested by MonkeyZeus.
Thanks @Tomerikoo for pointing that out ;-)
You don't need the lookahead, you can just match it.
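
A quick check of that last comment, as a sketch: on the question's string, the lookahead form and the direct form return the same match. The `trailing` input is a made-up edge case showing where the two can differ.

```python
import re

some = "I cannot take this B01234-56-K-9870 to the house of cards"

with_lookahead = re.search(r'(?=\w+-)\S+', some).group()
direct = re.search(r'\w+-\S+', some).group()
print(with_lookahead == direct)  # True

# Made-up edge case: a dash with nothing but whitespace after it.
trailing = "abc- x"

# The lookahead consumes nothing, so \S+ alone can match "abc-".
lookahead_edge = re.search(r'(?=\w+-)\S+', trailing)

# The direct form needs at least one non-space character after the dash.
direct_edge = re.search(r'\w+-\S+', trailing)

print(lookahead_edge.group())  # abc-
print(direct_edge)             # None
```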
