Some string:

s = 'some text some text date may 04 at 05 AM some text some text'

I've written the following regex to extract the date from it:

m = re.search(r'date ([a-z]{3} [0-9]{2}) at ([0-9]{2}) ([P][M])|date ([a-z]{3} [0-9]{2}) at ([0-9]{2}) ([A][M])', s)

Is it possible to write this regex in a shorter way, or to use the '|' character better than this? The two alternatives differ only in the 'AM' and 'PM' part, so this regex doesn't feel right to me.

  • First thought... `re.search(r'date ([a-z]{3} [0-9]{2}) at ([0-9]{2}) ([A|P][M])', s)` Commented Jul 22, 2015 at 9:46
  • @BarunSharma: It is such a common mistake to use | inside a character class. [A|P] matches A, | or P. Commented Jul 22, 2015 at 9:48
  • Yes. You are right. Thanks. Please ignore my comment :) Commented Jul 22, 2015 at 9:50
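
The pitfall raised in the comments can be checked directly. This is a minimal sketch, and the variable names are only illustrative: inside a character class `|` is a literal character, so `[A|P]` accepts three different characters, while `[AP]` or a group `(?:A|P)` accepts only the two intended ones.

```python
import re

# Inside [...], '|' is a literal pipe, not alternation.
pipe_in_class = bool(re.fullmatch(r'[A|P]M', '|M'))   # matches the stray '|'
plain_class = bool(re.fullmatch(r'[AP]M', '|M'))      # rejects it
group_alt = bool(re.fullmatch(r'(?:A|P)M', 'PM'))     # real alternation in a group

print(pipe_in_class, plain_class, group_alt)
```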

1 Answer

You can use

date ([a-z]{3} \d{2}) at (\d{2}) ([PA]M)

Compare your 2 alternatives:

date ([a-z]{3} [0-9]{2}) at ([0-9]{2}) ([P][M])
date ([a-z]{3} [0-9]{2}) at ([0-9]{2}) ([A][M])

Note how similar they are: they differ only in P versus A. Instead of repeating the whole pattern, use a character class [PA], which matches either P or A, in that one position.
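
As a quick check, here is a small sketch that runs both versions against the string from the question. The names `old` and `m` are only illustrative. Besides being shorter, the single pattern always fills groups 1 to 3, whereas the two-branch version leaves the groups of the branch that did not match as None.

```python
import re

s = 'some text some text date may 04 at 05 AM some text some text'

# Original two-branch regex: six groups, only one branch is ever filled.
old = re.search(
    r'date ([a-z]{3} [0-9]{2}) at ([0-9]{2}) ([P][M])'
    r'|date ([a-z]{3} [0-9]{2}) at ([0-9]{2}) ([A][M])',
    s,
)
print(old.groups())

# Shortened regex: [PA] matches a single 'P' or 'A', so three groups suffice.
pattern = r'date ([a-z]{3} \d{2}) at (\d{2}) ([PA]M)'
m = re.search(pattern, s)
if m:
    day, hour, meridiem = m.groups()
    print(day, hour, meridiem)
```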

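Instead of [0-9], you can use the shorthand class \d (it is a bit shorter :)). Also, do not forget to declare the regex as a raw string with r'...'. Note that in Python 3, \d in a str pattern also matches non-ASCII Unicode digits unless the re.ASCII flag is set, so keep [0-9] if that matters to you.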

I would also use the case-insensitive flag re.I with this pattern, so that it matches both pm and PM.
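
A short sketch of the effect of re.I, using made-up sample strings that are not from the question. Note that the flag applies to the whole pattern, so the literal `date` and the month class `[a-z]{3}` become case-insensitive as well.

```python
import re

# re.I (re.IGNORECASE) makes every part of the pattern case-insensitive.
pattern = re.compile(r'date ([a-z]{3} \d{2}) at (\d{2}) ([PA]M)', re.I)

# Hypothetical inputs with mixed casing.
samples = (
    'date may 04 at 05 AM',
    'date May 04 at 05 pm',
    'Date JUN 11 at 10 Pm',
)

results = []
for text in samples:
    m = pattern.search(text)
    results.append(m.groups() if m else None)

print(results)
```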
