Imagine I want to find all time expressions referring to 'AM' and 'PM' in a string. Let's ignore for the moment that I could use '[AP]M' to do this (because I'm actually pulling the list of valid strings ['AM','PM'] from a dictionary whose keys are language codes). I'd like to look for both at once, like this:

import re

foo = ['am','pm']
separator = ':'
timex = re.compile('(1[012]|[1-9])%s([0-5][0-9])( %s)?' % (separator, foo), re.I)

bar = "It's 6:00 pm, do you know where your brain is?"

timex as written above doesn't get me what I'm after: it only matches up to the 'p' in 'pm'. (It seems to be treating all the characters of the list elements as a single character class, as though they were [ampm].)
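One way to check that guess is to print the pattern after interpolation. The sketch below reuses the names from the snippet above; `pattern` and `matched` are just illustrative variable names. Formatting a list with %s inserts the list's repr, so the brackets, quotes, comma and space all end up inside the regex, where the brackets are read as a character class.

```python
import re

foo = ['am', 'pm']
separator = ':'

# %-formatting a list inserts its repr, brackets and quotes included.
pattern = '(1[012]|[1-9])%s([0-5][0-9])( %s)?' % (separator, foo)
print(pattern)

timex = re.compile(pattern, re.I)
bar = "It's 6:00 pm, do you know where your brain is?"

# The optional group matches a space plus ONE character from the class.
matched = timex.search(bar).group()
print(matched)
```

The final group is therefore a space followed by a single character drawn from the class, which is why the match stops after 'p'.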

What I don't want is to do two passes over the string (one each for 'am' and 'pm').

Is there a nice Pythonic way to do a single pass for every item in foo?

  • Is there a reason why you can't do a check for (am|pm) in regex? Commented Mar 10, 2014 at 23:10
  • I'd like to avoid locale-specific expressions. English uses am/pm, but Korean uses 오전/오후, Greek uses π.μ./μ.μ., etc. I tried to simplify the code in my question for readability. In actuality, foo would be something like ampms[locale], where ampms is a dictionary of locales to 2-element lists. Commented Mar 10, 2014 at 23:17
  • What does your result look like? Commented Mar 10, 2014 at 23:18
  • Did you try joining foo with | and then sticking it in the regex group then? Commented Mar 10, 2014 at 23:20
  • @AaronHall: timex.search(bar).group() yields '6:00 p' Commented Mar 10, 2014 at 23:21

1 Answer

Here's the way I've inserted a list of arbitrary regex terms to be searched for:

import re

foo = ['am','pm']
timex = re.compile('({foo})'.format(foo='|'.join(foo)))

bar = "It's 6:00 pm, do you know where your brain is?"

timex.findall(bar)

returns

['pm']

You can capture more:

>>> timex = re.compile(r'(\d{{1,2}}:\d{{2}})\s*({foo})'.format(foo='|'.join(foo)))
>>> timex.findall(bar)
[('6:00', 'pm')]

1 Comment

Here it is reworked to be closer to the original (using separator as defined in the question): timex = re.compile(r'(1[012]|[1-9]){}([0-5][0-9])\s*({})'.format(separator, '|'.join(foo)))
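If the list holds literal strings rather than regex terms, it may be worth escaping each item before joining, since a marker such as 'π.μ.' contains dots that would otherwise match any character. The following is a minimal sketch under that assumption: the `ampms` table is a hypothetical stand-in for the locale dictionary described in the question's comments, and `build_timex` is an illustrative helper name, not something from the original posts.

```python
import re

# Hypothetical locale table, modelled on the dictionary of locales to
# 2-element lists mentioned in the comments on the question.
ampms = {
    'en': ['am', 'pm'],
    'el': ['π.μ.', 'μ.μ.'],
}

def build_timex(locale, separator=':'):
    """Compile one pattern that matches either marker for the given locale."""
    # re.escape makes characters such as '.' match literally.
    alternatives = '|'.join(re.escape(term) for term in ampms[locale])
    pattern = r'(1[012]|[1-9]){}([0-5][0-9])\s*({})'.format(
        re.escape(separator), alternatives)
    return re.compile(pattern, re.I)

bar = "It's 6:00 pm, do you know where your brain is?"
en_groups = build_timex('en').search(bar).groups()
el_groups = build_timex('el').search('10:15 π.μ.').groups()
print(en_groups)
print(el_groups)
```

This still makes a single pass over the string per locale. It assumes the marker follows the time, which will not hold for every language.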
