
Initially my date regex worked as follows, capturing "February 12, 2018", for example:

match = re.search(r'(January|February|March|April|May|June|July|August|September?|October?|November|December)\s+\d{1,2},\s+\d{4}', date).group()

But I want to make it more flexible by putting my variable string into the regex. I can't seem to get it to work, even after looking through many Stack Overflow threads about similar issues. I'm quite a novice, so I'm not sure what's going wrong. I'm aware that simply writing MONTHS won't work. Thank you.

MONTHS = "January|February|March|April|May|June|July|August|September|October|November|December"

match = re.search(r'(MONTHS)\s+\d{1,2},\s+\d{4}', date).group()

print(match)
AttributeError: 'NoneType' object has no attribute 'group'
  • On a side note, why '?' for September and October? Commented Aug 13, 2018 at 18:59
  • Ah, whoops, that was from my old string when I had Sep(tember)?, Oct(ober)?, etc. Commented Aug 13, 2018 at 19:01

2 Answers


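You've written MONTHS as literal text inside the pattern string, so Python doesn't know that it's supposed to refer to a variable storing another string. The regex therefore looks for the literal word "MONTHS", finds nothing, and re.search returns None, which is why calling .group() fails.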

So instead, try:

match = re.search(r'(' + MONTHS + r')\s+\d{1,2},\s+\d{4}', date).group()

That concatenates (sticks together) three strings: the first bit, then the string stored in your MONTHS variable, and then the last bit. Make the last bit a raw string too (r'...'), so that \s and \d reach the regex engine unchanged.
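A minimal runnable sketch of the concatenation approach, assuming a hypothetical sample sentence for date. It also checks the result before calling .group(), because re.search returns None when nothing matches (the source of the original 'NoneType' error).

```python
import re

MONTHS = "January|February|March|April|May|June|July|August|September|October|November|December"

# Build the pattern by concatenation; every literal piece is a raw string
# so that \s and \d are passed to the regex engine unchanged.
pattern = r'(' + MONTHS + r')\s+\d{1,2},\s+\d{4}'

date = "The deadline is February 12, 2018."  # hypothetical sample input

match = re.search(pattern, date)
# re.search returns None when nothing matches, so check before calling .group()
if match:
    print(match.group())   # February 12, 2018
    print(match.group(1))  # February
else:
    print("no date found")
```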


1 Comment

Now it works, thank you very much!

If you want to substitute something into a string, you need to use either format strings (whether an f-string literal or the format or format_map methods on string objects) or printf-style formatting (or template strings, or a third-party library… but usually one of the first two).
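For the template-strings option mentioned above, a short sketch using the standard library's string.Template. The placeholder name $months is purely illustrative; this works here because the regex contains no other $ character.

```python
import re
from string import Template

MONTHS = "January|February|March|April|May|June|July|August|September|October|November|December"

# Template only treats $-placeholders specially; backslashes and braces
# in the regex pass through untouched. "months" is a hypothetical name.
pattern = Template(r'($months)\s+\d{1,2},\s+\d{4}').substitute(months=MONTHS)

m = re.search(pattern, "Signed on March 3, 2019")  # hypothetical input
print(m.group() if m else None)  # March 3, 2019
```

The trade-off is the mirror image of printf-style formatting: a literal $ in the pattern (such as an end-of-line anchor) would have to be written as $$.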

Normally, format strings are the easiest solution, but they don't play nicely with strings that need braces for other purposes. You don't want that {4} to be treated as a replacement field (with format it means "fill in positional argument 4"; in an f-string it is evaluated as the expression 4), and escaping it as {{4}} makes things less readable (and regular expressions are already unreadable enough…).
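To make the brace problem concrete, here is a sketch of the format and f-string versions. Both work only once every regex brace is doubled; the broken variant shows what happens in an f-string if you forget.

```python
import re

MONTHS = "January|February|March|April|May|June|July|August|September|October|November|December"

# str.format: regex braces are doubled so they come out as literal { and }.
pattern_fmt = r'({})\s+\d{{1,2}},\s+\d{{4}}'.format(MONTHS)

# f-string (rf prefix = raw + formatted): the same doubling is required.
pattern_f = rf'({MONTHS})\s+\d{{1,2}},\s+\d{{4}}'

# Without doubling, the f-string evaluates {1,2} as the tuple (1, 2)
# and {4} as the integer 4, silently producing a different regex.
broken = rf'({MONTHS})\s+\d{1,2},\s+\d{4}'

print(pattern_fmt == pattern_f)                   # True
print(broken.endswith(r'\d(1, 2),\s+\d4'))        # True
print(re.search(pattern_f, "on February 12, 2018").group())  # February 12, 2018
```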

So, printf-style formatting is probably a better option here:

pattern = r'(%s)\s+\d{1,2},\s+\d{4}' % (MONTHS,)

… or:

pattern = r'(%(MONTHS)s)\s+\d{1,2},\s+\d{4}' % {'MONTHS': MONTHS}
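A runnable sketch of both printf-style forms, checked against two hypothetical inputs. Since % substitution only looks at % sequences, the regex braces need no escaping.

```python
import re

MONTHS = "January|February|March|April|May|June|July|August|September|October|November|December"

# printf-style substitution only reacts to % sequences, so {1,2} and {4}
# can stay exactly as they would appear in a plain regex.
pattern_pos = r'(%s)\s+\d{1,2},\s+\d{4}' % (MONTHS,)
pattern_named = r'(%(MONTHS)s)\s+\d{1,2},\s+\d{4}' % {'MONTHS': MONTHS}

print(pattern_pos == pattern_named)  # True

# Hypothetical inputs: only full month names match this pattern.
results = []
for text in ["Due February 12, 2018", "Due Feb 12, 2018"]:
    m = re.search(pattern_pos, text)
    results.append(m.group() if m else None)
print(results)  # ['February 12, 2018', None]
```

The one character that would need escaping in this style is a literal %, which must be written as %%.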
