0

I am trying to return 2 subgroups from my regex match:

email_add = "[email protected] <[email protected]>"
m = re.match(r"(\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b) <(\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b)", email_add)

But it doesn't seem to match:

>>> m.group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'

I suspect I either did not group it correctly or am using the word boundary incorrectly. I tried \w instead of \b, but the result is the same.

Could someone please point out my errors?

2
  • Your regex doesn't match the string. You need case-insensitive matching. Commented Mar 1, 2013 at 17:10
  • [A-Z] won't match lowercase. I suggest building the regex step by step and expanding the string as you go; that way you will find your basic mistakes easily. Commented Mar 1, 2013 at 17:12
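
The step-by-step approach suggested in the comment could look roughly like the sketch below. The sample string is a hypothetical stand-in (the addresses in the question are obscured), and `first_failing_step` is an illustrative helper name, not something from the question.

```python
import re

# Hypothetical sample input; not the exact string from the question.
sample = "John.Doe@example.com <John.Doe@example.com>"

# The first address pattern, grown one piece at a time.
steps = [
    r"[A-Z0-9._%+-]+",                          # local part only
    r"[A-Z0-9._%+-]+@",                         # local part plus @
    r"[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}",  # full address
]

def first_failing_step(patterns, text, flags=0):
    """Return the index of the first pattern that fails to match, or None."""
    for index, pattern in enumerate(patterns):
        if re.match(pattern, text, flags) is None:
            return index
    return None

# Without a flag, the local part stops at the first lowercase letter,
# so the step that adds "@" is the first to fail.
print(first_failing_step(steps, sample))
# With case-insensitive matching, every step matches.
print(first_failing_step(steps, sample, re.IGNORECASE))
```

Seeing which step fails first points at the character class rather than at the grouping or the word boundaries.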

2 Answers

2

You are matching uppercase A-Z letters only, so the lowercase character sequences ohn, oe and com cause the pattern not to match anything.

Adding the re.I case-insensitive flag makes your pattern work:

>>> import re
>>> email_add = "[email protected] <[email protected]>"
>>> re.match(r"(\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b) <(\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b)", email_add)
>>> re.match(r"(\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b) <(\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b)", email_add, re.I)
<_sre.SRE_Match object at 0x1030d4f10>
>>> _.groups()
('[email protected]', '[email protected]')

Or you could add a-z to the character classes instead:

>>> re.match(r"(\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}\b) <(\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}\b)", email_add)
<_sre.SRE_Match object at 0x1030d4f10>
>>> _.groups()
('[email protected]', '[email protected]')
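
Since re.match returns None when nothing matches, it may also help to check the result before calling .group() or .groups(). The following is a minimal sketch, assuming a hypothetical sample string (the addresses above are obscured); `ADDRESS`, `PAIR` and `split_pair` are illustrative names, not part of the original code.

```python
import re

# Hypothetical sample input; not the exact string from the question.
email_add = "John.Doe@example.com <John.Doe@example.com>"

# The address pattern from the question, written once and reused for both groups.
ADDRESS = r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b"
# re.IGNORECASE is the long spelling of re.I.
PAIR = re.compile(r"(%s) <(%s)" % (ADDRESS, ADDRESS), re.IGNORECASE)

def split_pair(text):
    """Return both captured addresses as a tuple, or None if text does not match."""
    m = PAIR.match(text)
    if m is None:
        # Avoids the AttributeError on NoneType shown in the question.
        return None
    return m.groups()

print(split_pair(email_add))
print(split_pair("no address here"))
```

Compiling the pattern once keeps the flag next to the pattern, so it cannot be forgotten at an individual call site.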

1 Comment

D'oh! Excellent! Thank you Martijn :)
2

What's wrong with your regex has been pointed out, but you may also want to consider email.utils.parseaddr:

>>> from email.utils import parseaddr
>>> email_add = "[email protected] <[email protected]>"
>>> parseaddr(email_add)
('', '[email protected]')  # doesn't get first part, so could assume it's same as 2nd?
>>> email_add = "John Doe <[email protected]>"
>>> parseaddr(email_add)
('John Doe', '[email protected]') # does get name and email
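
If you go the parseaddr route, a small wrapper can turn the "nothing parsed" case into None. This is only a sketch: `name_and_address` is a hypothetical helper name and the sample value is made up for illustration.

```python
from email.utils import parseaddr

def name_and_address(header_value):
    """Return (name, address) parsed from header_value, or None if no address is found."""
    name, address = parseaddr(header_value)
    if not address:
        # parseaddr returns empty strings when it cannot parse the value.
        return None
    return name, address

# Hypothetical sample value in the "display name <address>" form.
print(name_and_address("John Doe <john.doe@example.com>"))
print(name_and_address(""))
```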

1 Comment

Thanks Jon, I wouldn't have easily stumbled across that neat module alone.
