Python Regex: Backreference a matching regex group

Question

I am trying to return 2 subgroups from my regex match:

import re

email_add = "[email protected] <[email protected]>"
m = re.match(r"(\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b) <(\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b)", email_add)

But it doesn't seem to match:

>>> m.group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'

I suspect I did not group it correctly, or I'm using an incorrect word boundary. I tried \w instead of \b, but the result is the same.

Could someone please point out my errors?

Your regex doesn't match the string. You need case-insensitive matching.
– nhahtdh, Commented Mar 1, 2013 at 17:10
[A-Z] won't match lowercase. I suggest building the regex step by step and expanding the test string as you go; that way you will find your basic mistakes easily.
– ted, Commented Mar 1, 2013 at 17:12
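
The step-by-step approach suggested in the comment can be sketched roughly as follows. The sample string is hypothetical (the addresses in the post are redacted), and `first_failing_piece` is an illustrative helper name, not part of the `re` module. It grows the pattern one piece at a time and reports the first piece at which the match fails.

```python
import re

# Hypothetical sample input; the real addresses in the post are redacted.
text = "John.Doe@example.com <john.doe@example.com>"

# The first address pattern from the question, split into its pieces.
pieces = [r"[A-Z0-9._%+-]+", r"@", r"[A-Z0-9.-]+", r"\.", r"[A-Z]{2,4}"]


def first_failing_piece(pieces, text, flags=0):
    """Return the index of the first piece that breaks the match, or None."""
    pattern = ""
    for index, piece in enumerate(pieces):
        pattern += piece
        if re.match(pattern, text, flags) is None:
            return index
    return None


# Case-sensitive: the first class consumes only "J", so adding "@" fails.
print(first_failing_piece(pieces, text))
# Case-insensitive: every piece matches.
print(first_failing_piece(pieces, text, re.IGNORECASE))
```

A failure at the `@` piece is a hint that the preceding character class stopped earlier than expected, which in this case points at the missing lowercase letters.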

Martijn Pieters · Accepted Answer · 2013-03-01 17:12:56Z

2

Your character classes match uppercase letters A-Z only, so lowercase sequences such as ohn, oe and com prevent the pattern from matching anything.

Adding the re.I (case-insensitive) flag makes your pattern work:

>>> import re
>>> email_add = "[email protected] <[email protected]>"
>>> re.match(r"(\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b) <(\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b)", email_add)
>>> re.match(r"(\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b) <(\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b)", email_add, re.I)
<_sre.SRE_Match object at 0x1030d4f10>
>>> _.groups()
('[email protected]', '[email protected]')

Alternatively, you could add a-z to the character classes instead:

>>> re.match(r"(\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}\b) <(\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}\b)", email_add)
<_sre.SRE_Match object at 0x1030d4f10>
>>> _.groups()
('[email protected]', '[email protected]')
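
A minimal sketch of the same fix in script form, assuming a hypothetical sample string (the addresses in the post are redacted) and a hypothetical helper named `split_pair`. It builds the pattern from a single address sub-pattern, compiles it with `re.IGNORECASE` (the long name for `re.I`), and checks for `None` before calling `.groups()`, which avoids the AttributeError from the question.

```python
import re

# Hypothetical sample input; the real addresses in the post are redacted.
email_add = "John.Doe@example.com <john.doe@example.com>"

# One address sub-pattern, reused for both groups so the two halves stay in sync.
ADDRESS = r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b"
PAIR = re.compile(r"({0}) <({0})".format(ADDRESS), re.IGNORECASE)


def split_pair(text):
    """Return both addresses as a tuple, or None when the text does not match."""
    m = PAIR.match(text)
    return m.groups() if m else None


print(split_pair(email_add))
print(split_pair("not an address"))
```

Individual groups are also available through `m.group(1)` and `m.group(2)` on a successful match.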


Dirty Penguin Over a year ago

D'oh! Excellent! Thank you Martijn :)

Jon Clements · Accepted Answer · 2013-03-01 17:19:13Z

2

What's wrong with your regex has been pointed out, but you may also want to consider email.utils.parseaddr:

>>> from email.utils import parseaddr
>>> email_add = "[email protected] <[email protected]>"
>>> parseaddr(email_add)
('', '[email protected]')  # doesn't get first part, so could assume it's same as 2nd?
>>> email_add = "John Doe <[email protected]>"
>>> parseaddr(email_add)
('John Doe', '[email protected]') # does get name and email
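
As a rough sketch of how `parseaddr` might be wrapped in a script, the helper below returns `None` when no address is found. The name `name_and_address` and the sample address are assumptions for illustration only; `parseaddr` itself returns a pair of empty strings when it cannot parse its input.

```python
from email.utils import parseaddr


def name_and_address(header_value):
    """Return (display name, address), or None when no address was parsed."""
    name, address = parseaddr(header_value)
    if not address:
        return None
    return name, address


# Hypothetical sample input; the real addresses in the post are redacted.
print(name_and_address("John Doe <john.doe@example.com>"))
print(name_and_address(""))
```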


Dirty Penguin Over a year ago

Thanks Jon, I wouldn't have easily stumbled across that neat module alone.
