I am writing a function that uses a regular expression to parse email addresses. I think the pattern is correct, but I can't work out why example 2, '[email protected]', is not detected correctly while example 1 works.

import re

def parse_email(s):
    try:
        pattern = re.compile(r'\b([a-zA-Z])([\w.-_+]+)@([\w.-]+)([a-zA-Z])\b')
        matches = pattern.finditer(s)
        for match in matches:
            print(match.group(0))
            return (match.group(1)+match.group(2), match.group(3)+match.group(4))
    except AttributeError:
        #print('here')
        raise ValueError


print(parse_email('[email protected]'))
print(parse_email('[email protected]'))

Results:

[email protected]
('JKRowling', 'Huge-Books.org')

[email protected]
('much', 'gmail.com')
    - has special meaning inside [] in a regexp, it's used to specify a range of characters, like a-z. What do you think .-_ matches? Commented Sep 20, 2021 at 17:25

2 Answers

From the re docs:

Ranges of characters can be indicated by giving two characters and separating them by a '-', for example [a-z] will match any lowercase ASCII letter, [0-5][0-9] will match all the two-digits numbers from 00 to 59, and [0-9A-Fa-f] will match any hexadecimal digit. If - is escaped (e.g. [a\-z]) or if it’s placed as the first or last character (e.g. [-a] or [a-]), it will match a literal '-'. [emphasis added]

It looks like you are trying to match a literal -, so place it as the first character of the character class, e.g. [-xxx]:

pattern = re.compile(r'\b([a-zA-Z])([-\w._+]+)@([-\w.]+)([a-zA-Z])\b')

Test:

>>> import re
>>> pat = r"\b([a-zA-Z])([-\w._+]+)@([-\w.]+)([a-zA-Z])\b"
>>> old_pattern = re.compile(r'\b([a-zA-Z])([\w.-_+]+)@([\w.-]+)([a-zA-Z])\b')
>>> new_pattern = re.compile(r'\b([a-zA-Z])([-\w._+]+)@([-\w.]+)([a-zA-Z])\b')
>>> old_pattern.search('[email protected]')
<re.Match object; span=(21, 35), match='[email protected]'>
>>> new_pattern.search('[email protected]')
<re.Match object; span=(0, 35), match='[email protected]'>
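
To see why the position of the hyphen matters, here is a small sketch that tests single characters against the two character classes. The helper name accepts and the sample characters are made up for illustration. In the question's class, .-_ is read as a range from '.' (code point 46) to '_' (code point 95), and the hyphen (code point 45) sits just below that range.

```python
import re

# Character class from the question: '.-_' is read as a range from '.' to '_'.
question_class = re.compile(r'[\w.-_+]')
# Character class from the answer: a leading '-' is a literal hyphen.
answer_class = re.compile(r'[-\w._+]')


def accepts(char_class, ch):
    """Return True if the single character ch is matched by the compiled class."""
    return char_class.fullmatch(ch) is not None


# Print, for each sample character, whether each class accepts it.
for ch in '-@:+.':
    print(repr(ch), accepts(question_class, ch), accepts(answer_class, ch))
```

The range also lets in characters that were probably never intended, such as '@' and ':', which is another reason to keep the hyphen at the start or end of the class.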

1 Comment

Thank you! I tried to put it at the front of the group and it worked!

Welcome to the wonderful world of regular expressions, where the tiniest of changes can result in totally unexpected outcomes.

To start, let's analyze the regex pattern you have:

r'\b([a-zA-Z])([\w.-_+]+)@([\w.-]+)([a-zA-Z])\b'

  • \b is a reasonable choice, since you want each address to stand as its own word. Note that \b also matches at the very beginning or end of the string when the adjacent character is a word character.
  • ([a-zA-Z]) is your first capture group. Don't shorten it to ([A-z]): that range also matches the six ASCII characters between Z and a ([, \, ], ^, _ and `).
  • ([\w.-_+]+) is your second capture group. Inside the square brackets:
    • \w matches any word character (letters, digits and the underscore)
    • . is a literal period; inside [] it loses its "any character" meaning
    • - does not match the dash character here; between . and _ it defines a range of characters
    • _ would normally match an underscore, but in this case it is read as the end of that range
    • + is a literal plus sign inside []; only the + after the closing ] means "1 or more"

... I'll stop here, since the rest is more or less similar...
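
A quick way to check how individual characters behave inside a character class is to test them one at a time. The sketch below uses a made-up helper, class_matches, and checks two things: that . and + are plain literals inside [], and which non-letter characters the range [A-z] lets through.

```python
import re


def class_matches(class_body, ch):
    """Return True if the character class [class_body] matches the single character ch."""
    return re.fullmatch('[' + class_body + ']', ch) is not None


# Inside [] the dot and the plus are literals, so 'x' is not matched by [.+].
dot_plus = {ch: class_matches('.+', ch) for ch in '.+x'}
print(dot_plus)

# Collect the ASCII characters matched by [A-z] that are not letters.
extras = [chr(c) for c in range(128)
          if class_matches('A-z', chr(c)) and not chr(c).isalpha()]
print(extras)
```

If the extra characters are not wanted, [a-zA-Z] is the safer spelling.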

You'll want to replace your regex with the following:

r'\b([A-Za-z0-9\-\+]+@[A-Za-z\-\+]+\.[A-Za-z]{3})\b'

  • There is only one capture group, since we want entire email addresses.
  • Email addresses (here) are allowed to contain:
    • Before the at symbol: [A-Za-z0-9\-\+]+ all alphanumeric characters as well as '-' and '+' (written as the escaped characters \- and \+); dots and underscores are not included
    • Following the at symbol, a domain name [A-Za-z\-\+]+ with alphabetic characters and the same escaped characters
    • Followed by a three-letter domain extension \.[A-Za-z]{3}, e.g. .org

Next, you can refactor your code to the following:

import re

pattern = re.compile(r'\b([A-Za-z0-9\-\+]+@[A-Za-z\-\+]+\.[A-Za-z]{3})\b')
match = pattern.search(s)  # s is the input string to scan

if match:
    email = match.group()
else:
    email = None
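
If you still need the (local part, domain) tuple that the question's function returns, the same search-based approach can be wrapped in a function. This is only a sketch: it reuses the hyphen-first pattern from the other answer, the name EMAIL_RE and the sample addresses in the usage lines are made up, and it raises ValueError explicitly when nothing matches instead of relying on an AttributeError.

```python
import re

# Hyphen placed first in each class so it is a literal, not a range operator.
EMAIL_RE = re.compile(r'\b([a-zA-Z])([-\w._+]+)@([-\w.]+)([a-zA-Z])\b')


def parse_email(s):
    """Return (local_part, domain) for the first address found in s.

    Raises ValueError if no address is found.
    """
    match = EMAIL_RE.search(s)
    if match is None:
        raise ValueError(f'no email address found in {s!r}')
    # Groups 1+2 form the local part, groups 3+4 form the domain.
    return match.group(1) + match.group(2), match.group(3) + match.group(4)


# Usage with made-up addresses:
print(parse_email('jane-doe.x+tag@example-mail.org'))
print(parse_email('contact: jane-doe@example.org today'))
```

Checking the result of search() for None keeps the failure case explicit, whereas looping over finditer() simply falls through and returns None when there is no match.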

2 Comments

Thank you for carefully analyzing my code. I wonder why you think \w is redundant?
\w is equivalent to [a-zA-Z0-9_] for ASCII text (in Python 3 str patterns it also matches Unicode word characters). @CodingLife
