I was asked to write a regular expression that matches multi-domain email addresses and to implement it in Python. I came up with the following regular expression (and code; the emphasis is on the regex, though), which I think is correct:

import re
regex = r'\b[\w|\.|-]+@([\w]+\.)+\w{2,4}\b'
input_string = "hey my mail is [email protected]"
match = re.findall(regex, input_string)
print(match)

When I run this on a very simple address, it doesn't catch it. Instead, the output is an empty list. Can somebody tell me where I went wrong in the regular expression literal?

  • Just google this! There's loads of content on it. Even on SO. Commented Jan 7, 2016 at 3:43
  • Possible duplicate of Using a regular expression to validate an email address Commented Jan 7, 2016 at 3:47
  • I know that there are tons of copy-and-paste email regular expressions out there, but this problem is hurting my brain: I believe the regex is right, yet it's not working. Commented Jan 7, 2016 at 3:48
  • a) It doesn't output an empty string, it outputs ['def.'] (which is the only bit you're capturing with ()). b) the regex is not right, you can't use | like that inside a character class - inside [] it matches the pipe character literally, it doesn't do either-or, and \b doesn't match at the end of a string, and the regex is broken for addresses like [email protected] which don't have a 2-4 character TLD. Commented Jan 7, 2016 at 3:51
  • And on top of @TessellatingHeckler's notes, if you have capturing groups, findall returns the capture groups, not the full match. Change ([\w]+\.) to (?:\w+\.) to change the parens to non-capturing (also removing the superfluous but harmless brackets; \w is a character class by itself). Commented Jan 7, 2016 at 3:53
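
To illustrate the findall behaviour described in the comments above, here is a minimal sketch. The sample address is hypothetical (the one in the question is obscured), and the \b anchors and the pipes are left out so that the only difference between the two patterns is the kind of group.

```python
import re

# Hypothetical sample text; the address is made up for illustration.
text = "hey my mail is user@mail.example.com"

# Capturing group: findall returns only what the group matched
# (for a repeated group, its last repetition).
capturing = r'[\w.-]+@(\w+\.)+\w{2,4}'

# Non-capturing group: findall returns the whole match.
non_capturing = r'[\w.-]+@(?:\w+\.)+\w{2,4}'

captured = re.findall(capturing, text)
whole = re.findall(non_capturing, text)

print(captured)  # ['example.']
print(whole)     # ['user@mail.example.com']
```

If the group is needed for other reasons, re.finditer with m.group(0) is another way to get at the full match.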

1 Answer

Here's a simple one to start you off:

regex = r'\b[\w.-]+?@\w+?\.\w+?\b'
re.findall(regex,input_string)  # ['[email protected]']

The problem with your original one is that | is not an alternation operator inside a character class ([..]); there it matches a literal pipe character. Just write [\w|\.|-] as [\w.-] (inside a class the . needs no escaping, and if the - is at the end, you don't need to escape it either).
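
A quick way to see the difference between the two character classes is to test them against a string that contains a pipe. This is only a sketch; the sample strings are made up.

```python
import re

# The class from the question: the | characters are literal members of the set.
with_pipes = re.compile(r'[\w|\.|-]+')

# The cleaned-up class: word characters, dot, and hyphen only.
without_pipes = re.compile(r'[\w.-]+')

# A string containing a pipe is accepted by the first class only.
pipe_sample = "a|b"
pipe_ok_original = with_pipes.fullmatch(pipe_sample) is not None
pipe_ok_cleaned = without_pipes.fullmatch(pipe_sample) is not None

# A typical local part is accepted by both.
local_part = "first.last-name"
local_ok_original = with_pipes.fullmatch(local_part) is not None
local_ok_cleaned = without_pipes.fullmatch(local_part) is not None

print(pipe_ok_original, pipe_ok_cleaned)    # True False
print(local_ok_original, local_ok_cleaned)  # True True
```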

Next, there are far too many variations of legitimate domain names to enumerate. Just look for at least one period surrounded by word characters after the @ symbol:

@\w+?\.\w+?\b
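
One caveat worth checking: because the quantifiers are lazy and the pattern asks for only one period, a multi-level domain appears to be cut off at the first word boundary after that period. The sketch below compares the pattern above with a variant that borrows the non-capturing group suggested in the comments. The variant is an assumption on my part, not part of the original answer, and the sample address is hypothetical.

```python
import re

# Hypothetical sample text with a multi-level domain.
text = "write to user@mail.example.com today"

# The starter pattern from this answer.
simple = r'\b[\w.-]+?@\w+?\.\w+?\b'

# Assumed variant: one or more "label." parts, then a final label.
# (?:...) keeps findall returning the whole match.
multi = r'\b[\w.-]+@(?:\w+\.)+\w+\b'

simple_result = re.findall(simple, text)
multi_result = re.findall(multi, text)

print(simple_result)  # ['user@mail.example']
print(multi_result)   # ['user@mail.example.com']
```

Neither pattern validates addresses; both only pick out address-like text, and \w also excludes hyphens in domain labels.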