Find emails in text with Python and regex

Question

I am trying to extract emails from text. I used re.search, which returned the first occurrence, and then switched to re.findall. To my surprise, re.findall finds fewer emails than re.search. What could be the problem?

Code:

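import re

# text (the input string) and emails (a set) are defined elsewhere

# re.search returns only the first match; group() is the whole matched text
searchObj = re.search(r'[A-Za-z0-9\._+-]+@[A-Za-z0-9]+(\.|-)[A-Za-z0-9\.-]+', text)
if searchObj:
    mail = searchObj.group()
    if mail not in emails:
        emails.add(mail)

# re.findall with the same pattern
listEmails = re.findall(r'[A-Za-z0-9\._+-]+@[A-Za-z0-9]+(\.|-)[A-Za-z0-9\.-]+', text)
for mail in listEmails:
    if mail not in emails:
        emails.add(mail)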

Wiktor Stribiżew · Accepted Answer · 2016-12-28 09:12:54Z

Replace the capturing group (\.|-) with a non-capturing one or even with a character class:

r'[A-Za-z0-9._+-]+@[A-Za-z0-9]+[.-][A-Za-z0-9.-]+'
                               ^^^^

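Or even shorter (note that in Python 3, \w and [^\W_] also match non-ASCII letters and digits unless the re.ASCII flag is used):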

r'[\w.+-]+@[^\W_]+[.-][A-Za-z0-9.-]+'

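Otherwise, because the pattern contains a capturing group, re.findall returns only the list of captured values (here, the "." or "-" matched by the group) instead of the whole matches.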

Python demo:

import re
rx = r'[\w.+-]+@[^\W_]+[.-][A-Za-z0-9.-]+'
# placeholder addresses; the originals were lost to email obfuscation
s = 'john.doe@example.com and more jane_doe@example.org'
print(re.findall(rx, s))
# => ['john.doe@example.com', 'jane_doe@example.org']
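The difference between the capturing and the non-capturing pattern is easy to reproduce. The following is a minimal sketch, not part of the original answer: the sample text, the addresses, and the helper names are made up for illustration. It also shows that re.finditer with group(0) returns whole matches even when the original capturing pattern is kept.

```python
import re

# The asker's pattern, with a capturing group around the separator
CAPTURING = r'[A-Za-z0-9._+-]+@[A-Za-z0-9]+(\.|-)[A-Za-z0-9.-]+'
# Same pattern with the group made non-capturing via (?:...)
NON_CAPTURING = r'[A-Za-z0-9._+-]+@[A-Za-z0-9]+(?:\.|-)[A-Za-z0-9.-]+'

# Hypothetical sample text with placeholder addresses
sample = 'write to alice@example.com or bob@mail-server.org today'


def find_group_only(text):
    # With exactly one capturing group, findall returns that group's text
    return re.findall(CAPTURING, text)


def find_full(text):
    # With no capturing groups, findall returns the whole matches
    return re.findall(NON_CAPTURING, text)


def find_full_with_finditer(text):
    # group(0) is always the whole match, regardless of capturing groups
    return [m.group(0) for m in re.finditer(CAPTURING, text)]


print(find_group_only(sample))
print(find_full(sample))
print(find_full_with_finditer(sample))
```

The first call yields only the separators, which is why the set in the question ends up with fewer (and wrong) entries than expected, while the other two return the full addresses.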

edited Dec 28, 2016 at 9:12

answered Dec 28, 2016 at 9:04


2 Comments

Mohammad Yusuf Over a year ago

Can you negate a group inside a character class, e.g. [^(\d+.\d+)]?

Wiktor Stribiżew Over a year ago

No, not that way. The solution depends on what you need to achieve in the end. A tempered greedy token might help.
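As a rough illustration of a tempered greedy token: the idea is to consume one character at a time, but only where a negative lookahead confirms that the unwanted pattern does not start at that position. The sketch below assumes the goal is "match everything up to the first decimal number"; the commenter's actual goal is not stated, and the function name and sample strings are hypothetical.

```python
import re

# (?:(?!\d+\.\d+).)* is the tempered greedy token:
# "." consumes a character only if \d+\.\d+ does not match starting there.
TEMPERED = re.compile(r'(?:(?!\d+\.\d+).)*')


def prefix_before_decimal(text):
    # match() anchors at the start; the token stops right before
    # the first decimal number (or at a newline, since "." excludes it)
    return TEMPERED.match(text).group()


print(prefix_before_decimal('price: 12.50 USD'))
print(prefix_before_decimal('abc 7 def 3.14'))
print(prefix_before_decimal('no numbers here'))
```

A character class such as [^(\d+.\d+)] cannot do this, because a class only excludes single characters, not sequences.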
