I am trying to extract email addresses from text. I used re.search, which returned only the first occurrence, so I switched to re.findall. To my surprise, re.findall finds fewer emails than re.search. What could be the problem?

Code:

searchObj = re.search(r'[A-Za-z0-9\._+-]+@[A-Za-z0-9]+(\.|-)[A-Za-z0-9\.-]+', text)
if searchObj:
    mail = searchObj.group()
    if mail not in emails:
        emails.add(mail)

listEmails = re.findall(r'[A-Za-z0-9\._+-]+@[A-Za-z0-9]+(\.|-)[A-Za-z0-9\.-]+', text)
for mail in listEmails:
    if mail not in emails:
        emails.add(mail)

1 Answer

Replace the capturing group (\.|-) with a non-capturing one or even with a character class:

r'[A-Za-z0-9._+-]+@[A-Za-z0-9]+[.-][A-Za-z0-9.-]+'
                               ^^^^ 

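Or even shorter (note that in Python 3, \w also matches non-ASCII letters and digits unless the re.ASCII flag is used):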

r'[\w.+-]+@[^\W_]+[.-][A-Za-z0-9.-]+'

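Otherwise, because the pattern contains a capturing group, re.findall returns only the list of captured values (here, the "." or "-" matched by the group) instead of the whole matches.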
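To see the difference concretely, here is a minimal sketch that runs the question's pattern with and without the capturing group. The sample text and the addresses in it are placeholders chosen for illustration, not data from the question.

```python
import re

# Hypothetical sample text; both addresses are placeholders.
text = "Contact alice@example.com or bob.smith@mail-server.org"

# Pattern from the question, with the capturing group (\.|-).
capturing = r'[A-Za-z0-9._+-]+@[A-Za-z0-9]+(\.|-)[A-Za-z0-9.-]+'
# Same pattern with the group made non-capturing via (?:...).
non_capturing = r'[A-Za-z0-9._+-]+@[A-Za-z0-9]+(?:\.|-)[A-Za-z0-9.-]+'

# With exactly one capturing group, findall returns what the group matched.
group_only = re.findall(capturing, text)
# With no capturing groups, findall returns the whole matches.
full_matches = re.findall(non_capturing, text)

print(group_only)    # ['.', '-']
print(full_matches)  # ['alice@example.com', 'bob.smith@mail-server.org']
```

This would also explain the "fewer emails" symptom: the set ends up holding only the distinct separators, so at most two entries.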

Python demo:

import re
rx = r'[\w.+-]+@[^\W_]+[.-][A-Za-z0-9.-]+'
s = '[email protected] and more [email protected]'
print(re.findall(rx, s))
# => ['[email protected]', '[email protected]']
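If the capturing group has to stay in the pattern for some other reason, one alternative is re.finditer, since group(0) of a match object is always the whole match regardless of groups. The sketch below assumes that approach; the helper name extract_emails and the sample addresses are hypothetical.

```python
import re

# Pattern from the question, capturing group left in place.
PATTERN = re.compile(r'[A-Za-z0-9._+-]+@[A-Za-z0-9]+(\.|-)[A-Za-z0-9.-]+')

def extract_emails(text):
    """Collect distinct whole matches; group(0) ignores capturing groups."""
    return {m.group(0) for m in PATTERN.finditer(text)}

# Hypothetical sample text with one repeated placeholder address.
sample = "Write to alice@example.com, bob@mail-server.org, or alice@example.com again."
emails = extract_emails(sample)
print(sorted(emails))  # ['alice@example.com', 'bob@mail-server.org']
```

A set already discards duplicates, so no explicit "if mail not in emails" check is needed. One caveat: the final character class allows dots, so an address immediately followed by a sentence-ending period would pick up that period.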

2 Comments

Can you negate a group inside a character class, e.g. [^(\d+.\d+)]?
No, not that way. The solution depends on what you need to achieve in the end. A tempered greedy token might help.
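As a rough illustration of both points, the sketch below first shows that the bracket expression from the comment is just a set of single characters, then shows a tempered greedy token that stops before a decimal number. The goal of "match text up to a decimal number" is an assumption, since the comment does not state the actual requirement, and the sample strings are made up.

```python
import re

# Inside [...], the parentheses, + and . are literal characters, so this
# class simply excludes "(", ")", "+", "." and digits, one character at a time.
single_chars = re.findall(r'[^(\d+.\d+)]', "a(1.2)+b")
print(single_chars)  # ['a', 'b']

# Tempered greedy token: consume one character at a time, but only at
# positions where a decimal number (digits, dot, digits) does not start.
no_decimal = re.compile(r'(?:(?!\d+\.\d+).)*')

prefix = no_decimal.match("price is 12.50 today").group(0)
print(repr(prefix))  # 'price is '
```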
