
I'm a beginner with regular expressions in Python.

target test.php code:

<html>
  <head></head> 
  <body>
    <a href="www.google.com">[email protected]</a>
    <div>[email protected]</div>
    [email protected]
    [email protected]
  </body>
</html>

This is my code:

import requests,re

email_pattern = re.compile('([\w\-\.]+@(\w[\w\-]+\.)+[\w\-]+)')

res = requests.get("http://127.0.0.1/test.php")

a = email_pattern.findall(res.text)

print a

The result:

[(u'[email protected]', u'com.'), (u'[email protected]', u'com.'), (u'[email protected]', u'gmail.'), (u'[email protected]', u'test.')]

But I want a result like:

[[email protected], [email protected], [email protected], [email protected]]

What is wrong with my pattern or code?

Why is the result a list of tuples that contain the extra com., gmail. and test. entries?

Thank you for resolving my doubts!

5 Comments
  • Because of the capturing group. Use '([\w\-\.]+@(?:\w[\w\-]+\.)+[\w\-]+)' instead. Commented Feb 20, 2016 at 15:02
  • See emailregex.com and regular-expressions.info/email.html Commented Feb 20, 2016 at 15:10
  • So my pattern seems to have unnecessary parentheses? Commented Feb 20, 2016 at 15:12
  • Peter Wood, thanks! The link is very useful. Commented Feb 20, 2016 at 15:13
  • Highly relevant. Commented Feb 20, 2016 at 15:17

2 Answers

The first rule is that you never use regexps to parse HTML; it is impossible to do it right!

Second, once you have a block of text that you want to validate as being an email address, you google and find 2-5 very good regexps on Stack Overflow. Regexps are not Python-specific.

Third, you look for a better job. Scraping email addresses from websites is not an easy task, and everyone here hates those who spam us.
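
As a rough illustration of the first two points, here is a minimal sketch that lets an HTML parser deal with the markup and applies a regexp only to the extracted text. It assumes Python 3 and the standard library `html.parser` module; the `TextCollector` class, the `extract_text` helper, and the sample addresses are hypothetical names made up for this example, and the pattern is just the one from the question with the inner group made non-capturing, not a complete email validator.

```python
import re
from html.parser import HTMLParser  # Python 3 standard library


class TextCollector(HTMLParser):
    """Hypothetical helper: collects the text nodes of an HTML document."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called by the parser for text between tags.
        self.chunks.append(data)


def extract_text(html):
    """Return the text content of `html`, with text nodes joined by spaces."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# The question's pattern with no capturing groups, so findall returns strings.
EMAIL = re.compile(r'[\w\-\.]+@(?:\w[\w\-]+\.)+[\w\-]+')

# Placeholder markup and addresses, not the question's real page.
sample = '<a href="www.example.com">alice@example.com</a><div>bob@example.org</div>'
found = EMAIL.findall(extract_text(sample))
print(found)
```

With this split, attribute values such as `href` never reach the regexp, because only text nodes are collected.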


1 Comment

Thanks for your guidance. If I need to know how to defend against spam bots, learning how to parse is one way to learn defense. Attack and defense are always two sides of one coin.

Make the inner group non-capturing:

([\w\-\.]+@(?:\w[\w\-]+\.)+[\w\-]+)
            ^^
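
A minimal sketch of why this helps: `re.findall` returns a list of tuples when the pattern contains more than one capturing group, and a list of plain strings when it contains exactly one. The sample text and addresses below are placeholders invented for illustration, not the contents of the question's page.

```python
import re

# Hypothetical sample text; the addresses are placeholders.
text = "<div>alice@example.com</div> bob@mail.example.org"

# Two capturing groups: findall returns one tuple per match.
with_inner_group = re.compile(r'([\w\-\.]+@(\w[\w\-]+\.)+[\w\-]+)')

# Inner group made non-capturing with (?:...): findall returns plain strings.
non_capturing = re.compile(r'([\w\-\.]+@(?:\w[\w\-]+\.)+[\w\-]+)')

tuples = with_inner_group.findall(text)
emails = non_capturing.findall(text)

print(tuples)  # each item is (whole address, last repetition of the inner group)
print(emails)  # each item is just the address
```

The second element of each tuple is the last repetition of the inner group, which is where the extra `com.`, `gmail.` and `test.` values in the question come from.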
