I'm having a bit of trouble with this code, as it is not working the way I intend. I know regular expressions aren't the best way to do this, but I couldn't figure out how to do it with the HTML parser, and Beautiful Soup isn't an option. Here's what I'm trying to do: I have an HTML file and I need to extract the value between

<div class="e_mail"> and </div>

When I use the code below, however, it returns the email address like this:

['[email protected]']

How can I get the email address without the brackets and quotes? I'd rather use something cleaner than a regex, but as I said, I couldn't figure out the HTML parser.

f=urllib.urlopen('results.html')
s = str(f.read())
return re.compile('<div class="e_mail">(.*?)</div>', re.DOTALL).findall(s)
  • That worked great. I was trying to do that but was going about it all wrong. I know a regex isn't the way to do this, but I don't really need anything better. Thanks again. Commented Nov 15, 2012 at 22:36

2 Answers

1

findall returns a list, so take its first element:

return re.compile(expr, re.DOTALL).findall(s)[0]

Alternatively:

return re.findall(r'<div class="e_mail">(.*?)</div>', s, re.DOTALL)[0]

Note that if there are no results, you'll get an IndexError because re.findall will simply return an empty list.
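One way to avoid the IndexError is to use re.search, which returns None when nothing matches instead of an empty list. The following is only a sketch: it is written for Python 3 (the question's code is Python 2), the function name extract_email and the default parameter are made up for illustration, and the .strip() call is an extra that assumes surrounding whitespace is not wanted.

```python
import re

# Pattern taken from the question; DOTALL lets "." match newlines inside the div.
EMAIL_DIV = re.compile(r'<div class="e_mail">(.*?)</div>', re.DOTALL)


def extract_email(html, default=None):
    """Return the text of the first e_mail div, or `default` if there is none.

    Hypothetical helper, not part of the original code.
    """
    match = EMAIL_DIV.search(html)
    if match is None:
        # No matching div: return the fallback instead of raising IndexError.
        return default
    # group(1) is the captured text; strip() is an added assumption.
    return match.group(1).strip()


print(extract_email('<div class="e_mail"> user@example.com </div>'))
print(extract_email('<p>no address here</p>'))
```

The caller then decides what a missing address means, for example by checking for None or passing an empty string as the default.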


0

This may work for you:

f=urllib.urlopen('results.html')
s = str(f.read())
email = re.compile('<div class="e_mail">(.*?)</div>', re.DOTALL).findall(s)
return email[0]

Also make sure it is not an empty list before returning it.
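The same empty-result check applies if the standard library's HTML parser is used instead of a regular expression, which the question says it would prefer. Here is a rough sketch of that approach, assuming Python 3 (where the module is html.parser; in Python 2 it was HTMLParser). The class name EmailDivParser and the function extract_email are invented for this example, and the input is assumed to be reasonably well-formed HTML.

```python
from html.parser import HTMLParser


class EmailDivParser(HTMLParser):
    """Collects the text inside each <div class="e_mail"> element.

    Hypothetical helper class, not part of the original code.
    """

    def __init__(self):
        super().__init__()
        self.depth = 0      # > 0 while inside a matching div (counts nested divs)
        self.emails = []    # one string per matching div
        self._chunks = []   # text pieces of the div currently being read

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth > 0:
            # A div nested inside the target div: track it so its end tag
            # does not close the target div early.
            self.depth += 1
        elif "e_mail" in (dict(attrs).get("class") or "").split():
            self.depth = 1
            self._chunks = []

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.emails.append("".join(self._chunks).strip())

    def handle_data(self, data):
        if self.depth > 0:
            self._chunks.append(data)


def extract_email(html):
    """Return the first e_mail div's text, or None if there is no such div."""
    parser = EmailDivParser()
    parser.feed(html)
    parser.close()
    # Check for an empty list before indexing, as suggested above.
    return parser.emails[0] if parser.emails else None


print(extract_email('<body><div class="e_mail">user@example.com</div></body>'))
```

Compared with the regex, this is longer, but it does not depend on the exact spelling of the opening tag (attribute order, extra classes, or quoting style).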
