0

I have this little code and it's giving me AttributeError: 'NoneType' object has no attribute 'group'.

import sys
import re

#def extract_names(filename):

f = open('name.html', 'r')
text = f.read()

match = re.search (r'<hgroup><h1>(\w+)</h1>', text)
second = re.search (r'<li class="hover">Employees: <b>(\d+,\d+)</b></li>', text)  

outf = open('details.txt', 'a')
outf.write(match)
outf.close()

My intention is to read a .HTML file looking for the <h1> tag value and the number of employees and append them to a file. But for some reason I can't seem to get it right. Your help is greatly appreciated.

4
  • I think that some higher level libraries like Scrapy of Beautiful Soap would fit better your task than regular expressions. Commented Sep 20, 2012 at 13:15
  • 2
    @larsmans: the myriad others also include this one which actually demonstrates how it is possible to parse HTML with regexes. And Helen's task right here is teeny-tiny compared to that. So not so trigger-happy. Commented Sep 20, 2012 at 15:28
  • 1
    It’s a shame you can’t use vi to edit HTML files, innit? Commented Sep 20, 2012 at 15:54
  • If you show the relevant portions of the HTML file you are looking at, it would help a great deal. Commented Sep 21, 2012 at 18:02

2 Answers 2

6

You are using a regular expression, but matching XML with such expressions gets too complicated, too fast. Don't do that.

Use a HTML parser instead, Python has several to choose from:

The latter two handle malformed HTML quite gracefully as well, making decent sense of many a botched website.

ElementTree example:

from xml.etree import ElementTree

tree = ElementTree.parse('filename.html')
for elem in tree.findall('h1'):
    print ElementTree.tostring(elem)
Sign up to request clarification or add additional context in comments.

1 Comment

Better use BeatifulSoup or lxml.html for HTML files, though -- more often than not, they're malformed XML.
1

Just for the sake of completion: your error message just indicate that your regular expression failed and did not return anything...

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.