I'm trying to write an HTML parser using Python's html.parser.HTMLParser class and have a question.

I'm defining a parsing class as follows:

from html.parser import HTMLParser

class MyHTMLParser(HTMLParser):

    def __init__(self):
        self.messages = []

    def handle_starttag(self, tag, attrs):
        message = "Encountered a start tag: %s" % tag
        print(message)
        self.messages.append(message)

    def handle_endtag(self, tag):
        message = "Encountered an end tag: %s" % tag
        print(message)
        self.messages.append(message)

    def handle_data(self, data):
        message = "Encountered some data: %s" % data
        print(message)
        self.messages.append(message)

html_parser = MyHTMLParser()
html_parser.feed("<html><head><title>Test</title></head>")
print(html_parser.messages)

I want to store the results of handle_data(), but I cannot get it to return anything other than None. When I try to store the results of the handle_* methods in the self.messages attribute instead, I get the following error:

Traceback (most recent call last):
  File "./parse_html.py", line 33, in <module>
    html_parser.feed("Test")
  File "/opt/local/depot/python/3.6.4/lib/python3.6/html/parser.py", line 110, in feed
    self.rawdata = self.rawdata + data
AttributeError: 'MyHTMLParser' object has no attribute 'rawdata'

I could always make "messages" a global variable, but I'm looking for another way of storing the results of the handle_* methods. What's the recommended way of retrieving the list of all the elements found by the handle_data() calls?

Thank you for any hints,

Catherine

  • You need to call the base class's constructor. (Commented Jan 15, 2021 at 1:41)

1 Answer

My own stupid mistake ...

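I forgot the HTMLParser.__init__(self) line in the class initialization. Without it the base class never sets up its internal state (such as self.rawdata), which is why feed() fails. It should look like: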

from html.parser import HTMLParser

class MyHTMLParser(HTMLParser):

    def __init__(self):
        HTMLParser.__init__(self)
        self.messages = []

    def handle_starttag(self, tag, attrs):
        message = "Encountered a start tag: %s" % tag
        print(message)
        self.messages.append(message)

    def handle_endtag(self, tag):
        message = "Encountered an end tag: %s" % tag
        print(message)
        self.messages.append(message)

    def handle_data(self, data):
        message = "Encountered some data: %s" % data
        print(message)
        self.messages.append(message)

html_parser = MyHTMLParser()
html_parser.feed("<html><head><title>Test</title></head>")
print(html_parser.messages)
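
As for retrieving only what handle_data() sees: feed() ignores whatever the handlers return, so keeping the text on the instance seems to be the usual approach. Here is a minimal sketch of that idea, assuming Python 3. The class name DataCollector and the attribute name data are made up for illustration, and super().__init__() is just the Python 3 spelling of the same base-class call as above.

```python
from html.parser import HTMLParser


class DataCollector(HTMLParser):
    """Hypothetical helper that keeps only the text passed to handle_data()."""

    def __init__(self):
        # Run the base initializer first; it sets up the internal state
        # (including self.rawdata) that feed() relies on.
        super().__init__()
        self.data = []

    def handle_data(self, data):
        # feed() discards handler return values, so store the text on
        # the instance instead of returning it.
        self.data.append(data)


collector = DataCollector()
collector.feed("<html><head><title>Test</title></head>")
collector.close()  # flush anything still buffered
print(collector.data)
```

After feed() and close(), the collected text can be read from collector.data like any other list attribute, with no global variable needed.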