5

I have been given a URL and I want to extract the contents of its <BODY> tag. I'm using Python 3. I came across sgmllib, but it is not available in Python 3.

Can someone please guide me with this? Can I use HTMLParser for this?

Here is what I tried:

import urllib.request
f=urllib.request.urlopen("URL")
s=f.read()

from html.parser import HTMLParser
class MyHTMLParser(HTMLParser):
    def handle_data(self, data):
        print("Encountered   some data:", data)

parser = MyHTMLParser()
parser.feed(s)

This gives me the error: TypeError: Can't convert 'bytes' object to str implicitly

9
  • 8
    "please guide me": Will do. Search. It's been asked. Many, many times. After you do the search (in the upper right corner), feel free to ask specific questions based on the answers already given. Commented Feb 1, 2012 at 20:11
  • To be specific: can we parse a URL in the parser.feed() method? Commented Feb 1, 2012 at 20:15
  • @ghbhatt: Show us an example of what you need. Otherwise, see my answer; is this what you are asking? Commented Feb 1, 2012 at 20:16
  • @RanRag: I edited my question. Please have a look at it. Commented Feb 1, 2012 at 20:37
  • 1
    Have you done a search? Commented Feb 1, 2012 at 20:43

2 Answers

10

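To fix the TypeError, change line #3 to

s = str(f.read(), 'utf-8')

The web page is returned as bytes, and you need to decode those bytes into a string before feeding them to the parser. Note that str(f.read()) without an encoding does not decode anything; it returns the repr of the bytes object, b'...' prefix included. UTF-8 is an assumption here; use the page's actual encoding.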
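A minimal sketch of the bytes-to-str conversion, using a hard-coded bytes literal as a stand-in for what f.read() returns. The sample page and the UTF-8 encoding are assumptions for illustration, not something taken from the question.

```python
# Hypothetical stand-in for the bytes that f.read() would return.
raw = b"<html><body><p>caf\xc3\xa9</p></body></html>"

# str() without an encoding gives the repr of the bytes object,
# so the result starts with "b'" and is not the page text.
as_repr = str(raw)

# Decoding with an explicit encoding gives the actual text.
# UTF-8 is assumed here; a real page may use something else.
as_text = raw.decode("utf-8")

# str(raw, "utf-8") is equivalent to raw.decode("utf-8").
also_text = str(raw, "utf-8")

print(as_repr[:2])
print(as_text)
```

Either decoded form can be passed to parser.feed(); the repr form would be parsed as if the b'...' wrapper and escape sequences were part of the page.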


1 Comment

You should find the encoding from the HTTP headers so you know what encoding to use.
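One way to act on this comment, as a rough sketch: the headers object on a urllib.request response is an email.message.Message subclass, so get_content_charset() can read the charset from Content-Type. The helper name decode_body and the UTF-8 fallback are assumptions made for this example, and a plain Message stands in for real response headers so the snippet runs without a network call.

```python
from email.message import Message


def decode_body(raw: bytes, headers: Message, fallback: str = "utf-8") -> str:
    """Decode raw with the charset from Content-Type, or fallback if none is declared."""
    charset = headers.get_content_charset() or fallback
    # errors="replace" avoids an exception if a byte does not fit the charset.
    return raw.decode(charset, errors="replace")


# Hypothetical stand-in for f.headers from urllib.request.urlopen(...).
headers = Message()
headers["Content-Type"] = "text/html; charset=ISO-8859-1"

text = decode_body(b"caf\xe9", headers)
print(text)
```

With a real response this would be called as decode_body(f.read(), f.headers). Pages that declare their encoding only in a <meta> tag are not handled by this sketch.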
4

If you take a look at your s variable, its type is bytes.

>>> type(s)
<class 'bytes'>

and if you take a look at HTMLParser.feed, it requires a string (str) as its argument. So do

>>> x = s.decode('utf-8')
>>> type(x)
<class 'str'>
>>> parser.feed(x)

or do x = str(s, 'utf-8'). Plain str(s) would not decode the data; it returns the repr of the bytes object.
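Putting the decoding step together with the original goal of reading the <BODY> contents, a possible sketch follows. The class name BodyTextParser and the sample page are made up for illustration, and UTF-8 is again assumed. HTMLParser lowercases tag names, so <BODY> arrives as "body".

```python
from html.parser import HTMLParser


class BodyTextParser(HTMLParser):
    """Collects the text found between <body> and </body>."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False

    def handle_data(self, data):
        # Keep only non-blank text that appears inside the body.
        if self.in_body and data.strip():
            self.chunks.append(data.strip())


# Hypothetical stand-in for the bytes returned by f.read().
raw = b"<html><head><title>T</title></head><BODY><p>Hello</p><p>world</p></BODY></html>"

parser = BodyTextParser()
parser.feed(raw.decode("utf-8"))  # decode first; feed() expects str
parser.close()
print(parser.chunks)  # ['Hello', 'world']
```

This collects text only, not markup, and it would also pick up the contents of <script> or <style> elements inside the body, so those would need extra handling in real use.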

2 Comments

It seems that we gave the same answer within a minute of each other.
You should find the encoding from the HTTP headers so you know what encoding to use.
