For the following lines that use urllib:

# some request object exists
response = urllib.request.urlopen(request)
html = response.read().decode("utf8")

What format of string does read() return? I've been trying to figure that out from Python's documentation, but it does not mention it at all. Why is there a decode? Does decode convert an object to UTF-8 or from UTF-8? From what format to what format does it convert? The decode documentation also says nothing about that. Is Python's documentation really that terrible, or is there some standard convention I don't understand?

I want to store that HTML in a UTF-8 file. Would I just do a regular write, or do I need to "encode" back into something and write that?

Note: I know urllib is deprecated, but I cannot switch to urllib2 right now

  • Thanks for down votes without a comment...? Commented Mar 16, 2013 at 20:33
  • How do I stop the pain? Commented Mar 16, 2013 at 20:35

1 Answer

Ask python:

>>> import urllib  # Python 2 session; Python 3 uses urllib.request instead
>>> r = urllib.urlopen("http://google.com")
>>> a = r.read()
>>> type(a)
<type 'str'>
>>> help(a.decode)
Help on built-in function decode:

decode(...)
    S.decode([encoding[,errors]]) -> object

    Decodes S using the codec registered for encoding. encoding defaults
    to the default encoding. errors may be given to set a different error
    handling scheme. Default is 'strict' meaning that encoding errors raise
    a UnicodeDecodeError. Other possible values are 'ignore' and 'replace'
    as well as any other name registered with codecs.register_error that is
    able to handle UnicodeDecodeErrors.

>>> b = a.decode('utf8')
>>> type(b)
<type 'unicode'>

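So read() returns a str, which in Python 2 is a string of raw bytes. .decode('utf8') interprets those bytes as UTF-8 and returns a unicode object, i.e. decoded text. In Python 3, which the urllib.request call in the question implies, the same chain goes from bytes to str.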
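
To connect this to the question's goal of saving the page as a UTF-8 file, here is a minimal Python 3 sketch. It assumes the response body really is UTF-8 encoded. The sample bytes stand in for what read() would return, and the helper name save_html and the file name page.html are made up for illustration.

```python
import os
import tempfile


def save_html(raw, path):
    """Decode UTF-8 bytes to text, then write the text as a UTF-8 file.

    Hypothetical helper, not part of urllib.
    """
    # bytes -> str: decode() converts FROM the named encoding to text.
    html = raw.decode("utf-8")
    # In text mode with encoding="utf-8", write() encodes the text
    # back to UTF-8 bytes on the way to disk.
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(html)
    return html


# Stand-in for response.read(); "é" takes two bytes in UTF-8.
raw = "<p>café</p>".encode("utf-8")
path = os.path.join(tempfile.gettempdir(), "page.html")
html = save_html(raw, path)

print(type(raw).__name__, len(raw))    # bytes 12
print(type(html).__name__, len(html))  # str 11
```

If the decoded text is not needed for anything else, opening the file in binary mode ("wb") and writing the raw bytes unchanged should give the same file under the same assumption.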

2 Comments

For some reason, the decode() doc page I was on was a different one. Thanks
So a str does not support all unicode characters, thus decode() chained after read()?
