Python Response Decoding

Question

For the following lines that use urllib:

# some request object exists
response = urllib.request.urlopen(request)
html = response.read().decode("utf8")

What type of string does read() return? I've been trying to figure that out from Python's documentation, but it does not mention it at all. Why is there a decode? Does decode convert an object to UTF-8 or from UTF-8? From what format to what format does it decode? The decode documentation also mentions nothing about that. Is Python's documentation really that terrible, or is there some standard convention I don't understand?

I want to store that HTML in a UTF-8 file. Would I just do a regular write, or do I need to "encode" back into something and write that?

Note: I know urllib is deprecated, but I cannot switch to urllib2 right now

Thanks for down votes without a comment...?

– darksky, Commented Mar 16, 2013 at 20:33
How do I stop the pain?

– root, Commented Mar 16, 2013 at 20:35

Robᵩ · Accepted Answer · 2013-03-16 20:36:16Z

Ask python:

>>> import urllib  # Python 2 session; in Python 3 the call is urllib.request.urlopen
>>> r = urllib.urlopen("http://google.com")
>>> a = r.read()
>>> type(a)
<type 'str'>
>>> help(a.decode)
Help on built-in function decode:

decode(...)
    S.decode([encoding[,errors]]) -> object

    Decodes S using the codec registered for encoding. encoding defaults
    to the default encoding. errors may be given to set a different error
    handling scheme. Default is 'strict' meaning that encoding errors raise
    a UnicodeDecodeError. Other possible values are 'ignore' and 'replace'
    as well as any other name registered with codecs.register_error that is
    able to handle UnicodeDecodeErrors.

>>> b = a.decode('utf8')
>>> type(b)
<type 'unicode'>

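So read() returns a str, which in Python 2 is a sequence of raw bytes, and .decode('utf8') converts those bytes from UTF-8 into a unicode object, Python's text type. In Python 3, which the urllib.request call in the question implies, the same two steps yield a bytes object and then a str.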
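
For the Python 3 case in the question, a minimal sketch of the round trip might look like the following. It assumes the page really is UTF-8. The sample markup and the page.html file name are made up for illustration, and a literal byte string stands in for response.read() so the snippet runs without a network connection.

```python
import os
import tempfile

# Stand-in for response.read(): raw UTF-8 bytes, as a server would send them.
# (Hypothetical sample markup, not taken from any real page.)
raw = "<p>café ☕</p>".encode("utf-8")

# decode() goes FROM bytes in the named encoding TO a text string.
html = raw.decode("utf-8")

print(type(raw).__name__)   # bytes
print(type(html).__name__)  # str

# To store the text as UTF-8, open the file in text mode with an explicit
# encoding and do a regular write; Python encodes on the way out.
path = os.path.join(tempfile.mkdtemp(), "page.html")  # hypothetical file name
with open(path, "w", encoding="utf-8") as f:
    f.write(html)

# Read the file back in binary mode to inspect the bytes that were stored.
with open(path, "rb") as f:
    on_disk = f.read()

print(on_disk == raw)  # True
```

Because the bytes from read() are already UTF-8 in this sketch, writing them unchanged to a file opened with "wb" would produce the same file. Decoding is only needed when you want to work with the content as text.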


2 Comments

darksky Over a year ago

For some reason, the decode() doc page I was on was a different one. Thanks

darksky Over a year ago

So a str does not support all Unicode characters, and that is why decode() is chained after read()?
