Parsing HTTP Response in Python

Question

I want to manipulate the information at THIS url. I can successfully open it and read its contents. But what I really want to do is throw out all the stuff I don't want, and to manipulate the stuff I want to keep.

Is there a way to convert the string into a dict so I can iterate over it? Or do I just have to parse it as is (str type)?

from urllib.request import urlopen

url = 'http://www.quandl.com/api/v1/datasets/FRED/GDP.json'
response = urlopen(url)

print(response.read()) # returns string with info

The URL may break; it would be better to include a representative sample in your question.
– Nicolas Raoul, Commented Feb 2, 2019 at 1:14

bvrakvs · Accepted Answer · 2019-05-30 01:19:31Z

103

When I printed response.read() I noticed that b was prepended to the string (e.g. b'{"a":1,..). The "b" stands for bytes and indicates the type of the object you're handling. Since I knew that a string could be converted to a dict with json.loads('string'), I just had to convert the bytes to a string. I did this by decoding the response as UTF-8 with decode('utf-8'). Once it was a string, my problem was solved and I could easily iterate over the dict.

I don't know if this is the fastest or most 'pythonic' way of writing this, but it works and there's always time later for optimization and improvement! Full code for my solution:

from urllib.request import urlopen
import json

# Get the dataset
url = 'http://www.quandl.com/api/v1/datasets/FRED/GDP.json'
response = urlopen(url)

# Convert bytes to string type and string type to dict
string = response.read().decode('utf-8')
json_obj = json.loads(string)

print(json_obj['source_name']) # prints the string with 'source_name' key
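
For anyone who wants to try the conversion without hitting the network, here is a minimal sketch of the same bytes-to-str-to-dict steps. The payload is a made-up sample standing in for response.read(), not the real API response, and the variable names are only illustrative.

```python
import json

# Hypothetical sample payload standing in for response.read();
# the real API response will have different fields.
raw = b'{"source_name": "Example Source", "column_names": ["Date", "Value"]}'

# Step 1: bytes -> str (assumes the server sent UTF-8)
text = raw.decode('utf-8')

# Step 2: str -> dict
parsed = json.loads(text)

# The result is a plain dict, so it can be iterated like any other
for key, value in parsed.items():
    print(key, '->', value)

# Since Python 3.6, json.loads also accepts bytes directly
# (UTF-8, UTF-16 or UTF-32 encoded)
parsed_from_bytes = json.loads(raw)
```

On Python 3.6 or later the explicit decode step is therefore optional for UTF-8 responses, though keeping it makes the encoding assumption visible.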

edited May 30, 2019 at 1:19

bvrakvs

answered Apr 15, 2014 at 1:32

Colton Allen

1 Comment

alper Over a year ago

I am getting a json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0) error.
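
That error usually means the body handed to json.loads was empty or was not JSON at all (for example, an HTML error page). Below is a small sketch of a defensive parse that shows what the server actually sent. parse_json_body is a hypothetical helper name, not part of any library, and it assumes the body is UTF-8.

```python
import json

def parse_json_body(body: bytes):
    """Return the parsed JSON, or None if the body is empty or not valid JSON."""
    text = body.decode('utf-8', errors='replace').strip()
    if not text:
        # An empty body is what triggers
        # "Expecting value: line 1 column 1 (char 0)"
        return None
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        # Show the start of the body to see what the server really returned
        print(f'Not JSON ({exc}); body starts with: {text[:60]!r}')
        return None

print(parse_json_body(b''))               # empty body
print(parse_json_body(b'<html></html>'))  # HTML instead of JSON
print(parse_json_body(b'{"a": 1}'))       # valid JSON
```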

Paul Roub · Accepted Answer · 2017-08-01 21:14:11Z

29

You can also use Python's requests library instead.

import requests

url = 'http://www.quandl.com/api/v1/datasets/FRED/GDP.json'
response = requests.get(url)
dict = response.json()

Now you can manipulate the "dict" like any Python dictionary.

edited Aug 1, 2017 at 21:14

Paul Roub

answered Aug 1, 2017 at 21:00

Shaurya Mittal

1 Comment

user14136989 Over a year ago

Be careful: don't shadow Python's built-in dict by using it as a variable name.
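
To illustrate the point, here is a minimal sketch of what goes wrong once the name dict is rebound. The function name shadowed is just for illustration.

```python
def shadowed():
    dict = {'a': 1}       # the name now refers to this one dictionary
    try:
        return dict(b=2)  # the built-in constructor is no longer reachable here
    except TypeError as exc:
        return str(exc)

message = shadowed()
print(message)  # 'dict' object is not callable
```

Choosing a different name such as data for the parsed response avoids the problem.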

Community · Accepted Answer · 2017-05-23 11:47:30Z

11

json works with Unicode text in Python 3 (the JSON format itself is defined only in terms of Unicode text), so you need to decode the bytes received in the HTTP response. r.headers.get_content_charset('utf-8') gets you the character encoding, falling back to UTF-8 if the server does not declare one:

#!/usr/bin/env python3
import io
import json
from urllib.request import urlopen

with urlopen('https://httpbin.org/get') as r, \
     io.TextIOWrapper(r, encoding=r.headers.get_content_charset('utf-8')) as file:
    result = json.load(file)
print(result['headers']['User-Agent'])

It is not necessary to use io.TextIOWrapper here:

#!/usr/bin/env python3
import json
from urllib.request import urlopen

with urlopen('https://httpbin.org/get') as r:
    result = json.loads(r.read().decode(r.headers.get_content_charset('utf-8')))
print(result['headers']['User-Agent'])
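
The charset lookup can be tried offline as well. The sketch below builds an email.message.Message by hand as a stand-in for r.headers (http.client.HTTPMessage is a subclass of it); the header value, the sample body and the helper name decode_json_body are assumptions made up for the example.

```python
import json
from email.message import Message

def decode_json_body(body: bytes, headers: Message):
    """Decode body using the charset from Content-Type, defaulting to UTF-8."""
    charset = headers.get_content_charset('utf-8')
    return json.loads(body.decode(charset))

# Stand-in for r.headers with an explicit, non-UTF-8 charset
headers = Message()
headers['Content-Type'] = 'application/json; charset=ISO-8859-1'

# Stand-in for r.read(): the same text encoded the way the header claims
body = '{"city": "Zürich"}'.encode('iso-8859-1')

result = decode_json_body(body, headers)
print(result['city'])

# With no charset declared, the fallback argument is returned instead
fallback = Message().get_content_charset('utf-8')
```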

edited May 23, 2017 at 11:47

CommunityBot

answered Oct 31, 2015 at 16:09

jfs

4 Comments

Peppe L-G Over a year ago

In Python 3, use r.msg.get_content_charset. docs.python.org/3/library/…

jfs Over a year ago

@PeppeL-G: from the HTTPResponse source: "headers is used here and supports urllib. msg is provided as a backwards compatibility layer for http clients."

Peppe L-G Over a year ago

Oh, sorry, I don't have much experience with Python, but you're probably right. I was working with the HTTPResponse class from the http.client module, and I now see that there are some differences: this class contains both the msg field and the headers field (same value), but I only found documentation for the msg field, so I assumed headers was kept for backwards compatibility. My mistake.

jfs Over a year ago

@PeppeL-G It is probably a bug in the docs, because headers is a better name than msg for an attribute that stores HTTP headers. If you think others may have the same issue, you could submit a simple documentation patch mentioning that headers can be used and that msg exists for backwards compatibility.

FlyingV · Accepted Answer · 2020-05-18 16:43:38Z

2

TL;DR: Data from a server typically arrives as bytes. The rationale is that the recipient, who should know how to use the data, is responsible for decoding those bytes. Decode the binary data on arrival so that you work with a string rather than a bytes object (the b'...' prefix).

Use case:

import requests

def get_data_from_url(url):
    # Fetch the URL passed in as the argument
    response = requests.get(url)
    response_data_split_by_line = response.content.decode('utf-8').splitlines()
    return response_data_split_by_line

In this example, I decode the content that I received into UTF-8. For my purposes, I then split it by line, so I can loop through each line with a for loop.
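
The decode-then-split step can be sketched without a network call. The bytes below are a made-up stand-in for response.content, and the comma-separated layout is only an assumption for the sake of the example.

```python
# Hypothetical body standing in for response.content
content = b'date,value\n2014-01-01,17.1\n2014-04-01,17.3\n'

# bytes -> str, then one list entry per line (line endings are dropped)
lines = content.decode('utf-8').splitlines()

# Treat the first line as a header and loop over the rest
header, *rows = lines
for row in rows:
    date, value = row.split(',')
    print(date, float(value))
```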

answered May 18, 2020 at 16:43

FlyingV

cards · Accepted Answer · 2024-08-26 23:44:19Z

Don't call (me) json... but call the dict function on the headers attribute of the http.client.HTTPResponse instance. The headers are implemented as an http.client.HTTPMessage, which is based on email.message.Message.

#!/usr/bin/env python3
import urllib.request


url = 'address'
data = b'key: values'  # request body; must be bytes (a trailing comma would make it a tuple)

with urllib.request.urlopen(url, data=data) as rs:
    headers = dict(rs.headers)
    html = rs.read() # binary form

print(headers)
#{'Date': 'xyz', 'Server': 'Apache', 'Vary': 'Host,Accept-Encoding', 'Upgrade': 'h2', 'Connection': 'Upgrade, close', 'Accept-Ranges': 'bytes', 'X-Content-Type-Options': 'nosniff', 'X-Frame-Options': 'sameorigin', 'Referrer-Policy': 'same-origin', 'strict-transport-security': 'max-age=300', 'Content-Length': '2885', 'Content-Type': 'text/html'}

Depending on the needs, the whole response can be organized as a single dictionary, for example something like

response = {"headers": headers, "body": html}

or a flatter version (the dict union operator | requires Python 3.9 or later)

response = headers | {"body": html}
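
A small offline sketch of both ideas follows, using a hand-built email.message.Message as a stand-in for rs.headers. The header values and body are invented for the example. It also shows one caveat worth knowing: converting to a dict keeps only one value per header name, so repeated headers such as Set-Cookie are better read with get_all.

```python
from email.message import Message

# Stand-in for rs.headers; http.client.HTTPMessage subclasses Message
raw_headers = Message()
raw_headers['Content-Type'] = 'text/html'
raw_headers['Set-Cookie'] = 'a=1'
raw_headers['Set-Cookie'] = 'b=2'  # a repeated header

headers = dict(raw_headers)
html = b'<html></html>'  # stand-in for rs.read()

# Same result as headers | {"body": html}, but also works before Python 3.9
response = {**headers, 'body': html}
print(sorted(response))

# dict() keeps a single value per name; get_all returns every occurrence
all_cookies = raw_headers.get_all('Set-Cookie')
print(all_cookies)
```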

Ajay Gautam · Accepted Answer · 2015-10-30 18:44:32Z

-1

I guess things have changed in Python 3.4. This worked for me:

print("resp:" + json.dumps(resp.json()))

answered Oct 30, 2015 at 18:44

Ajay Gautam

1 Comment

jfs Over a year ago

There is no json attribute on the urllib.request response. Don't confuse the requests library with urllib.request.
