55

I want to manipulate the information at THIS url. I can successfully open it and read its contents. But what I really want to do is throw out all the stuff I don't want, and to manipulate the stuff I want to keep.

Is there a way to convert the string into a dict so I can iterate over it? Or do I just have to parse it as is (str type)?

from urllib.request import urlopen

url = 'http://www.quandl.com/api/v1/datasets/FRED/GDP.json'
response = urlopen(url)

print(response.read()) # returns string with info
1
  • The URL may break; it is better to include a representative sample in your question. Commented Feb 2, 2019 at 1:14

6 Answers

103

When I printed response.read(), I noticed that b was prepended to the string (e.g. b'{"a":1,..). The "b" stands for bytes and indicates the type of the object you're handling. Since I knew that a string can be converted to a dict with json.loads('string'), I just had to convert the bytes to a string first. I did this by decoding the response as UTF-8 with decode('utf-8'). Once it was a string, my problem was solved and I could easily iterate over the dict.

I don't know if this is the fastest or most 'pythonic' way of writing this, but it works, and there's always time later for optimization and improvement! Full code for my solution:

from urllib.request import urlopen
import json

# Get the dataset
url = 'http://www.quandl.com/api/v1/datasets/FRED/GDP.json'
response = urlopen(url)

# Convert bytes to string type and string type to dict
string = response.read().decode('utf-8')
json_obj = json.loads(string)

print(json_obj['source_name']) # prints the string with 'source_name' key
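
Since the URL may no longer respond, here is a minimal offline sketch of the same two steps (bytes to str, str to dict) followed by iteration. The payload below is a made-up stand-in for what response.read() returns; its keys (source_name, column_names, data) are assumptions for illustration, not the real API's schema.

```python
import json

# Hypothetical sample payload standing in for the bytes from response.read()
raw = (b'{"source_name": "Example Source", '
       b'"column_names": ["Date", "Value"], '
       b'"data": [["2014-01-01", 1.5], ["2013-10-01", 1.2]]}')

# Step 1: bytes -> str; step 2: str -> dict
text = raw.decode('utf-8')
json_obj = json.loads(text)

# Iterate over the top-level keys and show the type of each value
for key, value in json_obj.items():
    print(key, '->', type(value).__name__)

# Keep only the parts of interest: pair each data row with the column names
rows = [dict(zip(json_obj['column_names'], row)) for row in json_obj['data']]
print(rows[0])
```

Once the data is a dict, throwing away unwanted parts is just a matter of picking the keys you need.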

1 Comment

I am getting a json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0) error
29

You can also use Python's requests library instead.

import requests

url = 'http://www.quandl.com/api/v1/datasets/FRED/GDP.json'    
response = requests.get(url)    
dict = response.json()

Now you can manipulate "dict" like any other Python dictionary.

1 Comment

Be careful not to shadow Python's built-in dict.
11

json works with Unicode text in Python 3 (the JSON format itself is defined only in terms of Unicode text), so you need to decode the bytes received in the HTTP response. r.headers.get_content_charset('utf-8') gets you the character encoding, falling back to 'utf-8' if the response does not declare one:

#!/usr/bin/env python3
import io
import json
from urllib.request import urlopen

with urlopen('https://httpbin.org/get') as r, \
     io.TextIOWrapper(r, encoding=r.headers.get_content_charset('utf-8')) as file:
    result = json.load(file)
print(result['headers']['User-Agent'])

It is not necessary to use io.TextIOWrapper here:

#!/usr/bin/env python3
import json
from urllib.request import urlopen

with urlopen('https://httpbin.org/get') as r:
    result = json.loads(r.read().decode(r.headers.get_content_charset('utf-8')))
print(result['headers']['User-Agent'])
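
To see what get_content_charset('utf-8') does without a network call, here is a small sketch that builds the headers by hand. It uses email.message.Message as a stand-in for the response's headers object (http.client.HTTPMessage is based on it); the helper name decode_json_body and the sample bodies are hypothetical.

```python
import json
from email.message import Message


def decode_json_body(body, headers):
    """Decode body using the charset declared in headers (default: utf-8)."""
    charset = headers.get_content_charset('utf-8')
    return json.loads(body.decode(charset))


# Headers that declare a charset explicitly
with_charset = Message()
with_charset['Content-Type'] = 'application/json; charset=ISO-8859-1'

# Headers without a charset parameter: the 'utf-8' fallback is used
no_charset = Message()
no_charset['Content-Type'] = 'application/json'

latin1_body = '{"city": "Malmö"}'.encode('iso-8859-1')
utf8_body = '{"city": "Malmö"}'.encode('utf-8')

result_latin1 = decode_json_body(latin1_body, with_charset)
result_utf8 = decode_json_body(utf8_body, no_charset)
print(result_latin1, result_utf8)
```

Both calls yield the same dict, which is why reading the charset from the headers is safer than hard-coding one.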

4 Comments

In Python 3, use r.msg.get_content_charset. docs.python.org/3/library/…
@PeppeL-G: from the HTTPResponse source: "headers is used here and supports urllib. msg is provided as a backwards compatibility layer for http clients."
Oh, sorry, I don't have much experience with Python, but you're probably right. I was working with the HTTPResponse class from the http.client module, and I now see that there are some differences: this class contains both the msg field and the headers field (same value), but I only found documentation for the msg field, so I assumed headers was kept for backwards compatibility. My mistake.
@PeppeL-G It is probably a bug in the docs, because headers is a better name than msg for an attribute that stores HTTP headers. If you think others might have the same issue, you could submit a simple documentation patch mentioning that headers can be used and that msg exists for backwards compatibility.
2

TL;DR: Data from a server typically arrives as bytes. The rationale is that the recipient, who knows how the data should be used, is responsible for decoding it. Decode the bytes on arrival so that you get a string rather than a b'...' bytes object.

Use case:

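import requests

def get_data_from_url(url):
    # Fetch the URL; response.content holds the raw bytes
    response = requests.get(url)
    # Decode the bytes as UTF-8 and split the text into a list of lines
    response_data_split_by_line = response.content.decode('utf-8').splitlines()
    return response_data_split_by_line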

In this example, I decode the content that I received into UTF-8. For my purposes, I then split it by line, so I can loop through each line with a for loop.
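
The decode-then-split step can be tried offline. This is a minimal sketch assuming a small comma-separated payload; the bytes below are made up for illustration and stand in for response.content.

```python
# Hypothetical payload standing in for response.content (note the mixed line endings)
raw = b'date,value\r\n2014-01-01,1.5\n2013-10-01,1.2\n'

# Decode the bytes, then split on line boundaries (handles both \n and \r\n)
lines = raw.decode('utf-8').splitlines()

# Skip the header line and convert each remaining line into a (date, float) pair
parsed = []
for line in lines[1:]:
    date, value = line.split(',')
    parsed.append((date, float(value)))

print(parsed)
```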


1

If you want the response headers rather than a JSON body, you don't need json at all: call dict on the headers attribute of the http.client.HTTPResponse instance. That attribute is an http.client.HTTPMessage, which is based on email.message.Message.

#!/usr/bin/env python3
import urllib.request


url = 'address'
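data = b'key: values'  # request body as bytes (passing data makes this a POST)

with urllib.request.urlopen(url, data=data) as rs:
    headers = dict(rs.headers)  # copy the response headers into a plain dict
    html = rs.read() # binary form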
    
print(headers)
#{'Date': 'xyz', 'Server': 'Apache', 'Vary': 'Host,Accept-Encoding', 'Upgrade': 'h2', 'Connection': 'Upgrade, close', 'Accept-Ranges': 'bytes', 'X-Content-Type-Options': 'nosniff', 'X-Frame-Options': 'sameorigin', 'Referrer-Policy': 'same-origin', 'strict-transport-security': 'max-age=300', 'Content-Length': '2885', 'Content-Type': 'text/html'}

Depending on the needs, the whole response can be organized as a single dictionary, for example something like

response = {"headers": headers, "body": html}

or a flatter version (the | operator for dicts requires Python 3.9 or later)

response = headers | {"body": html}
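
The header-to-dict conversion can be sketched without a server by filling an email.message.Message by hand, as a stand-in for rs.headers. The header values and body below are made up. One caveat this illustrates: when a header name repeats (Set-Cookie is a common case), dict() keeps only one value, while get_all returns every one.

```python
from email.message import Message

# Stand-in for rs.headers; assigning the same name twice appends a second header
headers_msg = Message()
headers_msg['Content-Type'] = 'text/html'
headers_msg['Set-Cookie'] = 'a=1'
headers_msg['Set-Cookie'] = 'b=2'

headers = dict(headers_msg)                    # one value per header name
cookies = headers_msg.get_all('Set-Cookie')    # every value for a repeated name

html = b'<html></html>'  # hypothetical body, as returned by rs.read()

# Flat merge that also works before Python 3.9 (same result as headers | {...})
response = {**headers, 'body': html}
print(response)
print(cookies)
```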


-1

I guess things have changed in Python 3.4. This worked for me:

print("resp:" + json.dumps(resp.json()))

1 Comment

There is no json attribute. Don't confuse the requests library with urllib.request.
