I am trying to extract data from a JSON URL into pandas, but the file has multiple "layers" of lists and dictionaries that I cannot seem to navigate.

import json
from urllib.request import urlopen

with urlopen('https://statdata.pgatour.com/r/010/2020/player_stats.json') as response:
    source = response.read()

data = json.loads(source)

for item in data['tournament']['players']:
    pid = item['pid']
    statId = item['stats']['statId']
    name = item['stats']['name']
    tValue = item['stats']['tValue']
    print(pid, statId, name, tValue)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-84-eadd8bdb34cb> in <module>
      1 for item in data['tournament']['players']:
      2     player_id = item['pid']
----> 3     stat_id = item['stats']['statId']
      4     stat_name = item['stats']['name']
      5     stat_value = item['stats']['tValue']

TypeError: list indices must be integers or slices, not str

The output I am trying to get looks like this:

[image of the desired output, not preserved]

2 Answers

1

As the other answer suggests, stats is a list of stat items. This will show you what happens, and also catch any other problems:

import json
from urllib.request import urlopen

with urlopen('https://statdata.pgatour.com/r/010/2020/player_stats.json') as response:
    source = response.read()

data = json.loads(source)

for item in data['tournament']['players']:
    try:
        pid = item['pid']
        stats = item['stats']
        for stat in stats:
            statId = stat['statId']
            name = stat['name']
            tValue = stat['tValue']
            print(pid, statId, name, tValue)
    except Exception as e:
        print(e)
        print(item)
        break

3 Comments

Thank you pink spikyhairman. One extra question: how could I extract the "tournamentNumber":"010" and add it as the first column, i.e. print(tournamentNumber, pid, statId, name, tValue)?
There is only one tournament in the data, so print(data['tournament']['tournamentNumber']) before the for loop.
Thanks, both answers grab the data correctly. How do I get the data into a DataFrame?
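
On the DataFrame question: one possible approach is to collect one dictionary per (player, stat) pair and hand the list to pandas. The sketch below is only that, a sketch. The `sample` dictionary is a made-up stand-in that mirrors the layout described in this thread (the pids and the "Birdies" stat are invented), and `flatten_stats` is a hypothetical helper name, not something from the PGA Tour feed or from pandas.

```python
import pandas as pd

# Hypothetical sample with the same nesting as the feed discussed above:
# tournament -> players (list) -> stats (list of dicts).
sample = {
    "tournament": {
        "tournamentNumber": "010",
        "players": [
            {"pid": "01234", "stats": [
                {"statId": "106", "name": "Eagles", "tValue": "0"},
                {"statId": "107", "name": "Birdies", "tValue": "12"},
            ]},
            {"pid": "05678", "stats": [
                {"statId": "106", "name": "Eagles", "tValue": "1"},
            ]},
        ],
    }
}

def flatten_stats(data):
    """Return one flat dict per (player, stat) pair."""
    tournament = data["tournament"]
    rows = []
    for player in tournament["players"]:
        for stat in player["stats"]:
            rows.append({
                "tournamentNumber": tournament["tournamentNumber"],
                "pid": player["pid"],
                "statId": stat["statId"],
                "name": stat["name"],
                "tValue": stat["tValue"],
            })
    return rows

# A list of dicts becomes a DataFrame with one column per key.
df = pd.DataFrame(flatten_stats(sample))
print(df)
```

With the real feed you would pass the `data` object loaded with `json.loads` instead of `sample`. The values stay strings here; converting them (for example with `pd.to_numeric`) is left out because it is not clear that every stat in the feed is numeric.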
1

You are missing a layer.

Simplified, the data we are trying to access looks like this:

"stats": [{
    "statId":"106",
    "name":"Eagles",
    "tValue":"0"
}]

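The value of 'stats' starts with [{, which means it is a list (a JSON array) of dictionaries. You have to loop over the list, or index it with an integer, before you can access keys such as 'statId' by name.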
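
To see the difference on a small scale, here is a minimal sketch using a hand-written player record (the pid is made up, the stat entry is the simplified one from above). It reproduces the error from the question and then shows the access that works:

```python
# Hypothetical single player record with the same shape as the feed.
player = {
    "pid": "01234",
    "stats": [{"statId": "106", "name": "Eagles", "tValue": "0"}],
}

stats = player["stats"]
print(type(stats).__name__)  # the container is a list, not a dict

# Indexing the list with a string key fails, as in the question.
try:
    stats["statId"]
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Indexing the list with an integer gives a dict, which accepts string keys.
first_stat = stats[0]
print(first_stat["statId"], first_stat["name"], first_stat["tValue"])
```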

I think this should work:

for item in data['tournament']['players']:
    pid = item['pid']
    for stat in item['stats']:
        statId = stat['statId']
        name = stat['name']
        tValue = stat['tValue']
        print(pid, statId, name, tValue)

To read more on dictionaries: https://realpython.com/iterate-through-dictionary-python/
