
I have a JSON file with multiple dictionaries:

{"team1participants": 
[ {
        "stats": {
            "item1": 3153, 
            "totalScore": 0, 
            ...
        }
   },
   {
        "stats": {
            "item1": 2123, 
            "totalScore": 5, 
            ...
        }
   },
   {
        "stats": {
            "item1": 1253, 
            "totalScore": 1, 
            ...
        }
   }
],
"team2participants": 
[ {
        "stats": {
            "item1": 1853, 
            "totalScore": 2, 
            ...
        }
   },
   {
        "stats": {
            "item1": 21523, 
            "totalScore": 5, 
            ...
        }
   },
   {
        "stats": {
            "item1": 12503, 
            "totalScore": 1, 
            ...
        }
   }
]
}

In other words, the JSON has multiple keys, and each key maps to a list of dictionaries holding the statistics of individual participants.

I have many such JSON files, and I want to combine them into a single CSV file. I could do this manually, but it is very tedious. I know of csv.DictWriter, but it seems to work only with single, flat dictionaries. I also know that dictionaries can be merged, but that is problematic because all the participant dictionaries have the same keys, so later ones would overwrite earlier ones.

How can I efficiently extract this to a CSV file?

1 Answer

You can make your data tidy, i.e. reshape it so that each row is a single observation (one participant of one team).

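import json

# Using Pandas.
import pandas as pd

# Load one of the JSON files into a dict ('my_file.json' is a placeholder name).
with open('my_file.json') as f:
    d = json.load(f)

# Flatten: one entry per participant, remembering which team key it came from.
teams = []
items = []
scores = []
for team in d:
    for item in d[team]:
        teams.append(team)
        items.append(item['stats']['item1'])
        scores.append(item['stats']['totalScore'])

df = pd.DataFrame({'team': teams, 'item': items, 'score': scores})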
>>> df
    item   score               team
0   1853       2  team2participants
1  21523       5  team2participants
2  12503       1  team2participants
3   3153       0  team1participants
4   2123       5  team1participants
5   1253       1  team1participants

You could also use a list comprehension instead of a loop.

results = [[team, item['stats']['item1'], item['stats']['totalScore']] 
           for team in d for item in d[team]]
df = pd.DataFrame(results, columns=['team', 'item', 'score'])
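The loop and the comprehension keep only item1 and totalScore. If you want every stat (the "..." in the question), pandas.json_normalize (pandas 1.0+) can flatten the nested "stats" dicts into columns named "stats.<key>". A minimal sketch, using the question's sample values with the elided stats left out; the result goes into a separate variable, all_stats, so it does not replace df:

```python
import pandas as pd

# Sample shaped like the question's JSON (values copied from it; the
# elided "..." stats are omitted).
d = {
    "team1participants": [
        {"stats": {"item1": 3153, "totalScore": 0}},
        {"stats": {"item1": 2123, "totalScore": 5}},
        {"stats": {"item1": 1253, "totalScore": 1}},
    ],
    "team2participants": [
        {"stats": {"item1": 1853, "totalScore": 2}},
        {"stats": {"item1": 21523, "totalScore": 5}},
        {"stats": {"item1": 12503, "totalScore": 1}},
    ],
}

# json_normalize turns each participant's nested "stats" dict into flat
# columns ("stats.item1", "stats.totalScore", ...), so every stat is kept.
frames = []
for team, participants in d.items():
    part = pd.json_normalize(participants)
    part.insert(0, "team", team)  # label each row with its team key
    frames.append(part)

all_stats = pd.concat(frames, ignore_index=True)
print(all_stats)
```

New stats that appear in the JSON simply become extra columns, so the code does not need to list them by name.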

You can then build a pivot table, for example:

>>> df.pivot_table(values='score', index='team', columns='item', aggfunc='sum').fillna(0)
item               1253   1853   2123   3153   12503  21523
team                                                       
team1participants      1      0      5      0      0      0
team2participants      0      2      0      0      1      5

Also, now that the data is in a DataFrame, it is easy to save it as a CSV file:

df.to_csv('my_file_name.csv')
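The question mentions many JSON files going into one CSV. One way is to apply the same flattening to every file matched by a glob pattern and write all rows at once, with an extra column recording the source file. A minimal sketch; the helper names (flatten, json_files_to_csv), the "file" column and the sample file names are hypothetical, and the demo writes two tiny sample files to a temporary directory:

```python
import glob
import json
import os
import tempfile

import pandas as pd


def flatten(d, source):
    """One row per participant, tagged with its team key and source file."""
    return [
        {
            "file": source,
            "team": team,
            "item": p["stats"]["item1"],
            "score": p["stats"]["totalScore"],
        }
        for team, participants in d.items()
        for p in participants
    ]


def json_files_to_csv(pattern, out_path):
    """Flatten every JSON file matching `pattern` into a single CSV."""
    rows = []
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            rows.extend(flatten(json.load(f), os.path.basename(path)))
    df = pd.DataFrame(rows, columns=["file", "team", "item", "score"])
    df.to_csv(out_path, index=False)  # index=False: no 0..n-1 index column
    return df


# Demo with two tiny sample files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    samples = {
        "game1.json": {"team1participants": [{"stats": {"item1": 3153, "totalScore": 0}}],
                       "team2participants": [{"stats": {"item1": 1853, "totalScore": 2}}]},
        "game2.json": {"team1participants": [{"stats": {"item1": 2123, "totalScore": 5}}],
                       "team2participants": [{"stats": {"item1": 21523, "totalScore": 5}}]},
    }
    for name, content in samples.items():
        with open(os.path.join(tmp, name), "w") as f:
            json.dump(content, f)
    out = os.path.join(tmp, "combined.csv")
    combined = json_files_to_csv(os.path.join(tmp, "*.json"), out)
    written = pd.read_csv(out)

print(combined)
```

Collecting plain dicts first and building one DataFrame at the end avoids repeatedly appending to a DataFrame inside the loop.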

5 Comments

You should probably clarify that you are using the pandas library.
Thanks. If I want to make the four rows into one, should I repeatedly pivot?
@wwl What would you like the result to look like?
the columns should be: team1player1item1, team1player1totalscore, ..., team1player2item1, team1player2totalscore, ..., team2player1item1, team2player1totalscore, ...
I believe you could just transpose the dataframe via df.T
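The wide layout asked for in the comments (one row per file, with columns like team1player1item1) can also be built without pivoting: flatten each file into one dict and write it with csv.DictWriter, which the question mentions. A minimal sketch; the helper names are hypothetical, and it assumes team keys end in "participants" (stripped to get "team1"), players are numbered from 1 in list order, and stat names are kept as in the JSON (totalScore rather than totalscore):

```python
import csv
import io


def wide_row(d):
    """Flatten one parsed JSON file into a single dict (one CSV row)."""
    row = {}
    for team, participants in d.items():
        prefix = team.replace("participants", "")  # "team1participants" -> "team1"
        for i, p in enumerate(participants, start=1):
            for stat, value in p["stats"].items():
                row[f"{prefix}player{i}{stat}"] = value
    return row


def write_wide_csv(dicts, f):
    """Write one CSV row per parsed JSON file using csv.DictWriter."""
    rows = [wide_row(d) for d in dicts]
    # Union of all keys in first-seen order, so files with fewer players
    # or stats still line up; DictWriter leaves missing cells empty.
    fieldnames = list(dict.fromkeys(k for r in rows for k in r))
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)


sample = {
    "team1participants": [{"stats": {"item1": 3153, "totalScore": 0}},
                          {"stats": {"item1": 2123, "totalScore": 5}}],
    "team2participants": [{"stats": {"item1": 1853, "totalScore": 2}}],
}
buf = io.StringIO()
write_wide_csv([sample], buf)
print(buf.getvalue())
```

For real files, parse each one with json.load and pass the list of dicts to write_wide_csv together with a file opened via open('wide.csv', 'w', newline=''), as the csv module documentation recommends.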
