I have a JSON file that looks like this:

    "Aveiro": {
        "Albergaria-a-Velha": {
            "candidates": [
                {
                    "effectiveCandidates": [
                        "JOSÉ OLIVEIRA SANTOS"
                    ],
                    "party": "B.E.",
                    "votes": {
                        "absoluteMajority": 0,
                        "acronym": "B.E.",
                        "constituenctyCounter": 1,
                        "mandates": 0,
                        "percentage": 1.34,
                        "presidents": 0,
                        "validVotesPercentage": 1.4,
                        "votes": 179
                    }
                },
                {
                    "effectiveCandidates": [
                        "ANTÓNIO AUGUSTO AMARAL LOUREIRO E SANTOS"
                    ],
                    "party": "CDS-PP",
                    "votes": {
                        "absoluteMajority": 1,
                        "acronym": "CDS-PP",
                        "constituenctyCounter": 1,
                        "mandates": 5,
                        "percentage": 59.7,
                        "presidents": 1,
                        "validVotesPercentage": 62.5,
                        "votes": 7970
                    }
                },
                {
                    "effectiveCandidates": [
                        "CARLOS MANUEL DA COSTA SERVEIRA VASQUES"
                    ],
                    "party": "CH",
                    "votes": {
                        "absoluteMajority": 0,
                        "acronym": "CH",
                        "constituenctyCounter": 1,
                        "mandates": 0,
                        "percentage": 1.87,
                        "presidents": 0,
                        "validVotesPercentage": 1.95,
                        "votes": 249
                    }
                },
                {
                    "effectiveCandidates": [
                        "RODRIGO MANUEL PEREIRA MARQUES LOURENÇO"
                    ],
                    "party": "PCP-PEV",
                    "votes": {
                        "absoluteMajority": 0,
                        "acronym": "PCP-PEV",
                        "constituenctyCounter": 1,
                        "mandates": 0,
                        "percentage": 1.57,
                        "presidents": 0,
                        "validVotesPercentage": 1.65,
                        "votes": 210
                    }
                },
                {
                    "effectiveCandidates": [
                        "DELFINA LISBOA MARTINS DA CUNHA"
                    ],
                    "party": "PPD/PSD",
                    "votes": {
                        "absoluteMajority": 0,
                        "acronym": "PPD/PSD",
                        "constituenctyCounter": 1,
                        "mandates": 2,
                        "percentage": 24.23,
                        "presidents": 0,
                        "validVotesPercentage": 25.37,
                        "votes": 3235
                    }
                },
                {
                    "effectiveCandidates": [
                        "JESUS MANUEL VIDINHA TOMÁS"
                    ],
                    "party": "PS",
                    "votes": {
                        "absoluteMajority": 0,
                        "acronym": "PS",
                        "constituenctyCounter": 1,
                        "mandates": 0,
                        "percentage": 6.82,
                        "presidents": 0,
                        "validVotesPercentage": 7.14,
                        "votes": 910
                    }
                }
            ],
            "parentTerritoryName": "Aveiro",
            "territoryKey": "LOCAL-010200",
            "territoryName": "Albergaria-a-Velha",
            "total_votes": {
                "availableMandates": 0,
                "blankVotes": 377,
                "blankVotesPercentage": 2.82,
                "displayMessage": null,
                "hasNoVoting": false,
                "nullVotes": 221,
                "nullVotesPercentage": 1.66,
                "numberParishes": 6,
                "numberVoters": 13351,
                "percentageVoters": 59.48
            }
        },

The full file is here for reference

I thought that this code would work

import json
import pandas as pd


with open('autarquicas_2021.json') as f:
    data = json.load(f)

df = pd.json_normalize(data)

However, this returns the following:

df.head()
Aveiro.Albergaria-a-Velha.candidates  ... Évora.Évora.total_votes.percentageVoters
0  [{'effectiveCandidates': ['JOSÉ OLIVEIRA SANTO...  ...                                    49.84

[1 rows x 4312 columns]

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Columns: 4312 entries, Aveiro.Albergaria-a-Velha.candidates to Évora.Évora.total_votes.percentageVoters
dtypes: bool(308), float64(924), int64(1540), object(1540)
memory usage: 31.7+ KB
None

For some reason the code does not produce a usable table (everything ends up in a single row with thousands of columns), and my research has not led me to a solution; it seems that every JSON file has a mind of its own.

Any help would be much appreciated. Thank you in advance!

Disclaimer: This is for an open source project to bring more transparency into local elections in Portugal. It will not be used for commercial or for-profit projects.

  • How do you want the data to look? Commented Sep 30, 2021 at 14:40
  • Does pd.read_json("your_json_file.json") work? Commented Sep 30, 2021 at 14:43
  • @HaleemurAli I should've posted that as well. Here you can see an example of what the objective is: docs.google.com/spreadsheets/d/e/… Commented Sep 30, 2021 at 15:08
  • @zabop pd.read_json returns a frame whose columns are the top-level keys (Aveiro, Açores, Beja, Braga, Bragança, ..., Setúbal, Viana do Castelo, Vila Real, Viseu, Évora) and whose index is the second-level keys; the row Albergaria-a-Velha holds {'candidates': [{'effectiveCandidates': ['JOSÉ... under Aveiro and NaN everywhere else. df.info() reports Index: 306 entries, Albergaria-a-Velha to Évora, with 20 columns in total; column 0 (Aveiro) has 19 non-null values of dtype object. Commented Sep 30, 2021 at 15:11
  • So, I used an online tool and this is the resulting CSV file: docs.google.com/spreadsheets/d/e/… Commented Sep 30, 2021 at 15:33

2 Answers

You can use json_normalize after a small transformation of the original JSON structure.

  1. Convert the JSON into a list of records. I am assuming "Aveiro" is a city and "Albergaria-a-Velha" is a district. Apologies for my unfamiliarity with the area; if that is wrong, please rename the keys.
res = [{**info, 'city': city, 'district': district} for city, districts in data.items() for district, info in districts.items()]

This turns the original JSON, a dict keyed by name at each level, into a list of objects:

[{
    "city": "Aveiro",
    "district": "Albergaria-a-Velha",
    "candidates": [{
        ...
}]
  2. Then use json_normalize.
df = pd.json_normalize(res, record_path=['candidates'], meta=['total_votes', 'city', 'district'])
  3. Expand the nested total_votes object into its own columns.
df = pd.concat([df, pd.json_normalize(df['total_votes'])], axis=1)
>>> df.iloc[0]
effectiveCandidates                                      [JOSÉ OLIVEIRA SANTOS]
party                                                                      B.E.
votes.absoluteMajority                                                        0
votes.acronym                                                              B.E.
votes.constituenctyCounter                                                    1
votes.mandates                                                                0
votes.percentage                                                           1.34
votes.presidents                                                              0
votes.validVotesPercentage                                                  1.4
votes.votes                                                                 179
total_votes                   {'availableMandates': 0, 'blankVotes': 377, 'b...
city                                                                     Aveiro
district                                                     Albergaria-a-Velha
availableMandates                                                             0
blankVotes                                                                  377
blankVotesPercentage                                                       2.82
displayMessage                                                             None
hasNoVoting                                                               False
nullVotes                                                                   221
nullVotesPercentage                                                        1.66
numberParishes                                                                6
numberVoters                                                              13351
percentageVoters                                                          59.48
Name: 0, dtype: object
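
The three steps can be tried end to end on a small sample. The sketch below uses a trimmed version of the Albergaria-a-Velha record from the question (only two candidates and a few fields), so it illustrates the technique and is not the real file. The name `records` and the final step of dropping the original `total_votes` column are my own additions.

```python
import pandas as pd

# Trimmed sample with the same nesting as the question's file (not the full data)
data = {
    "Aveiro": {
        "Albergaria-a-Velha": {
            "candidates": [
                {"party": "B.E.", "votes": {"mandates": 0, "votes": 179}},
                {"party": "CDS-PP", "votes": {"mandates": 5, "votes": 7970}},
            ],
            "total_votes": {"blankVotes": 377, "nullVotes": 221},
        }
    }
}

# Step 1: turn the two levels of name-keyed dicts into a flat list of records
records = [
    {**info, "city": city, "district": district}
    for city, districts in data.items()
    for district, info in districts.items()
]

# Step 2: one row per candidate, carrying the record-level fields along as meta
df = pd.json_normalize(
    records,
    record_path=["candidates"],
    meta=["total_votes", "city", "district"],
)

# Step 3: expand the total_votes dicts and drop the now-redundant dict column
totals = pd.json_normalize(df["total_votes"].tolist())
df = pd.concat([df.drop(columns="total_votes"), totals], axis=1)

print(df)
```

Each candidate becomes one row, and the per-district totals are repeated on every row of that district.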

Recursive Approach:

I usually use this function (a recursive approach) to do that kind of thing:

# Function for flattening nested JSON
def flatten_json(y):
    out = {}

    def flatten(x, name=''):
        # If the nested value is a dict, recurse into each key
        if isinstance(x, dict):
            for a in x:
                flatten(x[a], name + a + '_')
        # If the nested value is a list, recurse into each item by index
        elif isinstance(x, list):
            for i, a in enumerate(x):
                flatten(a, name + str(i) + '_')
        # Otherwise store the scalar under the accumulated key
        else:
            out[name[:-1]] = x

    flatten(y)
    return out

You can call flatten_json to flatten your nested JSON:

# Driver code
print(flatten_json(data))
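
Note that flattening the whole file at once yields a single flat dict, which is much like the one-row frame in the question. A possible variation, sketched here on a trimmed sample and not on the real file, is to flatten each second-level entry separately so that every municipality becomes its own row. The compact `flatten_json` below is a restatement of the function above so the snippet runs on its own; the `city` and `district` column names are assumptions.

```python
import pandas as pd

def flatten_json(obj, prefix=""):
    """Recursively flatten nested dicts and lists into a single-level dict."""
    out = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            out.update(flatten_json(value, f"{prefix}{key}_"))
    elif isinstance(obj, list):
        for i, value in enumerate(obj):
            out.update(flatten_json(value, f"{prefix}{i}_"))
    else:
        # Strip the trailing underscore from the accumulated key
        out[prefix[:-1]] = obj
    return out

# Trimmed sample with the same nesting as the question's file (not the full data)
data = {
    "Aveiro": {
        "Albergaria-a-Velha": {
            "candidates": [{"party": "B.E.", "votes": {"votes": 179}}],
            "total_votes": {"blankVotes": 377},
        }
    }
}

# Flattening everything at once gives one dict with long, path-like keys
flat = flatten_json(data)

# Flattening per municipality gives one row each
rows = [
    {"city": city, "district": district, **flatten_json(info)}
    for city, districts in data.items()
    for district, info in districts.items()
]
df = pd.DataFrame(rows)
print(df.columns.tolist())
```

With this layout the candidates are spread across numbered columns (candidates_0_party, candidates_1_party, and so on), so it suits a one-row-per-municipality table more than a one-row-per-candidate table.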

Library-based approach:

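# Third-party package: pip install flatten_json
from flatten_json import flatten

unflat_json = {'user':
               {'foo':
                {'UserID': 123456,  # a leading zero (0123456) is a syntax error in Python 3
                 'Email': '[email protected]',
                 'friends': ['Johnny', 'Mark', 'Tom']
                 }
                }
               }

flat_json = flatten(unflat_json)

print(flat_json)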
