2

I scraped an election site for the state of Pennsylvania in the United States and here's a sample of the resulting nested dictionary from the site's json:

some_dict = {'Election': {'Statewide': [{'ADAMS': [{'CandidateName': 'BIDEN, JOSEPH '
                                                     'ROBINETTE JR',
                                    'CountyName': 'ADAMS',
                                    'ElectionDayNoVotes': '0',
                                    'ElectionDayVotes': '1',
                                    'ElectionDayYesVotes': '0',
                                    'ElectionYear': '2020'
                                    },
                                   {'CandidateName': 'TRUMP, DONALD J. ',
                                    'CountyName': 'ADAMS',
                                    'ElectionDayNoVotes': '0',
                                    'ElectionDayVotes': '1',
                                    'ElectionDayYesVotes': '0',
                                    'ElectionYear': '2020'
                                    }],
                         'ALLEGHENY': [{'CandidateName': 'BIDEN, JOSEPH '
                                                         'ROBINETTE JR',
                                        'CountyName': 'ALLEGHENY',
                                        'ElectionDayNoVotes': '0',
                                        'ElectionDayVotes': '1',
                                        'ElectionDayYesVotes': '0',
                                        'ElectionYear': '2020'
                                       },
                                       {'CandidateName': 'TRUMP, DONALD '
                                                         'J. ',
                                        'CountyName': 'ALLEGHENY',
                                        'ElectionDayNoVotes': '0',
                                        'ElectionDayVotes': '1',
                                        'ElectionDayYesVotes': '0',
                                        'ElectionYear': '2020'
                                       }]}]}}

I can't figure out how to turn it into a dataframe that would look like this:

enter image description here

2 Answers 2

1
import pandas as pd

some_dict = {'Election': {'Statewide': [{'ADAMS': [{'CandidateName': 'BIDEN, JOSEPH '
                                                     'ROBINETTE JR',
                                    'CountyName': 'ADAMS',
                                    'ElectionDayNoVotes': '0',
                                    'ElectionDayVotes': '1',
                                    'ElectionDayYesVotes': '0',
                                    'ElectionYear': '2020'
                                    },
                                   {'CandidateName': 'TRUMP, DONALD J. ',
                                    'CountyName': 'ADAMS',
                                    'ElectionDayNoVotes': '0',
                                    'ElectionDayVotes': '1',
                                    'ElectionDayYesVotes': '0',
                                    'ElectionYear': '2020'
                                    }],
                         'ALLEGHENY': [{'CandidateName': 'BIDEN, JOSEPH '
                                                         'ROBINETTE JR',
                                        'CountyName': 'ALLEGHENY',
                                        'ElectionDayNoVotes': '0',
                                        'ElectionDayVotes': '1',
                                        'ElectionDayYesVotes': '0',
                                        'ElectionYear': '2020'
                                       },
                                       {'CandidateName': 'TRUMP, DONALD '
                                                         'J. ',
                                        'CountyName': 'ALLEGHENY',
                                        'ElectionDayNoVotes': '0',
                                        'ElectionDayVotes': '1',
                                        'ElectionDayYesVotes': '0',
                                        'ElectionYear': '2020'
                                       }]}]}}


df = pd.DataFrame()
for d in some_dict['Election']['Statewide']:
    for k,v in d.items():
        t = pd.DataFrame(v)
        t['CountyName'] = k
        df = pd.concat([df,t])
Sign up to request clarification or add additional context in comments.

1 Comment

Amazing. Will accept as answer when that option opens. thx
0

Solution

You can do it in any of the following two methods:

  • Method-1: using pd.read_json()
  • Method-2: using pd.DataFrame() The .DataFrame() method accepts
    • a single dict

      The keys are the column names and the values are the column-values.

    • a list of dicts

      Each of the list items is a row of the dataframe, presented as a dict: the keys are the column names and values the column values for that specific row.

Code

Here, we are using the list of dicts approach to create the dataframe. First, we transform the data into a list of dicts using the custom function prepare_records() and then apply any of the following two methods.

# prepare records
records = prepare_records(data)

# Method-1: using read_json()
import json
df = pd.read_json(json.dumps(records), orient='records')

# Method-2: using DataFrame()
df = pd.DataFrame(data=records)

Output:

# print(df.to_markdown(index=False))

| CandidateName              | CountyName   |   ElectionDayNoVotes |   ElectionDayVotes |   ElectionDayYesVotes |   ElectionYear |
|:---------------------------|:-------------|---------------------:|-------------------:|----------------------:|---------------:|
| BIDEN, JOSEPH ROBINETTE JR | ADAMS        |                    0 |                  1 |                     0 |           2020 |
| TRUMP, DONALD J.           | ADAMS        |                    0 |                  1 |                     0 |           2020 |
| BIDEN, JOSEPH ROBINETTE JR | ALLEGHENY    |                    0 |                  1 |                     0 |           2020 |
| TRUMP, DONALD J.           | ALLEGHENY    |                    0 |                  1 |                     0 |           2020 |

Custom Function

# custom function
def prepare_records(data):
    records = []
    for county in data['Election']['Statewide'][0].values(): 
        records.extend(county) # same as: records += county
    return records

Dummy Data

data = {
    'Election': 
        {'Statewide': [
            {
                'ADAMS': [
                    {
                        'CandidateName': 'BIDEN, JOSEPH ROBINETTE JR',
                        'CountyName': 'ADAMS',
                        'ElectionDayNoVotes': '0',
                        'ElectionDayVotes': '1',
                        'ElectionDayYesVotes': '0',
                        'ElectionYear': '2020'
                    },
                    {
                        'CandidateName': 'TRUMP, DONALD J.',
                        'CountyName': 'ADAMS',
                        'ElectionDayNoVotes': '0',
                        'ElectionDayVotes': '1',
                        'ElectionDayYesVotes': '0',
                        'ElectionYear': '2020'
                    },
                ],
                'ALLEGHENY': [
                    {
                        'CandidateName': 'BIDEN, JOSEPH ROBINETTE JR',
                        'CountyName': 'ALLEGHENY',
                        'ElectionDayNoVotes': '0',
                        'ElectionDayVotes': '1',
                        'ElectionDayYesVotes': '0',
                        'ElectionYear': '2020'
                    },
                    {
                        'CandidateName': 'TRUMP, DONALD J.',
                        'CountyName': 'ALLEGHENY',
                        'ElectionDayNoVotes': '0',
                        'ElectionDayVotes': '1',
                        'ElectionDayYesVotes': '0',
                        'ElectionYear': '2020'
                    },
                ],
            },
        ],
    }
}

1 Comment

@SMJune Please take a look at this. Let me know if you have any questions.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.