Creating a DataFrame from a JSON file in Python

Question

I have some json with multiple blocks with this format (using only one here to make it simple, so in this example the dataframe would have only one line):

{
      "A": 1,
      "B": {
        "C": [
          {
            "D": 2,
            "E": 3
          }
        ],
        "F": {
          "G": 4,
          "H": 5
        }
      }
}

And I want to create a DataFrame like this:

  A B.C.D B.C.E B.F.G B.F.H
1 1   2     3     4     5

When I try to do

with open('some.json') as file:
    data = json.load(file)

df = pd.json_normalize(data)

I get something like this:

  A        B.C          B.F.G  B.F.H
1 1  [{"D":2,"E":3}]      4      5

So... I can get the column B.C, break it into the B.C.D and B.C.E

df2 = pd.DataFrame(df['B.C'].tolist())
df3 = df2[0].apply(pd.Series) #The [0] here is the only way to work when I have more than one block in the json

Them later concatenate with the previous dataframe (and removing the B.C column) but it feels ugly and since I'm doing this a LOT I was thinking if there's a cleaner/faster way.

Well, thanks in advance!

Welcome to SO; if the answer resolved your issue, kindly accept it - see What should I do when someone answers my question? — sushanth
– sushanth, Commented Jul 27, 2020 at 18:44

Sayandip Dutta · Accepted Answer · 2020-07-25 07:01:25Z

2

I guess you could write a recursive solution to preprocess the data. There might be some existing inbuilt solution but I am not aware of it. You can test the performance of the following:

def clean(data):
    clean_data = {}
    def parse(data, key=''):
        if isinstance(data, list):
            for elem in data:
                parse(elem,key=key)
        else:
            for k, v in data.items():
                if isinstance(v, (dict, list)):
                    parse(v, key=key+'.'+k)
                else:
                    clean_data[(key+'.'+k).lstrip('.')] = v
    parse(data)
    return [clean_data]

data = {'A': 1, 'B': {'C': [{'D': 2, 'E': 3}], 'F': {'G': 4, 'H': 5}}}
print(pd.DataFrame(clean(data)))

Output:

   A  B.C.D  B.C.E  B.F.G  B.F.H
0  1      2      3      4      5

edited Jul 25, 2020 at 7:01

answered Jul 25, 2020 at 6:34

Sayandip Dutta

15.9k4 gold badges27 silver badges57 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

Jan · Accepted Answer · 2020-07-25 06:58:31Z

Write yourself a recursive function:

def get_values(iterable, root=""):
    if isinstance(iterable, dict):
        for key, values in iterable.items():
            if isinstance(values, (dict, list)):
                new_root = "{}.{}".format(root, key) if root else key
                yield from get_values(values, new_root)
            else:
                absolute_key = "{}.{}".format(root, key) if root else key
                yield absolute_key, values
    elif isinstance(iterable, list):
        for dct in iterable:
            yield from get_values(dct, root)

result = [item for item in get_values(data)]
print(result)

Which yields

[('A', 1), ('B.C.D', 2), ('B.C.E', 3), ('B.F.G', 4), ('B.F.H', 5)]

To transform it into a DataFrame, use:

result = dict([item for item in get_values(data)])

import pandas as pd
df = pd.DataFrame(result, index=[0])
print(df)

Which then yields

   A  B.C.D  B.C.E  B.F.G  B.F.H
0  1      2      3      4      5

NYC Coder · Accepted Answer · 2020-07-25 13:34:17Z

1

You should checkout flatten JSON. It's the best way to flatten JSON's with multiple record paths

import flatten_json

with open('1.json', 'r+') as f:
    data = json.load(f)
dic_flattened = flatten_json.flatten(data)
df = pd.json_normalize(dic_flattened)
print(df)

   A  B_C_0_D  B_C_0_E  B_F_G  B_F_H
0  1        2        3      4      5

answered Jul 25, 2020 at 13:34

NYC Coder

7,6443 gold badges14 silver badges25 bronze badges

Collectives™ on Stack Overflow

Creating a DataFrame from a JSON file in Python

3 Answers 3

Comments

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

Comments

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related