Reading multiple nested JSON files into Pandas DataFrame

Question

I am having trouble writing code that reads multiple JSON files from a folder in Python.

An example of my JSON files (file name: 20191111.json) looks like this:

[
  {
    "info1": {
      "name": "John",
      "age": "50",
      "country": "USA"
    },
    "info2": {
      "id1": "129",
      "id2": "151",
      "id3": "196"
    },
    "region": [
      {
        "id": "36",
        "name": "Spook",
        "spot": "2"
      },
      {
        "id": "11",
        "name": "Ghoul",
        "spot": "6"
      },
      {
        "id": "95",
        "name": "Devil",
        "spot": "4"
      }
    ]
  },
  {
    "info1": {
      "name": "Mark",
      "age": "33",
      "country": "Brasil"
    },
    "info2": {
      "id1": "612",
      "id2": "221",
      "id3": "850"
    },
    "region": [
      {
        "id": "68",
        "name": "Ghost",
        "spot": "7"
      },
      {
        "id": "75",
        "name": "Spectrum",
        "spot": "2"
      },
      {
        "id": "53",
        "name": "Phantom",
        "spot": "2"
      }
    ]
  }
]

My code:

import os
import pandas as pd

path = 'my_files_directory'
json_files = [pos_json for pos_json in os.listdir(path) if pos_json.endswith('.json')]

df = pd.DataFrame()

for file_ in json_files:
    # os.listdir returns bare file names, so join each one with the directory
    file_df = pd.read_json(os.path.join(path, file_))
    file_df['date'] = file_
    df = df.append(file_df)
    df = df.reset_index(drop=True)

Output:

                   info1                info2                              region           date
0  {'name': 'John', ...}  {'id1': '129', ...}  [{'id': '36', 'name': 'Spook', ...  20191111.json
1  {'name': 'Mark', ...}  {'id1': '612', ...}  [{'id': '68', 'name': 'Ghost', ...  20191111.json

Next I drop the first and second columns, because they contain information that I don't need. Then I want to extract the 'name' values from the 'region' column.

My code is:

df = df.drop(df.columns[[0,1]], axis=1)
df['name'] = [x[0]['name'] for x in df['region']]

Output:

     name           date   
0    Spook     20191111.json
1    Ghost     20191111.json

But I'd like the corresponding DataFrame to look like this:

      name           date  
0    Spook      20191111.json  
1    Ghoul      20191111.json  
2    Devil      20191111.json  
3    Ghost      20191111.json  
4    Spectrum   20191111.json  
5    Phantom    20191111.json

What do I have to do to get this? Thank you for your help.

Just a note: calling df = df.append(file_df) in a loop is an anti-pattern. Instead, put all the file_dfs into a list, and then call pd.concat() on that entire list at the end. Much faster.
– John Zwinck, Commented Nov 17, 2019 at 0:42
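
The pattern suggested in this comment could look roughly like the sketch below. The helper name read_json_folder is hypothetical and not part of the question; the sketch assumes the folder holds at least one .json file, since pd.concat raises an error on an empty list. It also sidesteps DataFrame.append, which was removed in pandas 2.0.

```python
import os

import pandas as pd


def read_json_folder(path):
    """Read every .json file in path into one DataFrame (hypothetical helper)."""
    frames = []
    for file_name in sorted(os.listdir(path)):
        if file_name.endswith(".json"):
            # os.listdir returns bare names, so build the full path first
            file_df = pd.read_json(os.path.join(path, file_name))
            file_df["date"] = file_name
            frames.append(file_df)
    # A single concat copies the data once instead of once per file
    return pd.concat(frames, ignore_index=True)
```

Calling read_json_folder('my_files_directory') would then give the same two-row-per-file frame as the loop in the question, with a fresh 0..n-1 index from ignore_index=True.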

oppressionslayer · Accepted Answer · 2019-11-17 05:57:56Z

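This line is what limits your result: it reads only the first element (x[0]) of each 'region' list, and it produces exactly one value per row of a DataFrame that has only two rows: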

df['name'] = [x[0]['name'] for x in df['region']]
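
To make the difference concrete, here is a small sketch on plain Python lists. The variable regions_per_row is made up for illustration and holds a shortened copy of the 'region' values from the question.

```python
# Shortened stand-in for the 'region' column: one list of dicts per row
regions_per_row = [
    [{"id": "36", "name": "Spook"}, {"id": "11", "name": "Ghoul"}],
    [{"id": "68", "name": "Ghost"}, {"id": "75", "name": "Spectrum"}],
]

# One value per row: only the first region of each list is read
first_only = [x[0]["name"] for x in regions_per_row]

# One value per region: the nested loop walks every list completely
all_names = [region["name"] for x in regions_per_row for region in x]

print(first_only)  # ['Spook', 'Ghost']
print(all_names)   # ['Spook', 'Ghoul', 'Ghost', 'Spectrum']
```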

I changed your code to the following:

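import pandas as pd

filename = '20191111.json'
df1 = pd.read_json(filename)
df1 = df1.drop(columns=['info1', 'info2'])

# Collect one (name, date) pair per entry in every 'region' list
names = []
dates = []
for regions in df1['region']:
    for region in regions:
        names.append(region['name'])
        dates.append(filename)

# Build the result as a new frame, since it has more rows than df1
df2 = pd.DataFrame({'name': names, 'date': dates})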

With this I get the correct data. Your list comprehension can't add more rows than the DataFrame already has, so I created a new one.
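
For a whole folder of files, one possible variation is sketched below. It is not the code I ran above: it swaps the explicit loops for pd.json_normalize with record_path (available as a top-level function from pandas 1.0), and the helper name load_region_names is hypothetical. It assumes each file contains a list of records shaped like the example and that the region entries carry a 'name' key.

```python
import json
import os

import pandas as pd


def load_region_names(path):
    """Return one row per 'region' entry across all .json files in path."""
    frames = []
    for file_name in sorted(os.listdir(path)):
        if not file_name.endswith(".json"):
            continue
        with open(os.path.join(path, file_name), encoding="utf-8") as handle:
            records = json.load(handle)  # a list of dicts, as in the question
        # record_path="region" emits one row per element of each 'region' list
        file_df = pd.json_normalize(records, record_path="region")
        file_df["date"] = file_name
        frames.append(file_df[["name", "date"]])
    # Concatenate once at the end rather than appending inside the loop
    return pd.concat(frames, ignore_index=True)
```

load_region_names('my_files_directory') should then return a frame with the columns name and date, one row per region entry.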
