
I'm having trouble writing code that reads multiple JSON files from a folder in Python.

An example of one of my JSON files (file name: 20191111.json) looks like this:

[
  {
    "info1": {
      "name": "John",
      "age": "50",
      "country": "USA"
    },
    "info2": {
      "id1": "129",
      "id2": "151",
      "id3": "196"
    },
    "region": [
      {
        "id": "36",
        "name": "Spook",
        "spot": "2"
      },
      {
        "id": "11",
        "name": "Ghoul",
        "spot": "6"
      },
      {
        "id": "95",
        "name": "Devil",
        "spot": "4"
      }
    ]
  },
  {
    "info1": {
      "name": "Mark",
      "age": "33",
      "country": "Brasil"
    },
    "info2": {
      "id1": "612",
      "id2": "221",
      "id3": "850"
    },
    "region": [
      {
        "id": "68",
        "name": "Ghost",
        "spot": "7"
      },
      {
        "id": "75",
        "name": "Spectrum",
        "spot": "2"
      },
      {
        "id": "53",
        "name": "Phantom",
        "spot": "2"
      }
    ]
  }
]

My code:

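import os
import pandas as pd

path = 'my_files_directory'
json_files = [pos_json for pos_json in os.listdir(path) if pos_json.endswith('.json')]

df = pd.DataFrame()

for file_ in json_files:
    # os.listdir() returns bare file names, so join them with the folder
    file_df = pd.read_json(os.path.join(path, file_))
    file_df['date'] = file_
    df = df.append(file_df)  # note: DataFrame.append was removed in pandas 2.0
    df = df.reset_index(drop=True)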

Output:

             info1                    info2                   region                    date   
0 {'name': 'John', ...}    {'id1': '129', ...}    [{'id':'36','name':'Spook'...     20191111.json
1 {'name': 'Mark', ...}    {'id1': '612', ...}    [{'id':'68','name':'Ghost'...     20191111.json

Next I delete the first and second columns because they contain information I don't need. Then I want to extract the 'name' values from the 'region' column.

My code is:

df = df.drop(df.columns[[0,1]], axis=1)
df['name'] = [x[0]['name'] for x in df['region']]

Output:

     name           date   
0    Spook     20191111.json
1    Ghost     20191111.json

But I'd like the corresponding DataFrame to look like this:

      name           date  
0    Spook      20191111.json  
1    Ghoul      20191111.json  
2    Devil      20191111.json  
3    Ghost      20191111.json  
4    Spectrum   20191111.json  
5    Phantom    20191111.json  

What do I have to do to get this? Thank you for your help.

  • Just a note: df = df.append(file_df) in a loop is an anti-pattern. Instead, put all the file_dfs into a list, and then do pd.concat() on that entire list at the end. Much faster. Commented Nov 17, 2019 at 0:42
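
A minimal sketch of the pattern this comment describes: collect the per-file DataFrames in a list and call pd.concat() once at the end. The helper name read_json_folder is hypothetical and only for illustration; it assumes every .json file in the folder holds a list of records that pd.read_json can parse.

```python
import os

import pandas as pd


def read_json_folder(path):
    """Read every .json file in `path` into one DataFrame (hypothetical helper)."""
    frames = []
    for file_name in sorted(os.listdir(path)):
        if not file_name.endswith('.json'):
            continue
        # os.listdir() returns bare names, so join them with the folder
        file_df = pd.read_json(os.path.join(path, file_name))
        file_df['date'] = file_name
        frames.append(file_df)
    if not frames:
        # pd.concat() raises on an empty list, so return an empty frame instead
        return pd.DataFrame()
    # One concat at the end instead of growing the DataFrame on every iteration
    return pd.concat(frames, ignore_index=True)
```

Passing ignore_index=True also replaces the per-iteration reset_index(drop=True) call from the question.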

1 Answer

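This line is what limits your result: it takes only the first entry (x[0]) of each 'region' list, and it can only produce one value per existing row, of which your DataFrame has two: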

df['name'] = [x[0]['name'] for x in df['region']]

I changed it to:

import pandas as pd

filename = '20191111.json'
df1 = pd.read_json(filename)
df1 = df1.drop(columns=['info1', 'info2'])

names = []
dates = []
# Each 'region' cell is a list of dicts; walk every entry, not just the first
for x in df1['region']:
    for entry in x:
        names.append(entry['name'])
        dates.append(filename)

# Build a new DataFrame with one row per region entry
df2 = pd.DataFrame({'name': names, 'date': dates})

With this I get the correct data. Your list comprehension can't add more rows than the DataFrame already has, so I created a new one.
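
As a possible alternative for the multi-file case, here is a sketch that uses pd.json_normalize with record_path='region' to get one row per region entry directly. The helper name region_names is hypothetical; the sketch assumes every file is valid JSON shaped like the example in the question and that every region entry has a 'name' key.

```python
import json
import os

import pandas as pd


def region_names(path):
    """Return one row per 'region' entry across all .json files (hypothetical helper)."""
    frames = []
    for file_name in sorted(os.listdir(path)):
        if not file_name.endswith('.json'):
            continue
        with open(os.path.join(path, file_name)) as fh:
            records = json.load(fh)
        # record_path='region' yields one row per dict in each 'region' list
        regions = pd.json_normalize(records, record_path='region')
        regions['date'] = file_name
        # Assumes a 'name' key exists in the region entries of this file
        frames.append(regions[['name', 'date']])
    if not frames:
        return pd.DataFrame(columns=['name', 'date'])
    return pd.concat(frames, ignore_index=True)
```

Calling region_names('my_files_directory') would then give a 'name' and a 'date' column, with each file name repeated for every region entry found in that file.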
