Here is my code. It converts only part of the JSON file and fails to flatten all of the nested JSON.

import pandas as pd
import json
all_data = []
add_header = True
with open('C:\\Users\\jeri\\Desktop\\1.json',encoding='utf-8') as f_json:
    for line in f_json:
        line = line.strip()
        if line:
            all_data.append(json.loads(line))
df = pd.json_normalize(all_data)
df.to_csv('C:\\Users\\jeri\\Desktop\\11.csv', index=False,encoding='utf-8',header=add_header)
add_header = False

My JSON file:


{"id":"aa","sex":"male","name":[{"Fn":"jeri","Ln":"teri"}],"age":45,"info":[{"address":{"State":"NY","City":"new york"},"start_date":"2001-09","title":{"name":"Doctor","Exp":"head"},"year":"2001","month":"05"}],"other":null,"Hobby":[{"smoking":null,"gamble":null}],"connect":[{"phone":"123456789","email":"[email protected]"}],"Education":"MBA","School":{"State":"NY","City":"new york"}}
{"id":"aa","sex":"female","name":[{"Fn":"lo","Ln":"li"}],"age":34,"info":[{"address":{"State":"NY","City":"new york"},"start_date":"2008-11","title":{"name":"Doctor","Exp":"hand"},"year":"2008","month":"02"}],"other":null,"Hobby":[{"smoking":null,"gamble":null}],"connect":[{"phone":"123456789","email":"[email protected]"}],"Education":"MBA","School":{"State":"NY","City":"new york"}}

The result of the conversion is below. Not all of the JSON is converted, which is not what I want. I need every nested field flattened and converted.

id,sex,name,age,info,other,Hobby,connect,Education,School.State,School.City
aa,male,"[{'Fn': 'jeri', 'Ln': 'teri'}]",45,"[{'address': {'State': 'NY', 'City': 'new york'}, 'start_date': '2001-09', 'title': {'name': 'Doctor', 'Exp': 'head'}, 'year': '2001', 'month': '05'}]",,"[{'smoking': None, 'gamble': None}]","[{'phone': '123456789', 'email': '[email protected]'}]",MBA,NY,new york
aa,female,"[{'Fn': 'lo', 'Ln': 'li'}]",34,"[{'address': {'State': 'NY', 'City': 'new york'}, 'start_date': '2008-11', 'title': {'name': 'Doctor', 'Exp': 'hand'}, 'year': '2008', 'month': '02'}]",,"[{'smoking': None, 'gamble': None}]","[{'phone': '123456789', 'email': '[email protected]'}]",MBA,NY,new york


New code:

import pandas as pd
import json
data = []
add_header = True
with open('C:\\Users\\jeri\\Desktop\\1.json',encoding='utf-8') as f_json:
    for line in f_json:
        line = line.strip()
        if line:
            data.append(json.loads(line))
df = pd.json_normalize(data)
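# Explode each list column into rows, then reset the index so it lines up in concat.
df = df.explode('name').explode('info').explode('Hobby').reset_index(drop=True)
dfe = pd.concat([df, pd.json_normalize(df['name'].tolist()),
                 pd.json_normalize(df['info'].tolist()),
                 pd.json_normalize(df['Hobby'].tolist())], axis=1)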
dfe.to_csv('C:\\Users\\jeri\\Desktop\\11.csv', index=False,encoding='utf-8',header=add_header)
add_header = False


Output:

id,sex,age,other,Education,School,Fn,Ln,start_date,year,month,address.State,address.City,title.name,title.Exp,phone,email,smoking,gamble
aa,male,45,,MBA,"{'State': 'NY', 'City': 'new york'}",jeri,teri,2001-09,2001,05,NY,new york,Doctor,head,123456789,[email protected],,
aa,female,34,,MBA,"{'State': 'NY', 'City': 'new york'}",lo,li,2008-11,2008,02,NY,new york,Doctor,hand,123456789,[email protected],,

  • import pandas as pd; pd.json_normalize(data) Commented Feb 7, 2022 at 15:33
  • Yes, but it doesn't convert all of the JSON, just parts. Commented Feb 7, 2022 at 15:39
  • Please show what the desired output is. Commented Feb 7, 2022 at 15:58
  • One way is to flatten the deeply nested JSON first, then run DataFrame.to_csv on the flattened result. Commented Feb 7, 2022 at 16:03
  • Can you provide the full code? If it works, I'll accept it as an answer and upvote it. @Emma Commented Feb 8, 2022 at 2:25

1 Answer

pd.json_normalize flattens nested dictionaries into columns. When a column holds lists, you need explode to turn each list element into its own row. If the list elements are dictionaries, apply json_normalize again to the exploded column.

import pandas as pd

data = []
# read_json with lines=True turns the top-level keys of each line into columns.
df_it = pd.read_json('C:\\Users\\jeri\\Desktop\\1.json', encoding='utf-8', lines=True, chunksize=100000)
for sub in df_it:
    data.append(sub)

df = pd.concat(data)

# The list-of-dict columns still need to be flattened.
# With pandas >= 1.3.0 you can explode several columns at once
# (the lists must have matching lengths in each row): df = df.explode(['name', 'info', 'Hobby'])
# Otherwise, explode takes only one column at a time.
df = df.explode('name').explode('info').explode('Hobby').reset_index(drop=True)
df = pd.concat([df,
                pd.json_normalize(df['name'].tolist()),
                # DataFrame.info is a method, so df.info does not return the column;
                # access it as df['info'] instead.
                pd.json_normalize(df['info'].tolist()),
                pd.json_normalize(df['Hobby'].tolist())], axis=1)
# Drop the exploded columns; their contents have already been extracted into new columns.
df = df.drop(['name', 'info', 'Hobby'], axis=1)

df.to_csv('C:\\Users\\jeri\\Desktop\\11.csv', index=False, encoding='utf-8')
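
To try the explode and json_normalize steps without a file on disk, here is a minimal sketch on two in-memory records. The records are trimmed copies of the sample, and the variable names records, name_cols and flat are only for illustration. It also shows one possible way to handle a plain dict column such as School: running json_normalize on the raw records flattens it at the top level.

```python
import pandas as pd

# Two trimmed records modeled on the sample file (most fields omitted for brevity).
records = [
    {"id": "aa", "name": [{"Fn": "jeri", "Ln": "teri"}],
     "School": {"State": "NY", "City": "new york"}},
    {"id": "aa", "name": [{"Fn": "lo", "Ln": "li"}],
     "School": {"State": "NY", "City": "new york"}},
]

# Top-level json_normalize flattens the dict column School into
# School.State / School.City but leaves the list column 'name' untouched.
df = pd.json_normalize(records)

# explode gives each list element its own row; reset the index so it is unique.
df = df.explode("name").reset_index(drop=True)

# Each cell of 'name' is now a dict, so a second json_normalize yields Fn / Ln.
name_cols = pd.json_normalize(df["name"].tolist())

# Replace the original 'name' column with the extracted columns.
flat = pd.concat([df.drop(columns="name"), name_cols], axis=1)
print(flat)
```

The same pattern should carry over to the other list columns in the sample (info, Hobby, connect), one column at a time.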

Another thing to consider: when you call json_normalize on several columns, make sure the dictionaries in those columns do not share key names, because the resulting column names would collide. In your sample I didn't see any shared key names, so I did the concat and all the json_normalize calls in one step.

However, if the dictionaries do share key names, go step by step: run json_normalize on one exploded column, rename the resulting column(s), then run json_normalize on the next exploded column.
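
The step-by-step approach could look roughly like the sketch below. The helper expand_column is hypothetical (it is not part of pandas), and the records are made up so that both nested columns use the key "name". Instead of renaming by hand, it prefixes every extracted column with the source column name via add_prefix, which is one way to keep the names distinct.

```python
import pandas as pd

def expand_column(df, col):
    """Hypothetical helper: explode a list-of-dicts column and prefix its keys."""
    out = df.explode(col).reset_index(drop=True)
    # Prefix the new columns with the source column name so keys cannot collide.
    expanded = pd.json_normalize(out[col].tolist()).add_prefix(f"{col}.")
    return pd.concat([out.drop(columns=col), expanded], axis=1)

# Made-up record in which both nested columns use the key "name".
records = [
    {"id": 1, "owner": [{"name": "jeri"}], "pet": [{"name": "rex"}, {"name": "tom"}]},
]
df = pd.DataFrame(records)

# Handle one nested column at a time.
for col in ["owner", "pet"]:
    df = expand_column(df, col)

print(df)
```

Processing one column per pass also means lists of different lengths are fine, since each explode only has to deal with a single column.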

6 Comments

I see you have asked this type of question multiple times. I added links for the functions that are useful for these operations, so take a look at them to learn their usage and what each function does. It would also be good to look at the dataframe after each line or function call, or to run the steps separately, and see what each function does to the current dataframe.
I ran the code but got an error: AttributeError: 'DataFrame' object has no attribute 'name'. You can also see the full info in the question. @Emma
Answer updated.
I see. It runs with no errors, but it doesn't convert all objects because School isn't converted. @Emma
Yes, I am not doing everything. I gave you the idea, and I think you can apply it to School now. I hope you are not just pasting the entire code and bringing back the exact error without debugging or working through it. It might sound impolite, but I hope you understand that I am putting my time into this and am hoping you are getting educational value out of it.