Flatten JSON data using pandas json_normalize

Question

Here is what my JSON file looks like:

{"File": "xyz.csv", "Line": "0", "Classes": [{"Name": "ABC", "Score": 0.9842}, {"Name": "DEF", "Score": 0.0128}, {"Name": "GHI", "Score": 0.003}]}
{"File": "xyz.csv", "Line": "1", "Classes": [{"Name": "ABC2", "Score": 0.9999}, {"Name": "DEF2", "Score": 0.1111}, {"Name": "GHI2", "Score": 0.5666}]}

pred_df = pd.read_json('filename.json', lines=True)

When I try to use json_normalize on the last column, "Classes", I get an error: string indices must be integers

Class = json_normalize(data = pred_df,
                  record_path= pred_df['Classes'],
                  meta =['Name','Score'])

Please let me know what I'm missing here. Thanks!

cs95 · Accepted Answer · 2019-06-18 19:57:19Z


Do this in two steps. The first loads your JSON; the second flattens your "Classes" column and broadcasts the rest of your data to it using NumPy's repeat, called here as a method on df.values.

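import pandas as pd

df = pd.read_json('filename.json', lines=True)

# Pull out the list-of-dicts column; the remaining columns are the metadata.
classes = df.pop('Classes')
pd.concat([
    # One row per dict: summing the lists concatenates them into one flat list.
    pd.DataFrame(classes.sum()),
    # Repeat each metadata row once per dict in that row's list.
    pd.DataFrame(df.values.repeat(classes.str.len(), axis=0), columns=[*df])
], axis=1)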

   Name   Score     File Line
0   ABC  0.9842  xyz.csv    0
1   DEF  0.0128  xyz.csv    0
2   GHI  0.0030  xyz.csv    0
3  ABC2  0.9999  xyz.csv    1
4  DEF2  0.1111  xyz.csv    1
5  GHI2  0.5666  xyz.csv    1
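
To see the broadcast step in isolation, here is a minimal sketch on a toy frame. The frame is made up for illustration (it has uneven list lengths, unlike the question's data), and the names counts and repeated are not from the answer.

```python
import pandas as pd

# Toy stand-in for the question's data; the second row has only one class
# so the effect of the per-row repeat counts is visible.
df = pd.DataFrame({
    "File": ["xyz.csv", "xyz.csv"],
    "Line": [0, 1],
    "Classes": [
        [{"Name": "ABC", "Score": 0.9842}, {"Name": "DEF", "Score": 0.0128}],
        [{"Name": "ABC2", "Score": 0.9999}],
    ],
})

classes = df.pop("Classes")

# Number of dicts in each row's list: 2 for the first row, 1 for the second.
counts = classes.str.len()

# Row i of the remaining columns is repeated counts[i] times, so the
# metadata lines up with the flattened class records.
repeated = pd.DataFrame(df.values.repeat(counts, axis=0), columns=[*df])
print(repeated)
```

Because the repeat counts come from the list lengths, this keeps working when rows hold different numbers of classes.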

Replace classes.sum() with itertools.chain.from_iterable(classes) if performance is important.
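
A sketch of that variant, with the question's two records inlined through io.StringIO so it runs without a file (the inlining and the names raw, flat, meta and result are assumptions for illustration). Summing lists builds a new list at every step, which tends to get slow as the number of rows grows; chain.from_iterable walks the lists once.

```python
import io
from itertools import chain

import pandas as pd

# The question's two JSON lines, inlined instead of read from filename.json.
raw = io.StringIO(
    '{"File": "xyz.csv", "Line": "0", "Classes": [{"Name": "ABC", "Score": 0.9842}, '
    '{"Name": "DEF", "Score": 0.0128}, {"Name": "GHI", "Score": 0.003}]}\n'
    '{"File": "xyz.csv", "Line": "1", "Classes": [{"Name": "ABC2", "Score": 0.9999}, '
    '{"Name": "DEF2", "Score": 0.1111}, {"Name": "GHI2", "Score": 0.5666}]}\n'
)
df = pd.read_json(raw, lines=True)

classes = df.pop("Classes")

# Flatten the lists of dicts in a single pass instead of classes.sum().
flat = pd.DataFrame(list(chain.from_iterable(classes)))

# Same broadcast as before: one copy of each metadata row per class dict.
meta = pd.DataFrame(df.values.repeat(classes.str.len(), axis=0), columns=[*df])

result = pd.concat([flat, meta], axis=1)
print(result)
```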

edited Jun 18, 2019 at 19:57

answered Jun 18, 2019 at 19:40

cs95
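
For reference, json_normalize itself can also produce this table if it is given the raw records rather than a DataFrame. This is not the method from the answer above, only a minimal sketch assuming pandas 1.0 or later (where pd.json_normalize is a top-level function) and with the two records inlined for illustration. The call in the question likely fails because data should be a dict or a list of dicts, record_path should be the key "Classes" rather than the column itself, and meta should name the outer fields (File, Line) rather than the inner ones.

```python
import json

import pandas as pd

# The question's two JSON lines, inlined; with a real file one could use:
#   with open('filename.json') as f:
#       records = [json.loads(line) for line in f]
lines = [
    '{"File": "xyz.csv", "Line": "0", "Classes": [{"Name": "ABC", "Score": 0.9842}, '
    '{"Name": "DEF", "Score": 0.0128}, {"Name": "GHI", "Score": 0.003}]}',
    '{"File": "xyz.csv", "Line": "1", "Classes": [{"Name": "ABC2", "Score": 0.9999}, '
    '{"Name": "DEF2", "Score": 0.1111}, {"Name": "GHI2", "Score": 0.5666}]}',
]
records = [json.loads(line) for line in lines]

# record_path names the nested list to expand into rows;
# meta names the outer fields to copy onto each of those rows.
flat = pd.json_normalize(records, record_path="Classes", meta=["File", "Line"])
print(flat)
```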
