Here is what my JSON file looks like:

{"File": "xyz.csv", "Line": "0", "Classes": [{"Name": "ABC", "Score": 0.9842}, {"Name": "DEF", "Score": 0.0128}, {"Name": "GHI", "Score": 0.003}]}
{"File": "xyz.csv", "Line": "1", "Classes": [{"Name": "ABC2", "Score": 0.9999}, {"Name": "DEF2", "Score": 0.1111}, {"Name": "GHI2", "Score": 0.5666}]}

pred_df = pd.read_json('filename.json',lines=True)

When I try to apply json_normalize to the last column, "Classes", I get an error: string indices must be integers

Class = json_normalize(data = pred_df,
                  record_path= pred_df['Classes'],
                  meta =['Name','Score'])

Please let me know what I'm missing here. Thanks!

1 Answer

Do this in two steps. First load the JSON; then flatten the "Classes" column and repeat each remaining row once per class entry (using NumPy's repeat on df.values) so that it lines up with the flattened records.

import pandas as pd

df = pd.read_json('filename.json', lines=True)

classes = df.pop('Classes')
pd.concat([
    pd.DataFrame(classes.sum()), 
    pd.DataFrame(df.values.repeat(classes.str.len(), axis=0), columns=[*df])
], axis=1)

   Name   Score     File Line
0   ABC  0.9842  xyz.csv    0
1   DEF  0.0128  xyz.csv    0
2   GHI  0.0030  xyz.csv    0
3  ABC2  0.9999  xyz.csv    1
4  DEF2  0.1111  xyz.csv    1
5  GHI2  0.5666  xyz.csv    1

classes.sum() concatenates the lists one at a time, which gets slow when there are many rows. If performance is important, replace it with itertools.chain.from_iterable(classes).
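For reference, here is a minimal self-contained sketch of the itertools variant, next to a json_normalize version of the kind the question was reaching for. The sample data is copied from the question; the function names flatten_with_chain and flatten_with_json_normalize are made up for illustration, and the json_normalize version assumes pandas 1.0 or later (where pd.json_normalize is available) and parses each line into a dict first, since record_path expects a key name rather than a column of the DataFrame.

```python
import io
import itertools
import json

import pandas as pd

# Sample data copied from the question (JSON Lines: one object per line).
RAW = (
    '{"File": "xyz.csv", "Line": "0", "Classes": ['
    '{"Name": "ABC", "Score": 0.9842}, {"Name": "DEF", "Score": 0.0128}, '
    '{"Name": "GHI", "Score": 0.003}]}\n'
    '{"File": "xyz.csv", "Line": "1", "Classes": ['
    '{"Name": "ABC2", "Score": 0.9999}, {"Name": "DEF2", "Score": 0.1111}, '
    '{"Name": "GHI2", "Score": 0.5666}]}\n'
)


def flatten_with_chain(raw):
    """Same two-step approach as above, with chain instead of classes.sum()."""
    df = pd.read_json(io.StringIO(raw), lines=True)
    classes = df.pop("Classes")
    # Walk the per-row lists in one pass instead of summing list objects.
    flat = pd.DataFrame(list(itertools.chain.from_iterable(classes)))
    # Repeat each remaining row once per entry in its "Classes" list.
    rest = pd.DataFrame(
        df.values.repeat(classes.str.len(), axis=0), columns=df.columns
    )
    return pd.concat([flat, rest], axis=1)


def flatten_with_json_normalize(raw):
    """Alternative: parse the lines to dicts and let json_normalize flatten."""
    records = [json.loads(line) for line in raw.splitlines() if line.strip()]
    # record_path names the nested list; meta names the top-level keys to carry.
    return pd.json_normalize(records, record_path="Classes", meta=["File", "Line"])
```

Calling print(flatten_with_chain(RAW)) or print(flatten_with_json_normalize(RAW)) should give the same six-row layout as the table above. One difference to watch for: json_normalize keeps "Line" exactly as it appears in the JSON (a string), whereas read_json may infer a numeric type for it.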
