1

I need to read in a JSON file into pandas df. JSON data looks like this:

{"f0_":{"id":"138307057680","ActionName":"Complete","Time":"2020-04-23-12:40:04"}}
{"f0_":{"id":"138313115245","ActionName":"Midpoint","Time":"2020-06-16-20:41:16"}}

I need to get rid of the first key that holds all columns in it. I tried:

import json 
import pandas as pd 
from pandas.io.json import json_normalize

data_pd = pd.read_json('db/my_file.json', lines=True) 
new_data = json_normalize(data_pd)

The error message is: AttributeError: 'str' object has no attribute 'values' The desired output is:

id                ActionName       Time 
138307057680       Complete       2020-04-23-12:40:04
138313115245       Midpoint       2020-06-16-20:41:16

Currently if I print out data_pd, the output is: enter image description here

2 Answers 2

1

You can try with:

new_data = pd.DataFrame(data_pd['f0_'].values.tolist())

Output:

             id ActionName                 Time
0  138307057680   Complete  2020-04-23-12:40:04
1  138313115245   Midpoint  2020-06-16-20:41:16
Sign up to request clarification or add additional context in comments.

Comments

1

You can clean the data passed to Pandas before generating your dataframe like this example:

import json
import pandas as pd


def gen_data(file_path):
    with open(file_path) as f:
        for line in f.readlines():
            if line:
                line = json.loads(line)
                for value in line.values():
                    yield value


df = pd.DataFrame(gen_data('db/my_file.json'))
print(df)

Output:

             id ActionName                 Time
0  138307057680   Complete  2020-04-23-12:40:04
1  138313115245   Midpoint  2020-06-16-20:41:16

Bonus:

A bit of speed comparaison (I'm using an i7):

If you Clean your data first then generate your DF:

>> %timeit pd.DataFrame(gen_data('db/my_file.json'))
519 µs ± 1.22 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

And if you generate your DF then clean it:

import pandas as pd


def gen_df_method2(file_path):
    data_pd = pd.read_json(file_path, lines=True)
    return pd.DataFrame(data_pd['f0_'].values.tolist())


>> %timeit gen_df_method2('db/my_file.json')
2.66 ms ± 11.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.