Python Load Json Lines to Pandas Dataframe

Question

I have a JSON file in this format, with one JSON object per line:

{"_id":{"$oid":"5a8fda432467a7bb10swec"},"code":4321,"phone":{"$numberLong":"32323232"},"name":"Batako","fax":{"$numberLong":"12345678"}}
{"_id":{"$oid":"7dhds9ds9dsa9dsa9sdsds"},"code":3212,"phone":"","name":"Franco","fax":0}
{"_id":{"$oid":"6dhds9dadssa9dsa9sdsds"},"code":5612,"phone":"6483737","name":"Brescia","fax":"123-232-1331"}
{"_id":{"$oid":"8dshds9ds9dsa9dsa9sdsds"},"code":4312,"phone":{"$numberLong":"9453737"},"name":"Kalon","fax":{"$numberLong":"65543434"}}

How can I create a DataFrame from it with pandas?

I have tried this:

import pandas as pd
import json
data = []
for line in open(r'file.json', 'r', encoding='utf-8'):
    data.append(json.loads(line))
    
df = pd.json_normalize(data)
df.head()

But I get this error:

JSONDecodeError: Expecting value: line 1 column 133 (char 132)
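The position in this message is relative to the single string passed to json.loads(), so "line 1 column 133" means character 133 of whichever file line failed, not line 1 of the file. A minimal diagnostic sketch, assuming a JSON Lines file like the one above: find_bad_lines is a hypothetical helper (not part of json or pandas) that collects the 1-based file line number and the parser message for every line json.loads() rejects.

```python
import json
from typing import Iterable, List, Tuple


def find_bad_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (file_line_number, error_message) for every line json.loads rejects.

    find_bad_lines is a hypothetical helper, not part of json or pandas.
    """
    bad = []
    for lineno, line in enumerate(lines, start=1):
        try:
            json.loads(line)
        except json.JSONDecodeError as exc:
            # exc.colno is the column inside this one line, not inside the file
            bad.append((lineno, f"{exc.msg} at column {exc.colno}"))
    return bad


sample = [
    '{"code": 4321, "name": "Batako"}\n',
    '{"code": 3212, "name": Franco}\n',  # unquoted string: invalid JSON
    '{"code": 5612, "name": "Brescia"}\n',
]
print(find_bad_lines(sample))

# With a real file (path assumed), iterating the file object reads it lazily:
# with open("file.json", encoding="utf-8") as f:
#     print(find_bad_lines(f)[:10])
```

Printing the first few offending lines usually shows whether the problem is truncated records, stray text, or a different encoding.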

Andrej Kesely · Accepted Answer · 2020-10-24 11:12:29Z

If every line of your file contains a JSON string and some values are dictionaries with only one value, you can try this example to load it into a DataFrame:

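import pandas as pd

# lines=True reads one JSON object per line (JSON Lines format)
df = pd.read_json('<your file>', lines=True)

def unpack(x):
    # x is one column; replace each dict such as {"$oid": ...} with its first value
    rv = []
    for v in x:
        if isinstance(v, dict):
            rv.append([*v.values()][0])
        else:
            rv.append(v)
    return rv

df = df.apply(unpack)
print(df)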

Prints:

                      _id  code     phone    name       fax
0  5a8fda432467a7bb10swec  4321  32323232  Batako  12345678
1  7dhds9ds9dsa9dsa9sdsds  3212   9283737  Franco  65543434
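The unpack helper replaces every dict cell with its first value, which is how MongoDB Extended JSON wrappers such as {"$oid": ...} and {"$numberLong": ...} get flattened. A slightly stricter sketch of the same idea, assuming a wrapper is always a dict with exactly one key: unwrap_value is a hypothetical name, it leaves any other value (including multi-key dicts) untouched, and it is applied per column with Series.map.

```python
import json

import pandas as pd


def unwrap_value(v):
    """Return the single value inside a one-key dict such as {"$oid": "..."}.

    Anything else (strings, numbers, multi-key dicts) is returned unchanged.
    unwrap_value is a hypothetical helper, not a pandas function.
    """
    if isinstance(v, dict) and len(v) == 1:
        return next(iter(v.values()))
    return v


# Two lines copied from the sample data in the question
lines = [
    '{"_id":{"$oid":"5a8fda432467a7bb10swec"},"code":4321,"phone":{"$numberLong":"32323232"},"name":"Batako","fax":{"$numberLong":"12345678"}}',
    '{"_id":{"$oid":"7dhds9ds9dsa9dsa9sdsds"},"code":3212,"phone":"","name":"Franco","fax":0}',
]

df = pd.DataFrame([json.loads(line) for line in lines])
# Apply unwrap_value to every cell, one column at a time
df = df.apply(lambda col: col.map(unwrap_value))
print(df)
```

Checking len(v) == 1 means an unexpected nested object with several keys stays visible in the output instead of silently losing all but its first value.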

EDIT: To skip lines that json.loads() cannot parse, you can use this example:

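import json

import pandas as pd

data = []
with open('a1.txt', 'r', encoding='utf-8') as f_in:
    for line in f_in:
        try:
            data.append(json.loads(line))
        except json.JSONDecodeError:
            continue    # skip lines that are not valid JSON

df = pd.DataFrame(data)

def unpack(x):
    # x is one column; replace each dict such as {"$numberLong": ...} with its first value
    rv = []
    for v in x:
        if isinstance(v, dict):
            rv.append([*v.values()][0])
        else:
            rv.append(v)
    return rv

df = df.apply(unpack)
print(df)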

Prints:

                       _id  code     phone     name           fax
0   5a8fda432467a7bb10swec  4321  32323232   Batako      12345678
1   7dhds9ds9dsa9dsa9sdsds  3212             Franco             0
2   6dhds9dadssa9dsa9sdsds  5612   6483737  Brescia  123-232-1331
3  8dshds9ds9dsa9dsa9sdsds  4312   9453737    Kalon      65543434
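For a very large file (the asker mentions 6 GB and 12 million rows in the comments), keeping every parsed dict in one list before building the DataFrame can use a lot of memory. A sketch under that assumption: parse the stream line by line, skip and count lines json.loads() rejects, turn every batch_size parsed records into a small DataFrame, and concatenate at the end. load_json_lines and batch_size are hypothetical names; this may lower peak memory while parsing, but the final DataFrame still has to fit in RAM.

```python
import io
import json

import pandas as pd


def load_json_lines(f, batch_size=100_000):
    """Parse a JSON Lines stream in batches, skipping malformed lines.

    Returns (DataFrame, number_of_skipped_lines).
    load_json_lines is a hypothetical helper, not a pandas function.
    """
    frames = []
    batch = []
    skipped = 0
    for line in f:
        if not line.strip():
            continue  # ignore blank lines without counting them as errors
        try:
            batch.append(json.loads(line))
        except json.JSONDecodeError:
            skipped += 1
            continue
        if len(batch) >= batch_size:
            frames.append(pd.DataFrame(batch))
            batch = []
    if batch:
        frames.append(pd.DataFrame(batch))
    df = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
    return df, skipped


# In-memory stand-in for open("a1.txt", encoding="utf-8")
sample = io.StringIO(
    '{"code": 4321, "name": "Batako"}\n'
    '{"code": 3212, "name": Franco}\n'  # malformed: unquoted string
    '\n'
    '{"code": 5612, "name": "Brescia"}\n'
)
df, skipped = load_json_lines(sample, batch_size=1)
print(df)
print("skipped lines:", skipped)
```

Returning the skipped count makes it easy to see how much data was dropped; the unpack step from the answer can then be applied to the returned DataFrame as before.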

edited Oct 24, 2020 at 11:12

answered Oct 24, 2020 at 10:04

Andrej Kesely


4 Comments

Hendra Over a year ago

Thank you @Andrej, but I am still getting the error: ValueError: Expected object or value

Andrej Kesely Over a year ago

@Hendra Can you paste the whole error stack-trace? It seems your input file is different from that in your question.

Hendra Over a year ago

Hi @Andrej, like this?

C:\miniconda3\envs\belajar\lib\site-packages\pandas\io\json\_json.py in _parse_no_numpy(self)
   1117         if orient == "columns":
   1118             self.obj = DataFrame(
-> 1119                 loads(json, precise_float=self.precise_float), dtype=None
   1120             )
   1121         elif orient == "split":

ValueError: Expected object or value

Hendra Over a year ago

My file is large: 6 GB, 12 million rows. I think some of my data is badly structured. How can I handle errors inside the for loop in your code? I tested your code and it works with the sample data.
