
Stack Overflow, please do your magic...

I have a JSON file in the following format:

{"_id":{"$oid":"5a8fda432467a7bb10swec"},"code":4321,"phone":{"$numberLong":"32323232"},"name":"Batako","fax":{"$numberLong":"12345678"}}
{"_id":{"$oid":"7dhds9ds9dsa9dsa9sdsds"},"code":3212,"phone":"","name":"Franco","fax":0}
{"_id":{"$oid":"6dhds9dadssa9dsa9sdsds"},"code":5612,"phone":"6483737","name":"Brescia","fax":"123-232-1331"}
{"_id":{"$oid":"8dshds9ds9dsa9dsa9sdsds"},"code":4312,"phone":{"$numberLong":"9453737"},"name":"Kalon","fax":{"$numberLong":"65543434"}}

How can I create a DataFrame from it with pandas?

I have tried this:

import pandas as pd
import json
data = []
for line in open(r'file.json', 'r', encoding='utf-8'):
    data.append(json.loads(line))
    
df = pd.json_normalize(data)
df.head()

but I get this error:

JSONDecodeError: Expecting value: line 1 column 133 (char 132)

1 Answer


If every line of your file is a JSON string and some of the values are dictionaries holding a single value, you can try this example to load it into a DataFrame:

import pandas as pd

df = pd.read_json('<your file>', lines=True)

def unpack(x):
    rv = []
    for v in x:
        if isinstance(v, dict):
            rv.append([*v.values()][0])
        else:
            rv.append(v)
    return rv

df = df.apply(unpack)
print(df)

Prints:

                      _id  code     phone    name       fax
0  5a8fda432467a7bb10swec  4321  32323232  Batako  12345678
1  7dhds9ds9dsa9dsa9sdsds  3212   9283737  Franco  65543434
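
The single-value dictionaries in the question look like MongoDB export wrappers ("$oid", "$numberLong"). As a minimal sketch, assuming those are the only two wrappers present, the same unpacking can be done per record before building the DataFrame. The helper name unwrap is hypothetical, and converting "$numberLong" to int is my assumption about what is wanted:

```python
import json

def unwrap(value):
    """Replace a single-key wrapper dict with the value it holds."""
    if isinstance(value, dict) and len(value) == 1:
        key, inner = next(iter(value.items()))
        if key == "$numberLong":
            return int(inner)  # the export stores the number as a string
        if key == "$oid":
            return inner       # keep the id as a plain string
    return value               # anything else is left unchanged

# First line of the sample data from the question.
line = ('{"_id":{"$oid":"5a8fda432467a7bb10swec"},"code":4321,'
        '"phone":{"$numberLong":"32323232"},"name":"Batako",'
        '"fax":{"$numberLong":"12345678"}}')

record = {key: unwrap(value) for key, value in json.loads(line).items()}
print(record)
# {'_id': '5a8fda432467a7bb10swec', 'code': 4321, 'phone': 32323232, 'name': 'Batako', 'fax': 12345678}
```

A list of such records can then be passed to pd.DataFrame. Unlike taking the first value of any dictionary, this leaves dictionaries with other keys untouched, which may or may not be what you want.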

EDIT: To skip lines that fail to parse as JSON, you can use this example:

import json
import pandas as pd

data = []
with open('a1.txt', 'r') as f_in:
    for line in f_in:
        try:
            data.append(json.loads(line))
        except json.JSONDecodeError:
            continue    # skip lines that are not valid JSON

df = pd.DataFrame(data)

def unpack(x):
    rv = []
    for v in x:
        if isinstance(v, dict):
            rv.append([*v.values()][0])
        else:
            rv.append(v)
    return rv

df = df.apply(unpack)
print(df)

Prints:

                       _id  code     phone     name           fax
0   5a8fda432467a7bb10swec  4321  32323232   Batako      12345678
1   7dhds9ds9dsa9dsa9sdsds  3212             Franco             0
2   6dhds9dadssa9dsa9sdsds  5612   6483737  Brescia  123-232-1331
3  8dshds9ds9dsa9dsa9sdsds  4312   9453737    Kalon      65543434
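
Skipping bad lines silently hides how many there are and what is wrong with them. As a rough sketch, the loop can collect the line number and the parser message for every line it rejects, so the broken rows can be inspected afterwards. The function name parse_json_lines and the sample strings are made up for illustration:

```python
import json

def parse_json_lines(lines):
    """Parse an iterable of JSON strings; return (records, errors)."""
    records, errors = [], []
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # blank lines are skipped, not reported
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            # Remember where the problem is and what the parser said.
            errors.append((line_no, exc.pos, exc.msg))
    return records, errors

# Illustrative input: the second line is missing a value.
sample = [
    '{"code": 4321, "name": "Batako"}',
    '{"code": 3212, "name": }',
    '{"code": 5612, "name": "Brescia"}',
]

records, errors = parse_json_lines(sample)
print(len(records), "parsed,", len(errors), "rejected")
for line_no, pos, msg in errors:
    print(f"line {line_no}, char {pos}: {msg}")
```

An open file object can be passed in place of the list, since iterating over a file yields its lines.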

4 Comments

Thank you @Andrej, but I am still getting an error: ValueError: Expected object or value
@Hendra Can you paste the whole error stack-trace? It seems your input file is different from that in your question.
Hi @Andrej, like this?
C:\miniconda3\envs\belajar\lib\site-packages\pandas\io\json\_json.py in _parse_no_numpy(self)
   1117         if orient == "columns":
   1118             self.obj = DataFrame(
-> 1119                 loads(json, precise_float=self.precise_float), dtype=None
   1120             )
   1121         elif orient == "split":
ValueError: Expected object or value
My file is large: 6 GB, 12 million rows. I think my data is badly structured. How can I handle errors in the for loop in your code? I tested your code and it works with the sample data.
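
For a file of that size, holding every parsed record in one list may not fit in memory. One possible approach, sketched here under the assumption that the rows can be processed in independent batches, is to read a fixed number of lines at a time and skip the ones that fail to parse. The name iter_record_batches and the batch size are illustrative only:

```python
import json
from itertools import islice

def iter_record_batches(lines, batch_size):
    """Yield lists of parsed records, reading batch_size lines at a time."""
    lines = iter(lines)
    while True:
        chunk = list(islice(lines, batch_size))
        if not chunk:
            return  # input exhausted
        batch = []
        for line in chunk:
            try:
                batch.append(json.loads(line))
            except json.JSONDecodeError:
                continue  # skip lines that are not valid JSON
        if batch:
            yield batch

# Illustrative input with one invalid line.
sample = ['{"code": 1}', 'not json', '{"code": 2}', '{"code": 3}', '{"code": 4}']

batches = list(iter_record_batches(sample, batch_size=2))
print(batches)
# [[{'code': 1}], [{'code': 2}, {'code': 3}], [{'code': 4}]]
```

With a real file, pass the open file object as lines and a much larger batch_size, then build pd.DataFrame(batch) for each batch and aggregate or write it out before reading the next one.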
