Scrapy and some other Python libraries have started to write and read JSON files in the JSON Lines format.
I am trying to convert such a JSON Lines file into a pandas DataFrame with the read_json(...) function.
My file "input.json" looks like this, with one line per capture:
{"A": {"page": 1, "name": "foo", "url": "xxx"}, "B": {"page": 1, "name": "bar", "url": "http://xxx"}, "C": {"page": 3, "name": "foo", "url": "http://xxx"}}
{"D": {"page": 2, "name": "bar", "url": "xxx"}, "E": {"page": 2, "name": "bar", "url": "http://xxx"}, "F": {"page": 3, "name": "foo", "url": "http://xxx"}}
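In JSON Lines, each line is a complete, independent JSON document, so this file holds two separate objects (keys A to C on the first line, D to F on the second) rather than one. The minimal sketch below, which uses only the standard json module and an io.StringIO stand-in for input.json (an assumption for illustration), parses the file line by line to make that structure visible.

```python
import io
import json

# The two lines of input.json; io.StringIO stands in for the real file.
SAMPLE = (
    '{"A": {"page": 1, "name": "foo", "url": "xxx"}, "B": {"page": 1, "name": "bar", "url": "http://xxx"}, "C": {"page": 3, "name": "foo", "url": "http://xxx"}}\n'
    '{"D": {"page": 2, "name": "bar", "url": "xxx"}, "E": {"page": 2, "name": "bar", "url": "http://xxx"}, "F": {"page": 3, "name": "foo", "url": "http://xxx"}}\n'
)

# JSON Lines: every non-empty line is parsed on its own as one JSON object.
objects = [json.loads(line) for line in io.StringIO(SAMPLE) if line.strip()]

for lineno, obj in enumerate(objects):
    print(lineno, list(obj))
# 0 ['A', 'B', 'C']
# 1 ['D', 'E', 'F']
```

The line numbers 0 and 1 printed here are the same labels that later show up as column names when the file is read with lines=True.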
What I want as output:
   page  name  url
A     1  foo   http://xxx
B     1  bar   http://xxx
C     3  foo   http://xxx
D     2  bar   http://xxx
E     2  bar   http://xxx
F     3  foo   http://xxx
My first attempt was the following, but the result is not correct:
import pandas as pd
print(pd.read_json("file:///input.json", orient='index', lines=True))
The pandas documentation says that orient='index' expects the layout {index -> {column -> value}}, but the result below shows that I am misunderstanding something:
   0                                                1
A  {'page': 1, 'url': 'xxx', 'name': 'foo'}         NaN
B  {'page': 1, 'url': 'http://xxx', 'name': 'bar'}  NaN
C  {'page': 3, 'url': 'http://xxx', 'name': 'foo'}  NaN
D  NaN                                              {'page': 2, 'url': 'xxx', 'name': 'bar'}
E  NaN                                              {'page': 2, 'url': 'http://xxx', 'name': 'bar'}
F  NaN                                              {'page': 3, 'url': 'http://xxx', 'name': 'foo'}
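In this result each input line became its own column (0 and 1), the keys A to F became the index, and the inner dicts were left unexpanded as cell values. The {index -> {column -> value}} layout only applies once all lines form a single dict. The sketch below swaps read_json for the standard json module: it merges the per-line objects into one dict and then builds the frame with pd.DataFrame.from_dict(..., orient="index"). The helper name merge_jsonl is hypothetical, and io.StringIO again stands in for input.json.

```python
import io
import json

import pandas as pd

# The two lines of input.json; io.StringIO stands in for the real file
# (open("input.json", encoding="utf-8") would read it from disk instead).
SAMPLE = (
    '{"A": {"page": 1, "name": "foo", "url": "xxx"}, "B": {"page": 1, "name": "bar", "url": "http://xxx"}, "C": {"page": 3, "name": "foo", "url": "http://xxx"}}\n'
    '{"D": {"page": 2, "name": "bar", "url": "xxx"}, "E": {"page": 2, "name": "bar", "url": "http://xxx"}, "F": {"page": 3, "name": "foo", "url": "http://xxx"}}\n'
)


def merge_jsonl(fh):
    """Hypothetical helper: merge every JSON Lines object into one
    {index -> {column -> value}} dict."""
    merged = {}
    for line in fh:
        if line.strip():  # skip blank lines, e.g. a trailing newline
            # A key repeated on a later line overwrites the earlier entry.
            merged.update(json.loads(line))
    return merged


records = merge_jsonl(io.StringIO(SAMPLE))
# Outer keys (A..F) become the index, inner keys (page, name, url) the columns.
df = pd.DataFrame.from_dict(records, orient="index")
print(df)
```

Note that the url values for A and D come out as "xxx", exactly as they appear in input.json. Merging with dict.update assumes the top-level keys are unique across lines; if they can repeat, collecting the items into a list first would keep every occurrence.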