I have a large JSON file that I need to read into a pandas dataframe without using the json module. Here is a link to the file: melbourne_bike_share.json. I didn't know what to cut out to make a minimal example.
Every way I try to get from reading the file to a dataframe, I get the same error:
ValueError: Mixing dicts with non-Series may lead to ambiguous ordering.
I have tried reading it straight in:
import pandas as pd
mbs = pd.read_json('Melbourne_bike_share.json')
and making sure my string to be read by pd.read_json() is correct:
with open("Melbourne_bike_share.json", encoding="utf8") as mbs_json:
    mbs_string = mbs_json.readlines()
# Strip trailing whitespace from each line, then join into one string
mbs_string = [line.rstrip() for line in mbs_string]
mbs_string = ''.join(mbs_string)
# Parse the cleaned string rather than re-reading the file
mbs = pd.read_json(mbs_string)
But I still get the same ValueError. I cannot find what causes this error, what it really means, or any answered questions about reading JSON files that don't just suggest using the json module, which I cannot do.
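A cut-down reproduction might look like the sketch below. It assumes the file's top level is a single object holding a nested object next to a list of rows, which is only a guess; the key names `meta` and `data` and every value are hypothetical stand-ins, not taken from the real file.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the real file: a top-level object that holds
# a nested object ("meta") next to a list of rows ("data").
sample = '{"meta": {"view": {"name": "bike share"}}, "data": [[1, "a"], [2, "b"]]}'

try:
    # read_json tries to treat "meta" and "data" as DataFrame columns
    pd.read_json(StringIO(sample))
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

If this guess about the structure is right, the message would come from pandas trying to build columns out of one dict value and one list value at the same time.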
I am new to Python and JSON files. From what I gather, the next step after reading the JSON file is to flatten it:
from pandas.io.json import json_normalize
df = json_normalize(mbs)
after which, I will have my dataframe and can manipulate that.
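One direction that may avoid the json module is to ask `pd.read_json` for a Series via `typ="series"`, so each top-level key stays a single element, and then build the frame from the list of rows. This is a sketch, not a confirmed fix: it assumes a top-level object with the hypothetical keys `meta` and `data`, and the sample values are invented.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the real file (key names and values are assumptions).
sample = '{"meta": {"view": {"name": "bike share"}}, "data": [[1, "a"], [2, "b"]]}'

# typ="series" keeps each top-level key as one element instead of
# aligning the keys as DataFrame columns; dtype/convert_dates are
# switched off so the nested values are left untouched.
top = pd.read_json(StringIO(sample), typ="series", dtype=False, convert_dates=False)

# The rows are a plain list of lists, so they load with default integer headers.
df = pd.DataFrame(top["data"])
print(df)
```

With the real file, the string passed to `StringIO` would be replaced by the file path, and the key holding the rows might have a different name.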
Edit: added the expected first row of the resulting dataframe. The first line below is the column headers (default indexes until I figure out how to pull the column headers from the metadata in the JSON file), and the second line is the first row. The formatting here does not keep the columns lined up, sorry about that, but each header should align with the value beneath it.
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
0 155 7C09387D-9E6C-4B42-9041-9A98B88F54BB 155 1428899388 880594 1453945520 880594 {\n "invalidCells" : {\n "27624917" : "22/... 2 Harbour Town - Docklands Dve - Docklands 60000 9 14 1453985105 [{"address":"","city":"","state":"","zip":""},..
Any help would be appreciated.