0

I have a large json file that I need to read into a pandas dataframe without using the json module. Here is a link to the file melbourne_bike_share.json. I didn't know what to cut out to make a minimal example.

Every way I try to get from reading the file to a dataframe, I get the same error:

ValueError: Mixing dicts with non-Series may lead to ambiguous ordering.

I have tried reading it straight in:

import pandas as pd
mbs = pd.read_json('Melbourne_bike_share.json')

and making sure my string to be read by pd.read_json() is correct:

with open("Melbourne_bike_share.json", encoding="utf8") as mbs_json:
    mbs_string = mbs_json.readlines()
    mbs_string = [line.rstrip() for line in mbs_string]
    mbs_string = ''.join(mbs_string)

mbs = pd.read_json('Melbourne_bike_share.json')

But still get the same ValueError. I cannot find what is causing this error, what it really means, or pretty much any asked and answered questions involving reading json files that don't just suggest using the json module, which I cannot do.

I am new to python and json files. From what I gather, the next step after being able to read the json file is to flatten it:

from pandas.io.json import json_normalize    
df = json_normalize(mbs)

after which, I will have my dataframe and can manipulate that.

Post Edited to show the expected first line of the resultant dataframe. First line is the column headers (default indexes until I figure out how to pull the column headers from the metadata in the json file). I spaced the column headers 0-5 to align with their values, after that, the format here puts them on a new row. sorry about that. hope you can see that they should all align. Second line is the first row.

    0   1                                       2   3           4       5    6  7   8   9   10  11  12  13  14
0   155 7C09387D-9E6C-4B42-9041-9A98B88F54BB    155 1428899388  880594  1453945520  880594  {\n "invalidCells" : {\n "27624917" : "22/...   2   Harbour Town - Docklands Dve - Docklands    60000   9   14  1453985105  [{"address":"","city":"","state":"","zip":""},..

Any help would be appreciated.

3
  • Do you have an example of how you want the output to look Commented May 20, 2018 at 18:08
  • This data appears to be very deeply nested with multiple lists. I am not sure if it was designed for simple consumption into pandas. The csv format of this data is available at data.gov.au/dataset/… Commented May 20, 2018 at 18:19
  • I edited it and put in an example of the output. yes, this json file was provided as a first look at what json files are and how to parse them. I agree it seems overly complicated, especially given we are not allowed to use the json module. Commented May 20, 2018 at 18:29

1 Answer 1

1

I think you are looking for something like this:

import pandas as pd

df = pd.read_json('Melbourne_bike_share.json', typ='series')
pd.DataFrame(data=df['data'])

0   1   2   3   4   5   6   7   8   9   10  11  12  13  14
0   155 7C09387D-9E6C-4B42-9041-9A98B88F54BB    155 1428899388  880594  1453945520  880594  {\n "invalidCells" : {\n "27624917" : "22/...   2   Harbour Town - Docklands Dve - Docklands    60000   9   14  1453985105  [{"address":"","city":"","state":"","zip":""},...
1   156 52739A59-E034-436B-A613-E7A5F62448C0    156 1428899388  880594  1453945520  880594  {\n "invalidCells" : {\n "27624917" : "22/...   4   Federation Square - Flinders St / Swanston St ...   60001   11  11  1453985105  [{"address":"","city":"","state":"","zip":""},...
2   157 7EFB5219-2764-47CE-A497-B6D872823BBE    157 1428899388  880594  1453945520  880594  {\n "invalidCells" : {\n "27624917" : "22/...   5   Plum Garland Reserve - Beaconsfield Pde - Albe...   60002   16  1   1453985105  [{"address":"","city":"","state":"","zip":""},...
3   158 4B09D743-FFEE-4185-B968-A7866E45FE0B    158 1428899388  880594  1453945520  880594  {\n "invalidCells" : {\n "27624917" : "22/...   6   State Library - Swanston St / Little Lonsdale ...   60003   9   2   1453985105  [{"address":"","city":"","state":"","zip":""},...
4   159 C5493850-5714-40E2-9A67-DA443A4F64F8    159 1428899388  880594  1453945520  880594  {\n "invalidCells" : {\n "27624917" : "22/...   7   Bourke Street Mall - 205 Bourke St - City   60004   10  1   1453985105  [{"address":"","city":"","state":"","zip":""},...

if you want to add the column names then do something like this:

import pandas as pd

mbs = pd.read_json('Melbourne_bike_share.json', typ='series')
mbs_columns = mbs['meta']['view']['columns']
names = [item['name'] for item in mbs_columns]
df = pd.DataFrame(data=mbs['data'],columns=names)
Sign up to request clarification or add additional context in comments.

2 Comments

This works!! Thankyou!! ok, 1) I should have used the typ = 'Series'. 2) when you are defining "mbs_columns" I understand why you are including ['meta'] and [view'], but not ['columns'].
Because we want the value for the item 'name' in key 'columns'

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.