Reading a JSON File into a dataframe WITHOUT using the json module

Question

I have a large json file that I need to read into a pandas dataframe without using the json module. Here is a link to the file melbourne_bike_share.json. I didn't know what to cut out to make a minimal example.

Every way I try to get from reading the file to a dataframe, I get the same error:

ValueError: Mixing dicts with non-Series may lead to ambiguous ordering.

I have tried reading it straight in:

import pandas as pd
mbs = pd.read_json('Melbourne_bike_share.json')

and making sure my string to be read by pd.read_json() is correct:

with open("Melbourne_bike_share.json", encoding="utf8") as mbs_json:
    mbs_string = mbs_json.readlines()
    mbs_string = [line.rstrip() for line in mbs_string]
    mbs_string = ''.join(mbs_string)

mbs = pd.read_json('Melbourne_bike_share.json')

But still get the same ValueError. I cannot find what is causing this error, what it really means, or pretty much any asked and answered questions involving reading json files that don't just suggest using the json module, which I cannot do.

I am new to python and json files. From what I gather, the next step after being able to read the json file is to flatten it:

from pandas.io.json import json_normalize    
df = json_normalize(mbs)

after which, I will have my dataframe and can manipulate that.

Post Edited to show the expected first line of the resultant dataframe. First line is the column headers (default indexes until I figure out how to pull the column headers from the metadata in the json file). I spaced the column headers 0-5 to align with their values, after that, the format here puts them on a new row. sorry about that. hope you can see that they should all align. Second line is the first row.

    0   1                                       2   3           4       5    6  7   8   9   10  11  12  13  14
0   155 7C09387D-9E6C-4B42-9041-9A98B88F54BB    155 1428899388  880594  1453945520  880594  {\n "invalidCells" : {\n "27624917" : "22/...   2   Harbour Town - Docklands Dve - Docklands    60000   9   14  1453985105  [{"address":"","city":"","state":"","zip":""},..

Any help would be appreciated.

This data appears to be very deeply nested with multiple lists. I am not sure if it was designed for simple consumption into pandas. The csv format of this data is available at data.gov.au/dataset/… — user2510479
– user2510479, Commented May 20, 2018 at 18:19
I edited it and put in an example of the output. yes, this json file was provided as a first look at what json files are and how to parse them. I agree it seems overly complicated, especially given we are not allowed to use the json module. — AJ AJ
– AJ AJ, Commented May 20, 2018 at 18:29

It_is_Chris · Accepted Answer · 2018-05-21 03:00:18Z

1

I think you are looking for something like this:

import pandas as pd

df = pd.read_json('Melbourne_bike_share.json', typ='series')
pd.DataFrame(data=df['data'])

0   1   2   3   4   5   6   7   8   9   10  11  12  13  14
0   155 7C09387D-9E6C-4B42-9041-9A98B88F54BB    155 1428899388  880594  1453945520  880594  {\n "invalidCells" : {\n "27624917" : "22/...   2   Harbour Town - Docklands Dve - Docklands    60000   9   14  1453985105  [{"address":"","city":"","state":"","zip":""},...
1   156 52739A59-E034-436B-A613-E7A5F62448C0    156 1428899388  880594  1453945520  880594  {\n "invalidCells" : {\n "27624917" : "22/...   4   Federation Square - Flinders St / Swanston St ...   60001   11  11  1453985105  [{"address":"","city":"","state":"","zip":""},...
2   157 7EFB5219-2764-47CE-A497-B6D872823BBE    157 1428899388  880594  1453945520  880594  {\n "invalidCells" : {\n "27624917" : "22/...   5   Plum Garland Reserve - Beaconsfield Pde - Albe...   60002   16  1   1453985105  [{"address":"","city":"","state":"","zip":""},...
3   158 4B09D743-FFEE-4185-B968-A7866E45FE0B    158 1428899388  880594  1453945520  880594  {\n "invalidCells" : {\n "27624917" : "22/...   6   State Library - Swanston St / Little Lonsdale ...   60003   9   2   1453985105  [{"address":"","city":"","state":"","zip":""},...
4   159 C5493850-5714-40E2-9A67-DA443A4F64F8    159 1428899388  880594  1453945520  880594  {\n "invalidCells" : {\n "27624917" : "22/...   7   Bourke Street Mall - 205 Bourke St - City   60004   10  1   1453985105  [{"address":"","city":"","state":"","zip":""},...

if you want to add the column names then do something like this:

import pandas as pd

mbs = pd.read_json('Melbourne_bike_share.json', typ='series')
mbs_columns = mbs['meta']['view']['columns']
names = [item['name'] for item in mbs_columns]
df = pd.DataFrame(data=mbs['data'],columns=names)

edited May 21, 2018 at 3:00

answered May 21, 2018 at 2:23

It_is_Chris

14.2k3 gold badges27 silver badges45 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

AJ AJ Over a year ago

This works!! Thankyou!! ok, 1) I should have used the typ = 'Series'. 2) when you are defining "mbs_columns" I understand why you are including ['meta'] and [view'], but not ['columns'].

It_is_Chris Over a year ago

Because we want the value for the item 'name' in key 'columns'

Collectives™ on Stack Overflow

Reading a JSON File into a dataframe WITHOUT using the json module

1 Answer 1

2 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

2 Comments

Your Answer

Sign up or log in

Post as a guest

Related