Why is Pandas Read_JSON returning DataFrame with only one column

Question

I have a json file that is formatted as follows (json_test.json):

{"Col1;Col2;Col3;Col4;Col5":{"0":"value;value;value;value;value","1":"value;value;value;value;value","2":"value;value;value;value;value","N":"value;value;value;value;value"}}

To me, this looks like the orient "columns" that pandas specifies in their documentation: 'columns' : dict like {column -> {index -> value}}
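For comparison, here is a minimal sketch of JSON that does match the "columns" orient. It assumes each column name is its own top-level key, and the values are made up for illustration. Read this way, pandas produces one column per key:

```python
import io
import pandas as pd

# Hypothetical well-formed "columns"-orient JSON: each column name is a separate key
good_json = '{"Col1": {"0": "a", "1": "d"}, "Col2": {"0": "b", "1": "e"}, "Col3": {"0": "c", "1": "f"}}'

# StringIO wraps the string as a file-like object; newer pandas versions warn
# when literal JSON text is passed to read_json directly
df_good = pd.read_json(io.StringIO(good_json), orient="columns")
print(df_good.columns.tolist())  # ['Col1', 'Col2', 'Col3']
print(df_good.shape)             # (2, 3)
```

With a file on disk you would pass its path instead of the StringIO object.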

However, running my json through pd.read_json only returns 1 column with 4 rows.

I.e.:

import pandas as pd

df2 = pd.read_json(r"data\json_test.json")
df2.info()

<class 'pandas.core.frame.DataFrame'>
Index: 4 entries, 0 to N
Data columns (total 1 columns):
 #   Column                    Non-Null Count  Dtype 
---  ------                    --------------  ----- 
 0   Col1;Col2;Col3;Col4;Col5  4 non-null      object
dtypes: object(1)
memory usage: 64.0+ bytes

Can anyone help me understand what is going on here, and how to properly read in this json file? I am not really familiar with json and most examples I've seen online are for very standardized json formats.

Thank you!

How did you create the dataframe? By reading values from a CSV file? Maybe you forgot to specify ; as the separator. – Andrej Kesely, Jul 29, 2021 at 21:30
@AndrejKesely The default behavior of read_json is to create a dataframe. Otherwise I'm not reading anything from a CSV... – Snowy, Jul 29, 2021 at 21:36
I have no idea, @AndrejKesely -- the json file was given to me as is. – Snowy, Jul 29, 2021 at 21:39
I don't think "Col1;Col2;Col3;Col4;Col5" represents five different columns, since everything is inside double quotes, they'll be treated as a single value for JSON. – ThePyGuy, Jul 29, 2021 at 21:53
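One way to check ThePyGuy's point is to parse a trimmed copy of the file with both the standard json module and read_json. The copy below keeps only rows "0" and "N" for brevity:

```python
import io
import json
import pandas as pd

# Trimmed copy of json_test.json (only rows "0" and "N"), for illustration
raw = '{"Col1;Col2;Col3;Col4;Col5": {"0": "value;value;value;value;value", "N": "value;value;value;value;value"}}'

# The JSON parser sees exactly one key; the semicolons are ordinary characters in it
keys = list(json.loads(raw).keys())
print(keys)  # ['Col1;Col2;Col3;Col4;Col5']

# read_json therefore builds a single column, with one row per inner key
df = pd.read_json(io.StringIO(raw))
print(df.shape)  # (2, 1)
```

JSON has no notion of a field separator inside a string, so splitting on ";" has to happen after parsing.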

Rob Raymond · Accepted Answer · 2021-07-31 09:09:18Z


You have JSON as the overall structure, but within the JSON keys and values you have semicolon-delimited fields. This can be fully decoded by:
1. initialising a DataFrame with pd.DataFrame() from the JSON
2. expanding the delimited keys and values using split(";")
3. converting these lists into pd.Series, which gives a DataFrame with separate columns and values

import pandas as pd

d = {"Col1;Col2;Col3;Col4;Col5":{"0":"value;value;value;value;value","1":"value;value;value;value;value","2":"value;value;value;value;value","N":"value;value;value;value;value"}}
df = pd.DataFrame(d)  # one column whose name is the packed header

# split each packed row into a Series labelled with the split header names
df2 = df.iloc[:,0].apply(lambda s: pd.Series(s.split(";"), index=df.columns[0].split(";")))

df2

	Col1	Col2	Col3	Col4	Col5
0	value	value	value	value	value
1	value	value	value	value	value
2	value	value	value	value	value
N	value	value	value	value	value
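For larger files, a vectorised alternative to calling apply on every row may be Series.str.split with expand=True. This is a different technique from the one above. The sketch assumes the same layout as json_test.json, with made-up values; with the real file you would pass the path to read_json instead of a StringIO object:

```python
import io
import pandas as pd

# Same structure as json_test.json, with made-up values for illustration
raw = '{"Col1;Col2;Col3;Col4;Col5": {"0": "a;b;c;d;e", "1": "f;g;h;i;j", "N": "k;l;m;n;o"}}'

df = pd.read_json(io.StringIO(raw))
packed = df.iloc[:, 0]  # the single semicolon-packed column

# Split every row at once; expand=True returns a DataFrame instead of lists
df2 = packed.str.split(";", expand=True)
# Recover the real column names from the packed header
df2.columns = df.columns[0].split(";")
print(df2)
```

The row labels ("0", "1", "N") are carried over unchanged from the parsed JSON.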

