
I have a JSON file (json_test.json) that is formatted as follows:

{"Col1;Col2;Col3;Col4;Col5":{"0":"value;value;value;value;value","1":"value;value;value;value;value","2":"value;value;value;value;value","N":"value;value;value;value;value"}}

To me, this looks like the "columns" orient that pandas describes in its documentation: 'columns' : dict like {column -> {index -> value}}
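For comparison, here is a minimal sketch of what the "columns" orient produces when each column name is its own JSON key, next to a single semicolon-joined key like the one in this file. The two small dicts are hypothetical stand-ins for the real data. They are wrapped in io.StringIO because newer pandas versions warn when a literal JSON string is passed to read_json.

```python
import io
import json

import pandas as pd

# Hypothetical well-formed "columns" JSON: every column name is a separate key.
well_formed = {"Col1": {"0": "a", "1": "b"}, "Col2": {"0": "c", "1": "d"}}
# Hypothetical sample shaped like json_test.json: one key holding all names.
as_given = {"Col1;Col2": {"0": "a;c", "1": "b;d"}}

df_ok = pd.read_json(io.StringIO(json.dumps(well_formed)), orient="columns")
df_bad = pd.read_json(io.StringIO(json.dumps(as_given)), orient="columns")

print(df_ok.columns.tolist())   # ['Col1', 'Col2']
print(df_bad.columns.tolist())  # ['Col1;Col2']  (a single column)
```

The orient is interpreted correctly in both cases. pandas simply has no reason to split a key on ";", so one key yields one column.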

However, running my JSON through pd.read_json returns only 1 column with 4 rows.

For example:

import pandas as pd

df2 = pd.read_json(r"data\json_test.json")  # raw string so the backslash is not read as an escape
df2.info()

<class 'pandas.core.frame.DataFrame'>
Index: 4 entries, 0 to N
Data columns (total 1 columns):
 #   Column                    Non-Null Count  Dtype 
---  ------                    --------------  ----- 
 0   Col1;Col2;Col3;Col4;Col5  4 non-null      object
dtypes: object(1)
memory usage: 64.0+ bytes


Can anyone help me understand what is going on here, and how to read this JSON file properly? I am not very familiar with JSON, and most examples I've seen online use very standardized JSON formats.

Thank you!

  • How did you create the dataframe? By reading values from a CSV file? Maybe you forgot to specify ; as the separator. Commented Jul 29, 2021 at 21:30
  • @AndrejKesely The default behavior of read_json is to create a dataframe. Otherwise I'm not reading anything from a CSV... Commented Jul 29, 2021 at 21:36
  • But how was this JSON created? Commented Jul 29, 2021 at 21:38
  • I have no idea, @AndrejKesely -- the json file was given to me as is. Commented Jul 29, 2021 at 21:39
  • I don't think "Col1;Col2;Col3;Col4;Col5" represents five different columns. Everything is inside one pair of double quotes, so JSON treats it as a single value. Commented Jul 29, 2021 at 21:53
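A minimal sketch that confirms the last comment with the standard json module, then follows the earlier hint about ";" as a separator: join the key and the row strings back into CSV text and let pd.read_csv split them. The shortened sample string is hypothetical. This route assumes no value itself contains a ";", a quote, or a newline.

```python
import io
import json

import pandas as pd

# Hypothetical, shortened sample with the same shape as json_test.json.
raw = '{"Col1;Col2;Col3":{"0":"a;b;c","1":"d;e;f"}}'

obj = json.loads(raw)
print(list(obj))  # ['Col1;Col2;Col3']: one key, hence one column name

# The only key is the ';'-joined header; its value maps row labels to rows.
header, rows = next(iter(obj.items()))
csv_text = "\n".join([header, *rows.values()])

df = pd.read_csv(io.StringIO(csv_text), sep=";")
df.index = list(rows)  # restore the original row labels ("0", "1", ...)
print(df)
```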

1 Answer

  • the overall structure is JSON
  • within the JSON keys and values you have semicolon-delimited strings
  • this can be fully decoded by:
    1. initialising a DataFrame with pd.DataFrame() from the JSON
    2. splitting the delimited key and values with split(";")
    3. converting these lists into pd.Series, so the result is a DataFrame with proper columns and values
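import pandas as pd

d = {"Col1;Col2;Col3;Col4;Col5":{"0":"value;value;value;value;value","1":"value;value;value;value;value","2":"value;value;value;value;value","N":"value;value;value;value;value"}}
df = pd.DataFrame(d)  # a single column named "Col1;Col2;Col3;Col4;Col5"

# split each row's string and label the pieces with the split column name
df2 = df.iloc[:, 0].apply(lambda s: pd.Series(s.split(";"), index=df.columns[0].split(";")))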

df2
Col1 Col2 Col3 Col4 Col5
0 value value value value value
1 value value value value value
2 value value value value value
N value value value value value
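A variant of the same idea that reads straight from the file and uses the vectorized Series.str.split(..., expand=True) instead of building one pd.Series per row. The helper name load_semicolon_json is hypothetical. The demo writes a small made-up file to a temporary path so the example runs on its own. It assumes the file has exactly one top-level key and that every row has as many fields as the header.

```python
import json
import os
import tempfile

import pandas as pd


def load_semicolon_json(path):
    """Hypothetical helper: read {"A;B;...": {row: "x;y;..."}} into a DataFrame."""
    with open(path, encoding="utf-8") as f:
        obj = json.load(f)
    header, rows = next(iter(obj.items()))  # the single "Col1;...;Col5" key
    s = pd.Series(rows)                      # index "0", "1", ..., "N"
    df = s.str.split(";", expand=True)       # one column per ';'-separated field
    df.columns = header.split(";")
    return df


# Demo on a small, made-up file shaped like json_test.json.
sample = {"Col1;Col2;Col3": {"0": "a;b;c", "N": "d;e;f"}}
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False, encoding="utf-8") as tmp:
    json.dump(sample, tmp)
df = load_semicolon_json(tmp.name)
os.remove(tmp.name)
print(df)
```

For the real file, this would be called as load_semicolon_json(r"data\json_test.json"). It gives the same shape as the apply-based answer, but lets pandas do the splitting in one call.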