I have been trying to convert a dataframe to JSON using Python. The conversion succeeds, but I am not getting the JSON format I need.

Code -

df1 = df.rename_axis('CUST_ID').reset_index()
df.to_json('abc.json')

Here, abc.json is the name of the output JSON file and df is the dataframe to convert.

What I am getting -

{"CUST_LAST_UPDATED":{"1000":1556879045879.0,"1001":1556879052416.0},
"CUST_NAME":{"1000":"newly updated_3_file","1001":"heeloo1"}}

What I want -

[{"CUST_ID":1000,"CUST_NAME":"newly updated_3_file","CUST_LAST_UPDATED":1556879045879},
{"CUST_ID":1001,"CUST_NAME":"heeloo1","CUST_LAST_UPDATED":1556879052416}]

Error -

Traceback (most recent call last):
  File "C:/Users/T/PycharmProject/test_pandas.py", line 19, in <module>
    df1 = df.rename_axis('CUST_ID').reset_index()
  File "C:\Users\T\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\frame.py", line 3379, in reset_index
    new_obj.insert(0, name, level_values)
  File "C:\Users\T\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\frame.py", line 2613, in insert
    allow_duplicates=allow_duplicates)
  File "C:\Users\T\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\internals.py", line 4063, in insert
    raise ValueError('cannot insert {}, already exists'.format(item))
ValueError: cannot insert CUST_ID, already exists

df.head() Output -

    CUST_ID  CUST_LAST_UPDATED              CUST_NAME
0     1000      1556879045879     newly updated_3_file
1     1001      1556879052416                  heeloo1

How can I change the format when converting the dataframe to JSON?

3 Answers

If CUST_ID is the index, use DataFrame.rename_axis with DataFrame.reset_index to turn the index into a column, then call DataFrame.to_json with orient='records':

df1 = df.rename_axis('CUST_ID').reset_index()
df1.to_json('abc.json', orient='records')

[{"CUST_ID":"1000",
  "CUST_LAST_UPDATED":1556879045879.0,
  "CUST_NAME":"newly updated_3_file"},
 {"CUST_ID":"1001",
  "CUST_LAST_UPDATED":1556879052416.0,
  "CUST_NAME":"heeloo1"}]

EDIT:

Because CUST_ID is already a column and the data has a default index, skip rename_axis and reset_index and call to_json on df directly:

df.to_json('abc.json', orient='records')

Verify:

print(df.to_json(orient='records'))
[{"CUST_ID":1000,
  "CUST_LAST_UPDATED":1556879045879,
  "CUST_NAME":"newly updated_3_file"},
 {"CUST_ID":1001,
  "CUST_LAST_UPDATED":1556879052416,
  "CUST_NAME":"heeloo1"}]
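
The following sketch shows why the error in the question appears and how the default-index case behaves. The frame is rebuilt by hand from the df.head() output in the question, so the exact data is an assumption. reset_index tries to insert a column named CUST_ID, which already exists, so it raises the ValueError. Converting the frame directly avoids that step.

```python
import json

import pandas as pd

# Assumed reconstruction of the frame shown by df.head() in the question:
# a default RangeIndex, with CUST_ID already present as a column.
df = pd.DataFrame(
    {
        "CUST_ID": [1000, 1001],
        "CUST_LAST_UPDATED": [1556879045879, 1556879052416],
        "CUST_NAME": ["newly updated_3_file", "heeloo1"],
    }
)

# Naming the index CUST_ID and resetting it collides with the existing column.
try:
    df.rename_axis("CUST_ID").reset_index()
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# No index handling is needed: convert the frame as it is.
records = json.loads(df.to_json(orient="records"))
print(records)
```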

8 Comments

It is giving an error. Please check the post for the error.
I cannot rename the column, nor can I drop and recreate it, because the values in the CUST_ID column would change. If I try just the first approach, it still gives the error.
No, I do not want any duplicated column. I want only three columns. Like the one I mentioned in the question.
df.head() returns top n (5 by default) rows of a data frame.
   CUST_ID  CUST_LAST_UPDATED             CUST_NAME
0     1000      1556879045879  newly updated_3_file
1     1001      1556879052416               heeloo1

You can convert a dataframe to a list of record dictionaries using to_dict:

df1.to_dict('records')

The output has the structure you need, although it is a Python list of dicts rather than a JSON string.
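
Since to_dict returns Python objects, a serialization step is still needed to get JSON text. Here is a minimal sketch using the standard json module. The sample frame and the to_builtin helper are assumptions for illustration. The helper is only a fallback for pandas versions that leave NumPy scalars in the records, which json cannot encode by itself.

```python
import json

import pandas as pd

# Assumed sample data matching the question's three columns.
df1 = pd.DataFrame(
    {
        "CUST_ID": [1000, 1001],
        "CUST_NAME": ["newly updated_3_file", "heeloo1"],
        "CUST_LAST_UPDATED": [1556879045879, 1556879052416],
    }
)

# A list with one dict per row.
records = df1.to_dict("records")


def to_builtin(value):
    """Hypothetical fallback: turn a NumPy scalar into a plain Python value."""
    if hasattr(value, "item"):
        return value.item()
    raise TypeError(f"Cannot serialize {type(value).__name__}")


# json.dumps only calls `default` for values it cannot encode itself.
json_text = json.dumps(records, default=to_builtin)
print(json_text)
```

To write a file instead of a string, json.dump(records, file, default=to_builtin) takes the same arguments plus an open file object.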

If the dataframe has NaN values in some rows and you don't want them in the JSON file, you can drop them per row before serializing:

import argparse
import json

import pandas as pd


if __name__ == "__main__":
    # Input CSV path and output JSON path come from the command line.
    parser = argparse.ArgumentParser()
    parser.add_argument("--csv")
    parser.add_argument("--json")
    args = parser.parse_args()

    entities = pd.read_csv(args.csv)

    # Build one dict per row, dropping NaN fields so they are omitted from the JSON.
    json_data = [row.dropna().to_dict() for _, row in entities.iterrows()]
    with open(args.json, "w") as file:
        json.dump(json_data, file)
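
To see the effect of the per-row dropna without a CSV file, here is a small in-memory sketch of the same idea. The column names and values are made up for illustration. Each row keeps only the fields that are present, so different records can end up with different keys.

```python
import json

import pandas as pd

# Made-up sample data with a missing value in each row.
entities = pd.DataFrame(
    {
        "name": ["a", "b"],
        "score": [1.5, None],
        "city": [None, "Oslo"],
    }
)

# dropna() on a row (a Series) removes the missing fields before to_dict().
records = [row.dropna().to_dict() for _, row in entities.iterrows()]

json_text = json.dumps(records)
print(json_text)
```

Note that iterrows is convenient but slow on large frames, so this pattern is best suited to small or medium inputs.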
