
I need to convert a pandas dataframe to a JSON object.

However

json.dumps(df.to_dict(orient='records'))

fails because the boolean columns are not JSON serializable, since they are of type numpy.bool_. I've tried df['boolCol'] = df['boolCol'].astype(bool), but that still leaves the type of the fields as numpy.bool_ rather than the Python bool, which serializes to JSON with no problem.

Any suggestions on how to convert the columns without looping through every record and converting it?

Thanks

EDIT:

This is part of a whole sanitization of dataframes of varying content so they can be used as the JSON payload for an API. Hence we currently have something like this:

import numpy
import pandas as pd

for cols in df.columns:
    if type(df[cols][0]) == pd._libs.tslibs.timestamps.Timestamp:
        df[cols] = df[cols].astype(str)
    elif type(df[cols][0]) == numpy.bool_:
        df[cols] = df[cols].astype(bool)  # still numpy.bool_ afterwards!
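A dtype-based variant of the loop above might look like the following. This is only a sketch, using a hypothetical example frame in place of the real payload; the key difference is casting booleans to object rather than bool, so the stored scalars become native Python bools:

    import numpy as np
    import pandas as pd

    # Hypothetical example frame standing in for the real payload
    df = pd.DataFrame({
        'when': pd.to_datetime(['2019-02-13', '2019-02-14']),
        'flag': np.array([True, False]),
    })

    for col in df.columns:
        if pd.api.types.is_datetime64_any_dtype(df[col]):
            df[col] = df[col].astype(str)
        elif pd.api.types.is_bool_dtype(df[col]):
            # object dtype stores native Python bools, which json can handle
            df[col] = df[col].astype(object)

    print(type(df['flag'].iloc[0]))  # <class 'bool'>

Checking the column dtype instead of the first value also avoids breaking on empty columns.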
  • a shot in the dark, but are you sure that df is a pandas data frame and not a numpy array? Commented Feb 13, 2019 at 13:31
  • >>> type(df) <class 'pandas.core.frame.DataFrame'> yep :) Commented Feb 13, 2019 at 13:33
  • so far json.loads(df.to_json(orient='records')) will work but seems like a poor solution Commented Feb 13, 2019 at 13:45
  • Don't you consider using Python 3? In Python 3, the type of that field will be <class 'bool'> and json.dumps can be executed successfully. Commented Feb 13, 2019 at 14:08

2 Answers


Just tested it out, and the problem seems to be caused by the orient='records' parameter. It seems you have to use another orient option (e.g. 'list') and convert the result to your preferred format yourself.

import json
import numpy as np
import pandas as pd

column_name = 'bool_col'

bool_df = pd.DataFrame(np.array([True, False, True]), columns=[column_name])

list_repres = bool_df.to_dict('list')
record_repres = [{column_name: values} for values in list_repres[column_name]]

json.dumps(record_repres)
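For a frame with more than one column, the same reconstruction could be sketched like this (hypothetical column names; to_dict('list') goes through Series.tolist(), which returns native Python scalars, so the numpy.bool_ values become plain bools):

    import json
    import numpy as np
    import pandas as pd

    # Hypothetical two-column frame
    df = pd.DataFrame({
        'flag': np.array([True, False, True]),
        'count': [1, 2, 3],
    })

    # Rebuild the records orientation from the list orientation
    list_repres = df.to_dict('list')
    records = [dict(zip(list_repres, row)) for row in zip(*list_repres.values())]

    print(json.dumps(records))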

1 Comment

Sorry, I should have mentioned this is not the only column in the dataframe. There are a number of columns and I just need to convert the ones with non-serializable datatypes, something along the lines of looping through the df columns and changing the type if an entry in that column is not serializable. I'll update the question above.

You need to use .astype and set the dtype to object.

See example below:

import pandas as pd

df = pd.DataFrame({
   "time": ['0hr', '128hr', '72hr', '48hr', '96hr'],
   "value": [10, 20, 30, 40, None]
})
df['revoked'] = False
df.revoked = df.revoked.astype(bool)
print('setting astype as bool:', type(df.iloc[0]['revoked']))


df.revoked = df.revoked.astype(object)
print('setting astype as object:', type(df.iloc[0]['revoked']))


>>> setting astype as bool: <class 'numpy.bool_'>
>>> setting astype as object: <class 'bool'>
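To apply this to every boolean column at once, without naming each one, select_dtypes can find them. A sketch with a hypothetical frame:

    import json
    import numpy as np
    import pandas as pd

    # Hypothetical frame with two boolean columns and one string column
    df = pd.DataFrame({
        'a': np.array([True, False]),
        'b': np.array([False, True]),
        'name': ['x', 'y'],
    })

    # Find every bool column and cast them all to object in one go
    bool_cols = df.select_dtypes(include=['bool']).columns
    df[bool_cols] = df[bool_cols].astype(object)

    print(json.dumps(df.to_dict(orient='records')))

After the cast, to_dict(orient='records') yields plain Python bools and json.dumps succeeds.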

