
I am trying to convert a column containing True/False and null values in string format to Boolean, but whatever I do I end up with either all True values or all False values. Below is my approach.

Consider the following DataFrame:

import numpy as np
import pandas as pd

df = pd.DataFrame({'w': ['True', np.nan, 'False',
                         'True', np.nan, 'False']})
df['w'].dtypes
Out: dtype('O')

df['w'].unique()
Out: array([True, nan, False], dtype=object)

d = {'nan': np.nan,'False':False, 'True': True}
df['w']=df['w'].map(d)

df['w'].dtypes
Out: dtype('O')

df['w'].unique()
Out: array([nan], dtype=object)
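A likely cause, consistent with the unquoted `unique()` output above: the column already holds Python booleans rather than strings, and `Series.map` with a dict returns NaN for every value that is not one of its keys. A minimal sketch reproducing this, assuming the data is `[True, np.nan, False]`:

```python
import numpy as np
import pandas as pd

# Assumption: the column holds real booleans and NaN, not the strings 'True'/'False'
s = pd.Series([True, np.nan, False])

# The dict keys are strings, so True, False and NaN never match any key
d = {'nan': np.nan, 'False': False, 'True': True}
mapped = s.map(d)

print(mapped.isna().all())  # every value became NaN
```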

Another approach I tried, following this SO post:

d = {'nan': 0,'False':0, 'True': 1 }
df['w']=df['w'].map(d)
df['w']=df['w'].astype('bool')

Now the column becomes bool, but all values are converted to True:

df['w'].dtypes
Out: dtype('bool')

df['w'].unique()
Out: array([ True])
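The all-True result can be explained by truthiness: NaN is a non-zero float, so `astype('bool')` turns every NaN into True. If, as in the first attempt, the map matched nothing and produced only NaN, the whole column ends up True. A small sketch, assuming an all-NaN intermediate column:

```python
import numpy as np
import pandas as pd

# NaN is a float, and any non-zero float (NaN included) is truthy
print(bool(np.nan))  # True

# Assumption: the map step matched no keys, so only NaN values remain
all_nan = pd.Series([np.nan, np.nan, np.nan])
print(all_nan.astype('bool').unique())  # only True values remain
```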

What am I doing wrong? I want all null values to remain null.

  • Can you share all relevant code and data? See: minimal reproducible example. Commented Nov 27, 2019 at 6:50
  • @AlexanderCécile What more can I share? I have added the code to generate my DataFrame as well. Commented Nov 27, 2019 at 6:56
  • That first approach (the map) looks correct to me, no? What’s wrong with it? Commented Nov 27, 2019 at 7:05
  • @AlexanderCécile It worked well. I was a little confused about NaN being in there, which resulted in the dtype being object; @jezrael's answer/explanation helped clear up the issue. Commented Nov 27, 2019 at 7:45

1 Answer


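I think the conversion is not necessary. Your original data already contains booleans with NaNs, and the dtype is object because the values are mixed: booleans plus missing values (NaN is a float). Compare this with string data, where unique() shows quoted values and the element types are str: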

df = pd.DataFrame({'w':['True', np.nan, 'False']})

print (df['w'].unique())
['True' nan 'False']

print ([type(x) for x in df['w'].unique()])
[<class 'str'>, <class 'float'>, <class 'str'>]
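Because the same column can look identical whether it holds strings or real booleans, one option is a mapping that accepts both. The helper below, `to_bool_or_nan`, is hypothetical (not part of pandas); it is a minimal sketch that passes every value through a lookup dict via `Series.map` with a function:

```python
import numpy as np
import pandas as pd

def to_bool_or_nan(s: pd.Series) -> pd.Series:
    """Hypothetical helper: map 'True'/'False' strings *and* real booleans
    to booleans; anything else (NaN, 'nan', None) becomes NaN."""
    # Caveat: 1 and 0 would also match, since Python treats 1 == True
    # and 0 == False as the same dict keys
    lookup = {'True': True, 'False': False, True: True, False: False}
    return s.map(lambda x: lookup.get(x, np.nan))

print(to_bool_or_nan(pd.Series(['True', np.nan, 'False'])).tolist())
print(to_bool_or_nan(pd.Series([True, np.nan, False])).tolist())
```

Both calls should give the same True / NaN / False values, so the result no longer depends on how the data was loaded.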

If nan is also a string, your solution works:

df = pd.DataFrame({'w':['True', 'nan', 'False']})

print ([type(x) for x in df['w'].unique()])
[<class 'str'>, <class 'str'>, <class 'str'>]

d = {'nan': np.nan,'False':False, 'True': True}
df['w'] = df['w'].map(d)

print (df['w'].unique())
[True nan False]

print ([type(x) for x in df['w'].unique()])
[<class 'bool'>, <class 'float'>, <class 'bool'>]

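The same output appears when the column already holds real booleans and NaN, as the unquoted unique() output in the question suggests:

df = pd.DataFrame({'w':[True, np.nan, False]})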

print (df['w'].unique())
[True nan False]

print ([type(x) for x in df['w'].unique()])
[<class 'bool'>, <class 'float'>, <class 'bool'>]
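If you want a genuine boolean dtype that still keeps missing values missing, pandas has a nullable `'boolean'` extension dtype (added in pandas 1.0, so newer than this answer). A minimal sketch, assuming string input: map to real booleans first, because `astype('boolean')` expects bool-like values rather than the strings 'True'/'False':

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'w': ['True', np.nan, 'False']})

# Map the strings to real booleans; unmatched values (here NaN) stay NaN
d = {'False': False, 'True': True}
mapped = df['w'].map(d)

# Nullable boolean dtype: NaN is stored as pd.NA instead of forcing object dtype
df['w'] = mapped.astype('boolean')

print(df['w'].dtype)     # boolean
print(df['w'].tolist())  # [True, <NA>, False]
```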

If you want to replace NaN with False, use Series.fillna:

df['w'] = df['w'].fillna(False)
print (df)
       w
0   True
1  False
2  False

print (df['w'].dtypes)
bool

print (df['w'].unique())
[ True False]
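In recent pandas 2.x releases, `fillna` on an object column warns that silently downcasting the result to `bool` is deprecated, so the dtype may stay object in future versions. One alternative that does not rely on that downcast is an element-wise comparison, sketched here assuming the column holds real booleans and NaN:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'w': [True, np.nan, False]})

# eq(True) compares element-wise; NaN == True is False, so missing values
# become False and the result is already a bool Series.
# For string data, df['w'].eq('True') would play the same role.
df['w'] = df['w'].eq(True)

print(df['w'].dtype)  # bool
```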

2 Comments

If I want NaN to be False as well, what should the process be?
Then use df['w'] = df['w'].fillna(False)
