pandas convert strings column to boolean

Question

I am trying to convert a column containing True/False and null values in string format to Boolean, but whatever I do I end up with either all True values or all False. Below is my approach.

Consider the following DataFrame:

import numpy as np
import pandas as pd
df = pd.DataFrame({'w':['True', np.nan, 'False',
                        'True', np.nan, 'False']})

df['w'].dtypes
Out: dtype('O')

df['w'].unique()
Out: array([True, nan, False], dtype=object)

d = {'nan': np.nan,'False':False, 'True': True}
df['w']=df['w'].map(d)

df['w'].dtypes
Out: dtype('O')

df['w'].unique()
Out: array([nan], dtype=object)

Another approach I tried, following this SO post:

d = {'nan': 0,'False':0, 'True': 1 }
df['w']=df['w'].map(d)
df['w']=df['w'].astype('bool')

Now the dtype is bool, but all values are converted to True:

df['w'].dtypes
Out: dtype('bool')

df['w'].unique()
Out: array([ True])

What am I doing wrong? I want all null values to stay null.

Can you share all relevant code and data? See: minimal reproducible example.
– AMC, Commented Nov 27, 2019 at 6:50
@AlexanderCécile What more can I share? I have added code to generate my dataframe as well.
– tsu90280, Commented Nov 27, 2019 at 6:56
That first approach (the map) looks correct to me, no? What's wrong with it?
– AMC, Commented Nov 27, 2019 at 7:05
@AlexanderCécile It worked well. I was a little confused about nan being in there, which resulted in the dtype being object. @jezrael's answer and explanation helped clear up the issue.
– tsu90280, Commented Nov 27, 2019 at 7:45

jezrael · Accepted Answer · 2019-11-27 06:55:34Z

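I think the conversion is not necessary, because your original data already contains booleans with NaNs; the dtype is object because the values are mixed (booleans together with missing values). For comparison, a column of strings with a real NaN looks like this: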

df = pd.DataFrame({'w':['True', np.nan, 'False']})

print (df['w'].unique())
['True' nan 'False']

print ([type(x) for x in df['w'].unique()])
[<class 'str'>, <class 'float'>, <class 'str'>]
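A small check makes the mismatch concrete: a real missing value is a float, so a dictionary keyed by the string 'nan' never matches it, and real booleans never match the string keys 'True' and 'False'. The sketch below uses standalone Series named `s` and `b` (names chosen for illustration only) so it does not touch `df`:

```python
import numpy as np
import pandas as pd

d = {'nan': np.nan, 'False': False, 'True': True}

# Case 1: string values with a real (float) NaN.
s = pd.Series(['True', np.nan, 'False'])
print(type(s.iloc[1]))  # the missing value is a float, not the string 'nan'

# map() yields NaN for any value that is not a key of the dict,
# so the real NaN simply stays NaN while the strings are converted.
mapped_strings = s.map(d)
print(mapped_strings.tolist())

# Case 2: the column already holds real booleans with NaN.
b = pd.Series([True, np.nan, False], dtype=object)

# None of the values is one of the string keys, so everything becomes NaN.
mapped_bools = b.map(d)
print(mapped_bools.isna().all())
```

The second case appears to be what produced the all-NaN result in the question.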

If nan is also a string, then your solution works:

df = pd.DataFrame({'w':['True', 'nan', 'False']})

print ([type(x) for x in df['w'].unique()])
[<class 'str'>, <class 'str'>, <class 'str'>]

d = {'nan': np.nan,'False':False, 'True': True}
df['w'] = df['w'].map(d)

print (df['w'].unique())
[True nan False]

print ([type(x) for x in df['w'].unique()])
[<class 'bool'>, <class 'float'>, <class 'bool'>]

If the column already holds real booleans with NaN, the result is the same without any mapping:

df = pd.DataFrame({'w':[True, np.nan, False]})

print (df['w'].unique())
[True nan False]

print ([type(x) for x in df['w'].unique()])
[<class 'bool'>, <class 'float'>, <class 'bool'>]
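If a genuinely boolean dtype that still keeps the missing values is wanted, the nullable 'boolean' extension dtype may be an option. This goes beyond what the answer above covers; it is a minimal sketch, assuming pandas 1.0 or later, and the names `s` and `result` are illustrative:

```python
import numpy as np
import pandas as pd

s = pd.Series(['True', np.nan, 'False'])

# Map the strings to real booleans; the float NaN is not a key, so it stays missing.
mapped = s.map({'True': True, 'False': False})

# The nullable 'boolean' dtype stores True/False/<NA> instead of falling back to object.
result = mapped.astype('boolean')

print(result.dtype)
print(result.isna().tolist())
```

With this dtype the nulls remain null (shown as `<NA>`) while the column is no longer of object dtype.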

If you want to replace NaN with False, use Series.fillna:

df['w'] = df['w'].fillna(False)
print (df)
       w
0   True
1  False
2  False

print (df['w'].dtypes)
bool

print (df['w'].unique())
[ True False]
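The order of operations matters: fill the missing values before casting. Casting first, as in the question's second attempt, turns every NaN into True because NaN is truthy. A short sketch of both orders, using an illustrative Series `s`:

```python
import numpy as np
import pandas as pd

s = pd.Series([True, np.nan, False], dtype=object)

# Casting directly: bool(nan) is True, so the missing value becomes True.
cast_first = s.astype(bool)
print(cast_first.tolist())

# Filling first, then casting: the missing value becomes False.
fill_first = s.fillna(False).astype(bool)
print(fill_first.tolist())
```

On recent pandas versions `fillna` may leave the dtype as object, so the explicit `astype(bool)` at the end makes the resulting dtype unambiguous.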

edited Nov 27, 2019 at 6:55

answered Nov 27, 2019 at 6:40

jezrael


2 Comments

tsu90280 Over a year ago

If I want nan to be False as well, what should the process be?

jezrael Over a year ago

Then use df['w'] = df['w'].fillna(False)
