Python pandas column to replace string boolean values to actual boolean type

Question

I want to replace the string representations of booleans in a column with actual boolean values.

kdf = pd.DataFrame(data={'col1' : [True, 'True', np.nan], 'dt': [datetime.now(), ' 2018-12-12', '2019-12-12'], 'bool': 
                     [False, True, True], 'bnan': [False, True, np.nan]})

Here, I want to convert the string 'True' (index 1 of col1) to the actual boolean True. What I did was:

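# Select the 'col1' column explicitly; without a column label, .loc assigns to the whole row.
kdf.loc[kdf['col1'].str.contains('true', na=False, case=False), 'col1'] = True
kdf.loc[kdf['col1'].str.contains('false', na=False, case=False), 'col1'] = False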

This converts the column values to the actual type, but I need to create a function that accepts only the df column, does an in-place replace, and returns the modified column (like col.fillna). Note that we are not allowed to pass the whole df into that function, so I can't use df.loc.

I'm also a bit worried about performance. Is there any other way?

BENY · Accepted Answer · 2019-09-19 03:21:43Z

1

Why not use replace?

df.replace({'True':True,'False':False})
# df.replace({'True':True,'False':False}).applymap(type)
Out[123]: 
              bnan            bool             col1             dt
0   <class 'bool'>  <class 'bool'>   <class 'bool'>  <class 'str'>
1   <class 'bool'>  <class 'bool'>   <class 'bool'>  <class 'str'>
2  <class 'float'>  <class 'bool'>  <class 'float'>  <class 'str'>

Update

df.replace({'True':True,'False':False},regex=True).applymap(type)

Sample data (notice I added leading and trailing spaces):

df = pd.DataFrame(data={'col1' : [True, ' True', np.nan], 'dt': [' 2018-12-12', ' 2018-12-12', '2019-12-12'], 'bool': 
                     [False, True, True], 'bnan': ['False  ', True, np.nan]})
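
If only one column should be touched, a vectorised variant can work on that Series alone. The following is a minimal sketch and not part of the original answer: the helper name `str_bools_to_bool` is hypothetical, and it assumes that only the spellings 'true' and 'false' (any case, surrounding whitespace allowed) should be converted. Whether it beats `apply` depends on the data, so it is worth timing on a real column.

```python
import numpy as np
import pandas as pd

def str_bools_to_bool(col: pd.Series) -> pd.Series:
    """Hypothetical helper: convert 'true'/'false' strings in one column."""
    # Render every value as text, then normalise whitespace and case.
    normalised = col.astype(str).str.strip().str.lower()
    # Anything other than 'true' or 'false' maps to NaN at this step.
    mapped = normalised.map({"true": True, "false": False})
    # Fall back to the original value wherever nothing was recognised.
    return mapped.where(mapped.notna(), col)

df = pd.DataFrame({"col1": [True, " True", np.nan],
                   "bnan": ["False  ", True, np.nan]})
df["col1"] = str_bools_to_bool(df["col1"])
df["bnan"] = str_bools_to_bool(df["bnan"])
print(df)
```

Existing booleans survive because their text form is also 'True' or 'False', and NaN stays NaN because it is not recognised and falls back to the original value.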

edited Sep 19, 2019 at 3:21

answered Sep 19, 2019 at 3:09

BENY


1 Comment

Avinash Raj Over a year ago

But the column has leading or trailing spaces. Also, I want to apply the transformation only to a particular column.

Ernest Han · Accepted Answer · 2021-06-12 04:30:47Z

1

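# Only strings are compared; real booleans and NaN have no .strip() and are passed through unchanged.
df['col'] = df['col'].apply(lambda x: x.strip().lower() == 'true' if isinstance(x, str) else x)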

I think the above should work.

Hope this helps!
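
To meet the requirement from the question (a function that receives only the column and returns the modified column), the same idea can be wrapped in a helper. This is a minimal sketch under stated assumptions: `to_bool_column` and `BOOL_STRINGS` are hypothetical names, and only the spellings 'true' and 'false' are treated as booleans. Unlike the one-liner, strings that are not recognised are left as they are rather than turned into False.

```python
import numpy as np
import pandas as pd

# Hypothetical lookup table; extend it if the data uses other spellings.
BOOL_STRINGS = {"true": True, "false": False}

def to_bool_column(col: pd.Series) -> pd.Series:
    """Return a new Series with 'true'/'false' strings replaced by booleans."""
    def convert(value):
        if isinstance(value, str):
            # Unrecognised strings are returned unchanged.
            return BOOL_STRINGS.get(value.strip().lower(), value)
        # Booleans, NaN and any other non-string values pass through.
        return value
    return col.map(convert)

col1 = pd.Series([True, " True", np.nan, "False  ", "maybe"])
result = to_bool_column(col1)
print(result.tolist())
```

The function returns a new Series, so the caller assigns it back, for example `df['col1'] = to_bool_column(df['col1'])`.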

edited Jun 12, 2021 at 4:30

Ernest Han


answered Sep 19, 2019 at 3:09

89f3a1c


2 Comments

Avinash Raj Over a year ago

but the column value may contain leading or trailing spaces

89f3a1c Over a year ago

You could improve the condition by looking for a substring or by evaluating a regex. But the process of converting the column to a boolean column is the same.

CypherX · Accepted Answer · 2019-09-19 03:27:19Z

0

Expanding on @89f3a1c's solution and @AvinashRaj's comment:

We introduce the following problem into the data:
1. The string 'True' is changed to ' true  '. This introduces a case mismatch and leading and trailing spaces.

import numpy as np
import pandas as pd
from datetime import datetime

kdf = pd.DataFrame(data={'col1' : [True, ' true  ', np.nan], 
                         'dt': [datetime.now(), ' 2018-12-12', '2019-12-12'], 
                         'bool': [False, True, True], 
                         'bnan': [False, True, np.nan]})

# Convert each value to str and strip whitespace; anything that is not 'true'/'True' (including NaN) becomes False.
kdf['col1'] = kdf['col1'].apply(lambda x: str(x).strip() in ['true', 'True'])

Dataframe:

   col1   dt                          bool   bnan
0  True   2019-09-19 03:22:06.734861  False  False
1  true   2018-12-12 00:00:00.000000  True   True
2  NaN    2019-12-12 00:00:00.000000  True   NaN

Output:

   col1   dt                          bool   bnan
0  True   2019-09-19 03:26:47.611914  False  False
1  True   2018-12-12 00:00:00.000000  True   True
2  False  2019-12-12 00:00:00.000000  True   NaN
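
In the output above, the NaN in col1 ends up as False. If missing values should stay missing, one possibility is pandas' nullable "boolean" dtype. The following is a minimal sketch and not part of the original answer: `parse` is a hypothetical helper, and it assumes every string in the column is meant as a boolean spelling (any string other than 'true' becomes False here).

```python
import numpy as np
import pandas as pd

col1 = pd.Series([True, " true  ", np.nan], name="col1")

def parse(value):
    # Strings are normalised and compared; everything else passes through.
    if isinstance(value, str):
        return value.strip().lower() == "true"
    return value

# The nullable "boolean" dtype stores True, False and <NA> in one column.
converted = col1.map(parse).astype("boolean")
print(converted)
```

With this dtype the column is a real boolean column, and the missing entry is shown as `<NA>` instead of being coerced to False.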

edited Sep 19, 2019 at 3:27

answered Sep 19, 2019 at 3:21

CypherX


5 Comments

Avinash Raj Over a year ago

Seems like it's affecting Boolean values as well, isn't it? I mean, it converts the actual boolean value True to str and then checks for that value in ['true', 'True'].

CypherX Over a year ago

I thought you wanted to have all True and False values as boolean and remove any NANs. Am I missing something?

CypherX Over a year ago

I think this would be a faster solution than checking if x is not bool then check if the str(x) is in ['true', 'True'].

Avinash Raj Over a year ago

Can we just check whether x is a str and, if so, do the replace?

CypherX Over a year ago

Well, you can. But I thought you had a restriction of not passing the entire dataframe (and maybe the entire column as well?).
