
I want to replace string booleans (such as 'True') inside a column with actual boolean values.

import numpy as np
import pandas as pd
from datetime import datetime
kdf = pd.DataFrame(data={'col1': [True, 'True', np.nan], 'dt': [datetime.now(), ' 2018-12-12', '2019-12-12'], 'bool': [False, True, True], 'bnan': [False, True, np.nan]})

So here I want to convert the string 'True' (index 1 of col1) to the actual boolean True. What I did was:

is_true = kdf['col1'].str.contains('true', na=False, case=False)
is_false = kdf['col1'].str.contains('false', na=False, case=False)
kdf.loc[is_true, 'col1'] = True
kdf.loc[is_false, 'col1'] = False

This converts the values to actual booleans, but I need to write a function that accepts only the column, does the replacement, and returns the modified column (like col.fillna). We are not allowed to pass the whole DataFrame into that function, so I can't use df.loc.

I'm also a bit worried about performance. Is there any other way?
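A minimal sketch of a function that takes only the column (a pandas Series) and returns a converted copy, in the spirit of col.fillna. The name str_to_bool is hypothetical, and it assumes that 'true'/'false' strings should match regardless of case and surrounding spaces while every other value is left as-is. It relies on astype(str) turning a real bool into 'True'/'False', which maps back to the same bool.

```python
import numpy as np
import pandas as pd


def str_to_bool(col: pd.Series) -> pd.Series:
    """Return a copy of col with 'true'/'false' strings turned into bools.

    Hypothetical helper: matching ignores case and surrounding whitespace;
    every other entry (NaN, dates, other strings) is returned unchanged.
    """
    # Stripped, lowercase text for every entry; a real bool becomes
    # 'true'/'false' here, so it maps back to the same value below
    key = col.astype(str).str.strip().str.lower()
    # 'true' -> True, 'false' -> False, anything else -> NaN
    mapped = key.map({'true': True, 'false': False})
    # Keep the original entry wherever no mapping applied
    return col.astype(object).where(mapped.isna(), mapped)


kdf = pd.DataFrame({'col1': [True, ' true  ', np.nan, 'FALSE', 'maybe']})
kdf['col1'] = str_to_bool(kdf['col1'])
```

pandas string methods on object columns still work element by element, so whether this beats an apply with a lambda depends on the data. It is worth timing both on a realistic frame.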

3 Answers


Why not use replace?

df.replace({'True':True,'False':False})
# df.replace({'True':True,'False':False}).applymap(type)
Out[123]: 
              bnan            bool             col1             dt
0   <class 'bool'>  <class 'bool'>   <class 'bool'>  <class 'str'>
1   <class 'bool'>  <class 'bool'>   <class 'bool'>  <class 'str'>
2  <class 'float'>  <class 'bool'>  <class 'float'>  <class 'str'>
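The same call can be restricted to a single column, which fits the question's constraint of passing only the column. In this minimal sketch, the 'other' column is made up to show that it is left alone. Only exact matches are replaced, so ' True' with a leading space is not converted.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [True, 'True', ' True', np.nan],
                   'other': ['True', 'x', 'y', 'z']})

# Series.replace returns a new Series; only col1 is reassigned,
# so the 'True' string in the 'other' column is left as-is
df['col1'] = df['col1'].replace({'True': True, 'False': False})
```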

Update

df.replace({'True':True,'False':False},regex=True).applymap(type)

Sample data (notice that I added leading and trailing spaces):

df = pd.DataFrame(data={'col1' : [True, ' True', np.nan], 'dt': [' 2018-12-12', ' 2018-12-12', '2019-12-12'], 'bool': 
                     [False, True, True], 'bnan': ['False  ', True, np.nan]})
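The regex approach can also be applied to one column only. Here is a minimal sketch, assuming that only strings made of 'true' or 'false' (any case) plus optional surrounding whitespace should be converted. The ^...$ anchors stop a string such as 'True story' from being swapped out; an unanchored 'True' pattern would also match it. As in the update above, the replacement values are bools rather than strings, so pandas replaces the whole matching entry instead of substituting text.

```python
import numpy as np
import pandas as pd

col = pd.Series([True, ' True', np.nan, 'False  ', 'True story'])

# Case-insensitive patterns anchored at both ends: only 'true'/'false'
# with optional surrounding whitespace match
patterns = {r'(?i)^\s*true\s*$': True, r'(?i)^\s*false\s*$': False}

# Because the replacement values are bools, not strings, a matching
# entry is replaced as a whole rather than having its text substituted
col = col.replace(patterns, regex=True)
```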

1 Comment

But the column has leading or trailing spaces. Also, I want to apply the transformation to one particular column only.
df['col'] = df['col'].apply(lambda x: str(x).strip().lower() == 'true')  # str() so real bools and NaN don't raise AttributeError

I think the above should work.

Hope this helps!

2 Comments

But the column values may contain leading or trailing spaces.
You could improve the condition by looking for a substring or by evaluating a regex. But the process of converting the column to a boolean column is the same.
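Following the regex suggestion, the lambda in this answer can be replaced by a single string-method call on the whole column. This is a minimal sketch that assumes, as the answer does, that anything other than 'true' (any case, optional surrounding spaces) should become False, including NaN.

```python
import numpy as np
import pandas as pd

col = pd.Series([True, ' true  ', np.nan, 'False'])

# astype(str) gives every entry a text form (True -> 'True'); str.match
# anchors at the start and '\s*$' allows trailing spaces. NaN ends up
# False either as the text 'nan' or, via na=False, as a missing value
as_bool = col.astype(str).str.match(r'\s*true\s*$', case=False, na=False)
```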

Expanding on @89f3a1c's solution and @AvinashRaj's comment:

I introduced the following problem into the data:
1. The string 'True' is changed to ' true  '. This introduces a case mismatch as well as leading and trailing spaces.

import numpy as np
import pandas as pd
from datetime import datetime

kdf = pd.DataFrame(data={'col1' : [True, ' true  ', np.nan], 
                         'dt': [datetime.now(), ' 2018-12-12', '2019-12-12'], 
                         'bool': [False, True, True], 
                         'bnan': [False, True, np.nan]})

kdf['col1'] = kdf['col1'].apply(lambda x: str(x).strip() in ['true', 'True'])

Input DataFrame:

   col1                          dt   bool   bnan
0  True  2019-09-19 03:22:06.734861  False  False
1  true  2018-12-12 00:00:00.000000   True   True
2   NaN  2019-12-12 00:00:00.000000   True    NaN

Output:

    col1                          dt   bool   bnan
0   True  2019-09-19 03:26:47.611914  False  False
1   True  2018-12-12 00:00:00.000000   True   True
2  False  2019-12-12 00:00:00.000000   True    NaN

5 Comments

It seems like this affects real boolean values as well, doesn't it? It converts the actual boolean True to a str and then checks for that value in ['true', 'True'].
I thought you wanted to have all True and False values as booleans and remove any NaNs. Am I missing something?
I think this would be a faster solution than checking whether x is not a bool and then checking whether str(x) is in ['true', 'True'].
Can we just check whether x is a str, and if so, do the replacement?
Well, you can. But I thought you had a restriction on passing the entire DataFrame (and maybe the entire column as well?).
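The suggestion in the comments (inspect an entry only if it is a str) can be written as a per-element function passed to Series.map. In this minimal sketch, replace_bool_strings is a hypothetical name, and the function assumes that NaN and real bools should be kept rather than coerced to False.

```python
import numpy as np
import pandas as pd


def replace_bool_strings(col: pd.Series) -> pd.Series:
    """Hypothetical helper: convert only str entries that spell a bool."""

    def convert(value):
        # Only strings are inspected; real bools, NaN, dates, ... pass through
        if isinstance(value, str):
            key = value.strip().lower()
            if key in ('true', 'false'):
                return key == 'true'
        return value

    # Series.map calls convert once per element and returns a new Series
    return col.map(convert)


s = pd.Series([True, ' true  ', np.nan, 'False', 'n/a'])
result = replace_bool_strings(s)
```

Only the column is passed in, and the original Series is not modified.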
