Data Cleaning with Pandas in Python

Question

I am trying to clean a CSV file for data analysis. How do I convert TRUE/FALSE into 1 and 0?

When I searched Google, the suggestion was df.somecolumn = df.somecolumn.astype(int). However, this CSV file has 100 columns, and not every column is TRUE/FALSE (some are categorical, some are numerical). How do I write one sweeping piece of code that converts every TRUE/FALSE column to 1 and 0 without typing 50 lines of df.somecolumn = df.somecolumn.astype(int)?

Perhaps a different question: why do you need to cast a Boolean to an int? In nearly all use cases, the two achieve equivalent results. If you want them as integers so you can count the number of Trues, pandas can already do this with Booleans.
– tyleha, Commented Oct 16, 2019 at 16:38
Also possibly a duplicate of stackoverflow.com/questions/33601010/…
– tyleha, Commented Oct 16, 2019 at 16:40
I don't think this is a dup. The question is very different. Though df*1 might work, as it does not affect string and numeric values.
– Mark Wang, Commented Oct 16, 2019 at 19:40
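
To illustrate tyleha's point, here is a minimal sketch (the sample values are made up) showing that pandas treats True as 1 and False as 0 in aggregations, so counting does not require a cast:

```python
import pandas as pd

# Hypothetical boolean column, as it might look after loading the CSV
flags = pd.Series([True, False, True, True])

# Aggregations treat True as 1 and False as 0
n_true = int(flags.sum())          # number of True values
share_true = float(flags.mean())   # fraction of True values
```

A cast is still useful when a downstream tool insists on numeric columns, which seems to be the case the question is about.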

Benoit de Menthière · Accepted Answer · 2019-10-16 17:02:38Z

4

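You can select the boolean columns by dtype and cast only those (assigning to the result of select_dtypes directly is a syntax error, so go through the column names):

bool_cols = df.select_dtypes(include='bool').columns
df[bool_cols] = df[bool_cols].astype(int)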


1 Comment

Jon Clements Over a year ago

Could be rewritten as df.update(df.select_dtypes('bool').astype(int))...
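
As a rough worked example of the dtype-based approach in this answer, the sketch below uses a small made-up DataFrame (the column names and values are assumptions, not the asker's data) and converts only the columns whose dtype is bool:

```python
import pandas as pd

# Made-up stand-in for the 100-column CSV: one bool, one text, one numeric column
df = pd.DataFrame({
    "flag": [True, False, True],
    "city": ["Oslo", "Lima", "Pune"],
    "score": [1.5, 2.0, 3.25],
})

# Names of the columns whose dtype is bool
bool_cols = df.select_dtypes(include="bool").columns

# Cast just those columns; text and numeric columns are left alone
df[bool_cols] = df[bool_cols].astype(int)
```

One caveat worth checking on real data: a column that mixes True/False with missing values is stored with object dtype rather than bool, so select_dtypes(include="bool") will not pick it up.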

Mark Wang · Accepted Answer · 2019-10-16 19:43:01Z

0

A slightly different approach. First, the dtypes of a DataFrame are available as df.dtypes, which is a pandas Series that looks like this:

a     int64
b      bool
c    object
dtype: object

Second, we can swap bool for an integer type using replace:

df.dtypes.replace('bool', 'int8'), which gives

a     int64
b      int8
c    object
dtype: object

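Finally, a pandas Series behaves much like a dictionary, so it can be passed directly to pd.DataFrame.astype, which accepts a mapping from column name to dtype.

We can write it as a one-liner (astype returns a new DataFrame, so assign the result):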

df.astype(df.dtypes.replace('bool', 'int8'))
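
The sketch below is a variant of this idea rather than the answer's exact code: instead of relying on replace over the dtypes Series, it builds the column-to-dtype mapping explicitly and passes it to astype. The DataFrame and the name int_mapping are made up for illustration.

```python
import pandas as pd

# Made-up frame matching the dtypes listing above
df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [True, False, True],
    "c": ["x", "y", "z"],
})

# Map each bool column to int8; other columns are omitted and keep their dtype
int_mapping = {col: "int8" for col, dtype in df.dtypes.items() if dtype == bool}

# astype returns a new DataFrame; df itself is unchanged
converted = df.astype(int_mapping)
```

Building the mapping by hand makes it easy to inspect which columns will change before casting anything.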

edited Oct 16, 2019 at 19:43

answered Oct 16, 2019 at 19:30


user1695639 · Accepted Answer · 2019-10-16 19:59:34Z

0

I would do it like this:

df.somecolumn = df.somecolumn.apply(lambda x: 1 if x=="TRUE" else 0)

If you want to iterate through all your columns and check whether they contain TRUE/FALSE values, you can do this:

for c in df:
    # `in` on a Series tests the index labels, so check the values with isin
    if df[c].isin(['TRUE', 'FALSE']).any():
        df[c] = df[c].apply(lambda x: 1 if x == 'TRUE' else 0)

Note that this approach is case-sensitive and won't work well if in the column the TRUE/FALSE values are mixed with others.
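
To sidestep the mixed-values problem noted above, one option is to convert a column only when every value is the string TRUE or FALSE. The following is a minimal sketch under that assumption; the DataFrame and the name mapping are made up, and it only applies when the columns hold strings rather than already-parsed booleans:

```python
import pandas as pd

# Made-up frame where TRUE/FALSE were read in as strings
df = pd.DataFrame({
    "active": ["TRUE", "FALSE", "TRUE"],
    "grade": ["A", "B", "C"],
    "visits": [3, 0, 7],
})

mapping = {"TRUE": 1, "FALSE": 0}

for col in df.columns:
    values = df[col]
    # Convert only if every value in the column is exactly "TRUE" or "FALSE"
    if values.isin(list(mapping)).all():
        df[col] = values.map(mapping)
```

If the loader has already parsed a column as bool, this loop leaves it untouched, and a dtype-based cast is the better fit.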

