0

I am trying to clean a CSV file for data analysis. How do I convert TRUE/FALSE values into 1 and 0?

When I searched Google, the suggestion was df.somecolumn = df.somecolumn.astype(int). However, this CSV file has 100 columns, and not every column is TRUE/FALSE (some are categorical, some are numerical). How can I write general code that converts every TRUE/FALSE column to 1 and 0 without typing 50 lines of df.somecolumn = df.somecolumn.astype(int)?

3
  • Perhaps a different question - why do you need to cast a Boolean to an int? In nearly all use cases, the two give equivalent results. If you want them as integers so you can count the number of Trues, then pandas can already do this with Booleans. Commented Oct 16, 2019 at 16:38
  • Also possibly a duplicate of stackoverflow.com/questions/33601010/… Commented Oct 16, 2019 at 16:40
  • I don't think this is a dup. The question is very different. Though df*1 might work, as it does not affect string and numeric values. Commented Oct 16, 2019 at 19:40
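
The first comment's point about counting can be checked on a small frame. This is only a sketch: the column name flag and the data are made up for illustration.

```python
import pandas as pd

# Hypothetical example data; the column name is made up for illustration.
df = pd.DataFrame({"flag": [True, False, True]})

# Aggregations treat True as 1 and False as 0, so no cast is needed to count.
true_count = int(df["flag"].sum())
true_share = float(df["flag"].mean())  # fraction of rows that are True
print(true_count, true_share)
```

If the integers are only needed for counting or averaging, the conversion can therefore be skipped.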

3 Answers

4

You can select the boolean columns and assign the converted values back:

bool_cols = df.select_dtypes(include='bool').columns
df[bool_cols] = df[bool_cols].astype(int)
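
A minimal end-to-end sketch of this idea, assuming the CSV columns contain only TRUE/FALSE so that pd.read_csv parses them as bool. The CSV text and the column names a, b, c are made up for illustration. Python does not allow assigning to the result of a function call, so the column labels are collected first and the assignment goes through df[...].

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for the real file.
csv_text = "a,b,c\n1,TRUE,x\n2,FALSE,y\n3,TRUE,z\n"
df = pd.read_csv(io.StringIO(csv_text))  # column b is parsed as bool

# Collect the labels of the bool columns, then overwrite just those columns.
bool_cols = df.select_dtypes(include="bool").columns
df[bool_cols] = df[bool_cols].astype(int)
print(df.dtypes)
```

Columns that are not bool (here a and c) are left untouched, which is what makes this usable on a wide file.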


1 Comment

Could be rewritten as df.update(df.select_dtypes('bool').astype(int))...
0

A slightly different approach. First, the dtypes of a DataFrame can be returned using df.dtypes, which gives a pandas Series that looks like this:

a     int64
b      bool
c    object
dtype: object

Second, we can replace bool with an int type using replace:

df.dtypes.replace('bool', 'int8')

This gives

a     int64
b      int8
c    object
dtype: object

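Finally, a pandas Series behaves much like a dictionary (here mapping column names to dtypes), so it can be passed to pd.DataFrame.astype.

We can write it as a one-liner. Note that astype returns a new DataFrame, so the result has to be assigned back:

df = df.astype(df.dtypes.replace('bool', 'int8'))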
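
As a variation on the same idea, the mapping can also be built explicitly as a dict instead of going through replace on the dtypes Series. This is a sketch of an alternative, not the answer's exact method; the frame and the names dtype_map and converted are made up for illustration.

```python
import pandas as pd

# Hypothetical example frame with one column of each kind.
df = pd.DataFrame({"a": [1, 2], "b": [True, False], "c": ["x", "y"]})

# Map each bool column to int8; columns absent from the mapping keep their dtype.
dtype_map = {col: "int8" for col in df.select_dtypes(include="bool").columns}
converted = df.astype(dtype_map)  # returns a new DataFrame; df is unchanged
print(converted.dtypes)
```

int8 is enough to hold 0 and 1 and keeps memory use low on a wide frame.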

0

I would do it like this:

df.somecolumn = df.somecolumn.apply(lambda x: 1 if x=="TRUE" else 0)

If you want to iterate through all your columns and check whether they contain TRUE/FALSE values, you can do this:

for c in df:
    # `in` on a Series tests the index labels, so check the values with isin()
    if df[c].isin(['TRUE', 'FALSE']).any():
        df[c] = df[c].apply(lambda x: 1 if x == 'TRUE' else 0)

Note that this approach is case-sensitive and won't work well if the TRUE/FALSE values in a column are mixed with other values: anything that is not exactly 'TRUE' becomes 0.
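
For columns that hold the strings 'TRUE' and 'FALSE', a stricter variant converts a column only when every non-missing value is one of those two strings, so mixed columns are left alone. This is a sketch under that assumption; the frame, the column names, and the mapping dict are made up for illustration.

```python
import pandas as pd

# Hypothetical example frame: one pure TRUE/FALSE column, one mixed, one numeric.
df = pd.DataFrame({
    "flag": ["TRUE", "FALSE", "TRUE"],
    "mixed": ["TRUE", "maybe", "FALSE"],
    "num": [1, 2, 3],
})

mapping = {"TRUE": 1, "FALSE": 0}
for col in df.columns:
    values = df[col].dropna()
    # Convert only when every non-missing value is exactly "TRUE" or "FALSE".
    if len(values) > 0 and values.isin(list(mapping)).all():
        df[col] = df[col].map(mapping)
print(df)
```

Missing values in a converted column stay missing after map, which turns that column into floats rather than integers.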
