How to replace different binary values across columns into 1/0

Question

I have a dataset that contains multiple columns of binary values.

import pandas as pd

df = pd.DataFrame({"a": ["y", "n"], "b": ["t", "f"],
                   "c": ["known", "unknown"], "d": ["found", "not found"]})

I want to convert all the binary columns to 1/0 without affecting the other numeric columns. Is there a simple solution in one or two lines? The dataset has over 500 columns, so checking and replacing them one by one is impractical. Thanks.

Welcome to SO. Please review How to Ask, and create a minimal reproducible example. That means no broken sample code for others to test. Your current sample code is not valid Python, so it will be difficult to help.
– user3483203, Commented Jul 29, 2019 at 17:01
If these are just binary, and you don't particularly care which value becomes 1, try: pd.get_dummies(df).iloc[:, ::2]. Otherwise please provide a more complete example and explanation of what you need.
– ALollz, Commented Jul 29, 2019 at 17:09
Or: df.assign(**df.select_dtypes(object).apply(lambda c: c.factorize()[0]))
– piRSquared, Commented Jul 29, 2019 at 17:10
But as for "the 500 other columns" we need a few more constraints. Is every object column guaranteed to be a binary column you need to transform? If not, I think you'll at least need some pattern or a list of the specific columns to transform. Or perhaps we can try with nunique == 2?
– ALollz, Commented Jul 29, 2019 at 17:12
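
A quick sketch of what the factorize suggestion does, assuming every column is a two-valued text column as in the question's sample. factorize numbers values in order of first appearance, so whichever value occurs first in a column becomes 0. That is an arbitrary assignment rather than a meaningful 1/0 choice.

```python
import pandas as pd

df = pd.DataFrame({"a": ["y", "n"], "b": ["t", "f"],
                   "c": ["known", "unknown"], "d": ["found", "not found"]})

# factorize() returns (codes, uniques); codes follow order of first appearance,
# so the value in the first row of each column is coded as 0.
coded = df.apply(lambda col: col.factorize()[0])
print(coded)
```

If a particular value has to be the 1, the codes need to be flipped per column or replaced by an explicit mapping.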

ALollz · Accepted Answer · 2019-07-29 17:39:51Z

1

You can use pd.get_dummies with drop_first=True (credit to @piRSquared):

# dtype=int keeps the output as 0/1; pandas >= 2.0 returns True/False by default
pd.get_dummies(df, drop_first=True, dtype=int)

#   a_y  b_t  c_unknown  d_not found
#0    1    1          0            0
#1    0    0          1            1
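
On which value ends up as 1: with drop_first=True, get_dummies drops the first level of each column in sorted order, so the surviving column flags the alphabetically later value ('y', 't', 'unknown', 'not found' here). The sketch below makes that visible and then restores the original column names. It assumes every column is a two-valued text column, so there is exactly one output column per input column, in the same order.

```python
import pandas as pd

df = pd.DataFrame({"a": ["y", "n"], "b": ["t", "f"],
                   "c": ["known", "unknown"], "d": ["found", "not found"]})

# dtype=int asks for 0/1 integers rather than booleans.
encoded = pd.get_dummies(df, drop_first=True, dtype=int)

# The suffix after "_" shows which value is flagged as 1 in each column.
kept_levels = list(encoded.columns)
print(kept_levels)

# One dummy column per input column, same order, so the original names can be reused.
encoded.columns = df.columns
print(encoded)
```

Note that 'unknown' and 'not found' become 1 here only because of sort order; if that is not the intended meaning, an explicit mapping is the safer route.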

If this should only be done for binary object columns, subset first.

df = pd.DataFrame({'a': ['y', 'n', 'c'], 
                   'b': ['t', 'f', 't'], 
                   'c': ['known', 'unknown', 'known'],
                   'd': ['found', 'not found', 'found'],
                   'e': [1, 2, 2]})

# Keep only columns with exactly two distinct values, then only the object ones
pd.get_dummies(df.loc[:, df.agg('nunique') == 2].select_dtypes(include='object'),
               drop_first=True, dtype=int)

#   b_t  c_unknown  d_not found
#0    1          0            0
#1    0          1            1
#2    1          0            0
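
The subset above returns only the encoded columns. A minimal sketch of putting them back next to the untouched columns, under the same assumption that a binary column is a non-numeric column with exactly two distinct values (the names binary_cols, dummies and result are just illustrative):

```python
import pandas as pd

df = pd.DataFrame({'a': ['y', 'n', 'c'],
                   'b': ['t', 'f', 't'],
                   'c': ['known', 'unknown', 'known'],
                   'd': ['found', 'not found', 'found'],
                   'e': [1, 2, 2]})

# Non-numeric columns with exactly two distinct values.
non_numeric = df.select_dtypes(exclude="number").columns
binary_cols = [c for c in non_numeric if df[c].nunique() == 2]

# One dummy column per binary column; reuse the original names.
dummies = pd.get_dummies(df[binary_cols], drop_first=True, dtype=int)
dummies.columns = binary_cols

# Swap the encoded columns in and restore the original column order.
result = df.drop(columns=binary_cols).join(dummies)[df.columns]
print(result)
```

Columns `a` (three values) and `e` (numeric) pass through unchanged, and the column order matches the input.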

If there are a small number of binary responses across columns, consider creating a dictionary and mapping the values:

d = {'y': 1, 'n': 0,
     't': 1, 'f': 0,
     'known': 1, 'unknown': 0,
     'found': 1, 'not found': 0}

s = (df.agg('nunique') == 2) & (df.dtypes == 'object')
for col in s[s].index:
    df[col] = df[col].map(d)

#   a  b  c  d  e
#0  y  1  1  1  1
#1  n  0  0  0  2
#2  c  1  1  1  2
#   |
#  `a` not mapped because trinary
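
One caveat with map: a value missing from the dictionary silently becomes NaN. With 500 columns it may be worth checking for that before mapping. A rough sketch, reusing the sample frame and dictionary from above (the unmapped check is an addition of mine, not part of the original approach):

```python
import pandas as pd

df = pd.DataFrame({'a': ['y', 'n', 'c'],
                   'b': ['t', 'f', 't'],
                   'c': ['known', 'unknown', 'known'],
                   'd': ['found', 'not found', 'found'],
                   'e': [1, 2, 2]})

d = {'y': 1, 'n': 0,
     't': 1, 'f': 0,
     'known': 1, 'unknown': 0,
     'found': 1, 'not found': 0}

# Non-numeric columns with exactly two distinct values.
non_numeric = df.select_dtypes(exclude="number").columns
binary_cols = [c for c in non_numeric if df[c].nunique() == 2]

# Collect, per column, any values the dictionary does not cover.
unmapped = {c: sorted(set(df[c].dropna()) - set(d)) for c in binary_cols}
unmapped = {c: vals for c, vals in unmapped.items() if vals}
if unmapped:
    raise ValueError(f"values missing from mapping: {unmapped}")

out = df.copy()
for c in binary_cols:
    out[c] = out[c].map(d)
print(out)
```

If the check raises, the error lists exactly which columns and values still need an entry in the dictionary.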

edited Jul 29, 2019 at 17:39

answered Jul 29, 2019 at 17:19

ALollz


2 Comments

S Hendricks Over a year ago

Thanks, but how can I be sure that get_dummies assigns 1 to 'T', 'known', 'y', 'found', and 0 otherwise? And what if I don't want to change the column names?

ALollz Over a year ago

@SHendricks when the data are messy there's not really an easy one liner to deal with it. You're going to need to specify the mapping so that we know "known = 1" as opposed to the opposite. I think any natural language processing to determine that is probably absolute overkill for something like this, which you can hard-code with much less time investment. If all 500 columns have 500 different binary responses you're just going to have to bite the bullet and code it how you want.
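
To illustrate the "specify the mapping" point: one possible way to hard-code it is a dictionary naming, per column, the value that should become 1. The positive dictionary below is hypothetical and would have to be written out for the real data. Column names stay unchanged.

```python
import pandas as pd

df = pd.DataFrame({"a": ["y", "n"], "b": ["t", "f"],
                   "c": ["known", "unknown"], "d": ["found", "not found"]})

# Hypothetical spec: for each column, the value that should be encoded as 1.
positive = {"a": "y", "b": "t", "c": "known", "d": "found"}

encoded = df.copy()
for col, value in positive.items():
    # Everything that is not the positive value becomes 0,
    # including missing values or unexpected strings.
    encoded[col] = (df[col] == value).astype(int)
print(encoded)
```

Columns not listed in the dictionary are left as they are, so numeric columns are not touched.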
