I have a pandas DataFrame with some numerical and some categorical (str) values, for example:

   A  B     C  D
0  x  y     a  2
1  x  x    aa  1
2  y  z    aa  4
3  y  z    aa  4
4  x  y  aaaa  0

I want to convert all the categorical values into boolean indicators. Because some columns can contain the same values, I want the new column names to be distinguishable, for example column_name + '_is_' + value_name.

The expected result is:

   D  A_is_x  A_is_y  B_is_y  B_is_x  B_is_z  C_is_a  C_is_aa  C_is_aaaa
0  2    True   False    True   False   False    True    False      False
1  1    True   False   False    True   False   False     True      False
2  4   False    True   False   False    True   False     True      False
3  4   False    True   False   False    True   False     True      False
4  0    True   False    True   False   False   False    False       True

I wrote some code that works, but it's not very pythonic.

    for col in data.columns:
        if not np.issubdtype(data[col].dtypes, np.number):
            values = data[col].unique()
            for value in values:
                data[col + '_is_' + value] = data[col].map(lambda x: x == value)
            data = data.drop(col, axis=1)

I tried to write this using pd.get_dummies, but I had trouble naming the newly created columns conveniently. Is there an easier and cleaner solution than mine?

I know there are some related questions, but none of them solves my problem of naming the columns conveniently.

2 Answers

Use get_dummies with the parameters prefix_sep='_is_' and dtype=bool. The numeric column is not processed and comes first in the output, as you need:

df = pd.get_dummies(df, prefix_sep='_is_', dtype=bool)

print (df)
   D  A_is_x  A_is_y  B_is_x  B_is_y  B_is_z  C_is_a  C_is_aa  C_is_aaaa
0  2    True   False   False    True   False    True    False      False
1  1    True   False    True   False   False   False     True      False
2  4   False    True   False   False    True   False     True      False
3  4   False    True   False   False    True   False     True      False
4  0    True   False   False    True   False   False    False       True
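
For reference, here is a self-contained sketch of the same call that rebuilds the sample frame from the question. The variable name `result` is only illustrative and does not come from the answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "A": ["x", "x", "y", "y", "x"],
    "B": ["y", "x", "z", "z", "y"],
    "C": ["a", "aa", "aa", "aa", "aaaa"],
    "D": [2, 1, 4, 4, 0],
})

# With no `columns` argument, only the string columns are encoded;
# the numeric column D is passed through and placed first.
result = pd.get_dummies(df, prefix_sep="_is_", dtype=bool)

print(result.columns.tolist())
```

The indicator columns for each source column come out in sorted value order, which is why B_is_x precedes B_is_y here.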

Check get_dummies, applied to the categorical columns only and then joined back to D:

df = df[['D']].join(pd.get_dummies(df[['A', 'B', 'C']], prefix_sep='_is_').astype(bool))
df
Out[390]: 
   D  A_is_x  A_is_y  B_is_x  B_is_y  B_is_z  C_is_a  C_is_aa  C_is_aaaa
0  2    True   False   False    True   False    True    False      False
1  1    True   False    True   False   False   False     True      False
2  4   False    True   False   False    True   False     True      False
3  4   False    True   False   False    True   False     True      False
4  0    True   False   False    True   False   False    False       True
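
One difference from the expected result in the question: get_dummies lists the indicators in sorted value order (B_is_x before B_is_y), whereas the question lists them in order of first appearance. If that order matters, one possible approach is to convert each string column to a categorical whose categories follow unique(). This is an assumption about what is wanted, not part of either answer, and the names `cat_cols`, `ordered`, and `result` are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "A": ["x", "x", "y", "y", "x"],
    "B": ["y", "x", "z", "z", "y"],
    "C": ["a", "aa", "aa", "aa", "aaaa"],
    "D": [2, 1, 4, 4, 0],
})

# Columns to encode: everything that is not numeric.
cat_cols = df.select_dtypes(exclude="number").columns

ordered = df.copy()
for col in cat_cols:
    # unique() returns values in order of first appearance, so the
    # categorical keeps that order instead of the sorted one.
    ordered[col] = pd.Categorical(ordered[col], categories=ordered[col].unique())

# get_dummies follows the category order of categorical columns.
result = pd.get_dummies(ordered, prefix_sep="_is_", dtype=bool)

print(result.columns.tolist())
```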
