Creating dataframe column with text strings from dummy columns

Question

I have a dataframe where I've used get_dummies to create several columns (df.earth, df.wind, df.water, df.fire, df.heart - for example) and groupby to aggregate rows, so now a row can have multiple dummy columns with 1. The dataframe now looks like this:

ID  Earth  Wind  Water  Fire  Heart
12  0      1     1      0     1
13  1      0     0      0     0
14  1      0     1      0     0

I need to create a column that checks each dummy column and lists the names of the columns that apply to each row, like this:

ID  Earth  Wind  Water  Fire  Heart  Powers
12  0      1     1      0     1      Wind, Water, Heart
13  1      0     0      0     0      Earth
14  1      0     1      0     0      Earth, Water

I'm not really sure where to start, and my searching hasn't gotten me very far.

Please review minimal reproducible example

– user3483203, Commented Apr 5, 2019 at 18:08

You can start by showing us some of the data.

– Edeki Okoh, Commented Apr 5, 2019 at 18:43

@EdekiOkoh I added some sample data. Thanks!

– nostradukemas, Commented Apr 5, 2019 at 19:10

rafaelc · Accepted Answer · 2019-04-05 19:15:40Z

2

Use

df['Powers'] = df.apply(lambda s: ', '.join(s.index[s.eq(1)]), axis=1)
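
As a minimal sketch of how that line behaves, the example below rebuilds the sample data from the question and applies the same lambda. It assumes ID is an ordinary column, so the dummy columns are selected explicitly (the name dummy_cols is only illustrative); otherwise a row whose ID happened to equal 1 would get "ID" in its label.

```python
import pandas as pd

# Sample data from the question; ID is assumed to be a regular column.
df = pd.DataFrame({
    "ID": [12, 13, 14],
    "Earth": [0, 1, 1],
    "Wind": [1, 0, 0],
    "Water": [1, 0, 1],
    "Fire": [0, 0, 0],
    "Heart": [1, 0, 0],
})

# Illustrative name: the columns that came out of get_dummies.
dummy_cols = ["Earth", "Wind", "Water", "Fire", "Heart"]

# With axis=1, each row arrives as a Series indexed by column name,
# so s.index[s.eq(1)] keeps the names whose value is 1.
df["Powers"] = df[dummy_cols].apply(
    lambda s: ", ".join(s.index[s.eq(1)]), axis=1
)

print(df[["ID", "Powers"]])
```

For the three sample rows this should give "Wind, Water, Heart", "Earth" and "Earth, Water".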

Ben Pap · Accepted Answer · 2019-04-05 19:12:21Z

2

import pandas as pd

df = pd.DataFrame(
    {
        'A': [0, 0, 0],
        'B': [1, 0, 0],
        'C': [0, 1, 0],
        'D': [0, 0, 0],
        'E': [1, 0, 1],
        'F': [0, 0, 1],
    }
)

df

    A   B   C   D   E   F
0   0   1   0   0   1   0
1   0   0   1   0   0   0
2   0   0   0   0   1   1

You're probably looking at a df like the one above. You can do the following to collect, for each row, the names of the columns that contain 1.

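# Iterating over df.T visits one original row at a time,
# because transposing turns each row into a column.
columns = []
for col in df.T:
    # Names of the original columns whose value in this row is 1
    columns.append(df.T[df.T[col] == 1].index.tolist())

# Rows with fewer matches are padded with None; drop the padding before joining.
has1 = pd.DataFrame(columns).apply(lambda x: ', '.join(x[x.notnull()]), axis=1)
df['Is1'] = has1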

df

    A   B   C   D   E   F   Is1
0   0   1   0   0   1   0   B, E
1   0   0   1   0   0   0   C
2   0   0   0   0   1   1   E, F
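
The same result can probably be reached without transposing. The sketch below is an alternative, not the code from this answer: it builds a boolean mask once and joins the matching column names row by row. The names mask and names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": [0, 0, 0],
        "B": [1, 0, 0],
        "C": [0, 1, 0],
        "D": [0, 0, 0],
        "E": [1, 0, 1],
        "F": [0, 0, 1],
    }
)

# Boolean array with one row per DataFrame row, True where the value is 1.
mask = df.eq(1).to_numpy()
# Column names as a NumPy array so they can be indexed with a boolean row.
names = df.columns.to_numpy()

# Join the names selected by each row's mask.
df["Is1"] = [", ".join(names[row]) for row in mask]

print(df)
```

The mask is computed before the new column is added, so Is1 itself is never checked.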

3 Comments

nostradukemas Over a year ago

Is there a way to get this to only check certain columns? So if I wanted it to skip the first four columns and only write in E/F?

Ben Pap Over a year ago

Change columns.append(df.T[df.T[col] == 1].index.tolist()) line to columns.append(df[['E','F']].T[df[['E','F']].T[col] == 1].index.tolist())

Ben Pap Over a year ago

You can also change rafaelc's code to df['Powers'] = df[['E','F']].apply(lambda s: ', '.join(s.index[s.eq(1)]), axis=1)
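
If the columns to skip are known by position rather than by name (the "skip the first four columns" case asked about above), a positional slice should work as well. This is a sketch using the A to F example frame, and the variable subset is illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": [0, 0, 0],
        "B": [1, 0, 0],
        "C": [0, 1, 0],
        "D": [0, 0, 0],
        "E": [1, 0, 1],
        "F": [0, 0, 1],
    }
)

# Skip the first four columns by position; only E and F remain.
subset = df.iloc[:, 4:]

# Same lambda as in rafaelc's answer, applied to the subset only.
df["Powers"] = subset.apply(lambda s: ", ".join(s.index[s.eq(1)]), axis=1)

print(df)
```

Rows with no 1 in the selected columns end up with an empty string, so the middle row here gets '' rather than a missing value.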
