
I have a dataframe where I've used get_dummies to create several columns (for example df.earth, df.wind, df.water, df.fire, df.heart) and groupby to aggregate rows, so a row can now have 1 in more than one dummy column. The dataframe now looks like this:

ID  Earth  Wind  Water  Fire  Heart
12  0      1     1      0     1
13  1      0     0      0     0
14  1      0     1      0     0

I need to create a column that checks each dummy column and, for each row, lists the names of the columns that are set to 1. The result would look like this:

ID  Earth  Wind  Water  Fire  Heart  Powers
12  0      1     1      0     1      Wind, Water, Heart
13  1      0     0      0     0      Earth
14  1      0     1      0     0      Earth, Water

I'm not really sure where to start, and my searching hasn't gotten me very far.

  • Please review minimal reproducible example. Commented Apr 5, 2019 at 18:08
  • You can start by showing us some of the data. Commented Apr 5, 2019 at 18:43
  • @EdekiOkoh I added some sample data. Thanks! Commented Apr 5, 2019 at 19:10

2 Answers

2

Use apply along axis=1 to join the labels of the columns that equal 1 in each row:

df['Powers'] = df.apply(lambda s: ', '.join(s.index[s.eq(1)]), axis=1)
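
As a minimal sketch of how that one-liner behaves, here it is applied to the sample frame from the question. Moving ID into the index is an assumption made here, so that ID is not compared against 1 along with the dummy columns.

```python
import pandas as pd

# Sample data from the question. ID is set as the index (an assumption)
# so that only the dummy columns are checked.
df = pd.DataFrame(
    {
        'ID': [12, 13, 14],
        'Earth': [0, 1, 1],
        'Wind': [1, 0, 0],
        'Water': [1, 0, 1],
        'Fire': [0, 0, 0],
        'Heart': [1, 0, 0],
    }
).set_index('ID')

# For each row, keep the column labels whose value equals 1 and join them.
df['Powers'] = df.apply(lambda s: ', '.join(s.index[s.eq(1)]), axis=1)

print(df['Powers'].tolist())
# ['Wind, Water, Heart', 'Earth', 'Earth, Water']
```

If ID has to stay a regular column, the same lambda can be applied to a selection of only the dummy columns instead.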

2
import pandas as pd

df = pd.DataFrame(
    {
        'A': [0, 0, 0],
        'B': [1, 0, 0],
        'C': [0, 1, 0],
        'D': [0, 0, 0],
        'E': [1, 0, 1],
        'F': [0, 0, 1],
    }
)

df

    A   B   C   D   E   F
0   0   1   0   0   1   0
1   0   0   1   0   0   0
2   0   0   0   0   1   1

You're probably looking at a df like the one above. You can do the following to pull, for each row, the names of the columns that contain 1.

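# For each row of df (a column of df.T), collect the labels of the columns equal to 1.
columns = []
for col in df.T:
    columns.append(df.T[df.T[col] == 1].index.tolist())

# Rows can have different numbers of matches, so pd.DataFrame pads the shorter
# lists with missing values; drop those before joining.
has1 = pd.DataFrame(columns).apply(lambda x: ', '.join(x[x.notnull()]), axis=1)
df['Is1'] = has1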

df

    A   B   C   D   E   F   Is1
0   0   1   0   0   1   0   B, E
1   0   0   1   0   0   0   C
2   0   0   0   0   1   1   E, F
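
For comparison, the same result can be built without transposing, by masking the column labels row by row with NumPy. This is a sketch of an alternative, not the approach used above; `cols` and `mask` are names chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'A': [0, 0, 0],
        'B': [1, 0, 0],
        'C': [0, 1, 0],
        'D': [0, 0, 0],
        'E': [1, 0, 1],
        'F': [0, 0, 1],
    }
)

cols = df.columns.to_numpy()   # column labels as an array
mask = df.to_numpy() == 1      # True where a cell equals 1

# Each row of the mask selects the labels that apply to that row.
df['Is1'] = [', '.join(cols[row]) for row in mask]

print(df['Is1'].tolist())
# ['B, E', 'C', 'E, F']
```

The mask is computed before the new column is added, so the text column itself never takes part in the comparison.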

3 Comments

Is there a way to get this to only check certain columns? So if I wanted it to skip the first four columns and only write in E/F?
Change columns.append(df.T[df.T[col] == 1].index.tolist()) line to columns.append(df[['E','F']].T[df[['E','F']].T[col] == 1].index.tolist())
You can also change RafaelC's code to df['Powers'] = df[['E','F']].apply(lambda s: ', '.join(s.index[s.eq(1)]), axis=1)
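
To illustrate the suggestions in these comments, here is a small sketch that restricts the check to a chosen list of columns, using the frame from the answer above. The `subset` name is hypothetical and only keeps the selection in one place.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'A': [0, 0, 0],
        'B': [1, 0, 0],
        'C': [0, 1, 0],
        'D': [0, 0, 0],
        'E': [1, 0, 1],
        'F': [0, 0, 1],
    }
)

# Hypothetical name: the only columns that should be checked.
subset = ['E', 'F']

# Same row-wise join as before, applied to the selected columns only.
df['Powers'] = df[subset].apply(lambda s: ', '.join(s.index[s.eq(1)]), axis=1)

print(df['Powers'].tolist())
# ['E', '', 'E, F']
```

Rows with no 1 in the selected columns end up with an empty string, which may need separate handling depending on how the column is used.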
