1

Need to create a new column that concatenate multiple columns based on the value of each column. For example, input:

s1  s2  s3
1   0   0
1   1   0
0   1   2

Output:

s1  s2  s3  col
1   0   0   s1
1   1   0   s1, s2
0   1   2   s2, s3

Basically I need to output a column name when the value is >0, and only output it once even if the number is more than 1.

The code I used but didn't work:

df['col'] = 's1' * min(df['s1'], 1) + ', ' + 's2' * min(df['s2'], 1) + ', ' + 's3' * min(df['s3'], 1)

2 Answers 2

5

Create mask for compare values greater like 0 by DataFrame.gt and then use DataFrame.dot with columns names for matrix multiplication:

cols = ["s1", "s2", "s3"]
df['col'] = df[cols].gt(0).dot(df.columns + ',').str[:-1]
print (df)
   s1  s2  s3    col
0   1   0   0     s1
1   1   1   0  s1,s2
2   0   1   2  s2,s3
Sign up to request clarification or add additional context in comments.

1 Comment

Elegant answer!
0

It can be done with a df.apply and a lambda function as follows

df["col"] = df.apply(lambda row: ",".join(row[row>0].keys().tolist()), axis=1)

you can also pass specific column names when you have to apply this operation to only a specific columns

df["col2"] = df[["s1", "s2", "s3"]].apply(lambda row: ",".join(row[row>0].keys().tolist()), axis=1)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.