0

I'm currently using R for data science and I'm learning Python and Pandas to expand my toolkit. I'd like to create a new column of lists within a Pandas dataframe using existing column names and values.

For the following Pandas dataframe:

  test1  test2  test3
1      0      1      1
2      0      1      0
3      1      1      1
4      1      0      0
5      0      0      0

A new column would contain a list for each row that takes the column name wherever there's a '1' value, strips the 'test' prefix, and concatenates the list using a '-' separator.

   test1  test2  test3  combo
0      0      1      1    2-3
1      0      1      0      2
2      1      1      1  1-2-3
3      1      0      0      1
4      0      0      0  

I can create the column in R and data.table using the following code:

df [, combo := apply (df == 1, 1, function(x) {
   paste (gsub("test", "", names(which(x))), collapse = "-")
}
)]

This is the closest I've come in Pandas:

def test(x):
    paste(loc[x])

df['combo'] = df.apply(test, df == 1, axis = 1)

TypeError: apply() got multiple values for argument 'axis'

Am I on the right path?

2 Answers 2

1
df['combo'] = df.apply(lambda x: '-'.join(list(x[x == 1].index)).replace('test', ''), axis=1)

produces the following output:

In [8]: df
Out[8]:
   test1  test2  test3  combo
0      0      1      1    2-3
1      0      1      0      2
2      1      1      1  1-2-3
3      1      0      0      1
4      0      0      0

The function lambda x: '-'.join(list(x[x == 1].index)).replace('test', '') picks the index of series elements that are equal to 1. The index for rows are the column names test1, test2, test3, so after joining the list, it is necessary to replace 'test' from the resulting string with ''.

And we need to apply this function along rows, hence we pass axis=1. The default axis=0 applies the function along columns.

Sign up to request clarification or add additional context in comments.

Comments

0

You can first rename column and then use apply to extract the column names and then join them.

df['combo'] = (
   df.rename(columns=lambda x: x.replace('test',''))
   .astype(bool)
   .apply(lambda x: '-'.join(x.loc[x].index), axis=1)
)

df
Out[15]: 
   test1  test2  test3  combo
1      0      1      1    2-3
2      0      1      0      2
3      1      1      1  1-2-3
4      1      0      0      1
5      0      0      0       

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.