How to create new column of lists in Pandas dataframe using existing column names and values?

Question

I'm currently using R for data science and I'm learning Python and Pandas to expand my toolkit. I'd like to create a new column of lists within a Pandas dataframe using existing column names and values.

For the following Pandas dataframe:

  test1  test2  test3
1      0      1      1
2      0      1      0
3      1      1      1
4      1      0      0
5      0      0      0

A new column would contain a list for each row that takes the column name wherever there's a '1' value, strips the 'test' prefix, and concatenates the list using a '-' separator.

   test1  test2  test3  combo
0      0      1      1    2-3
1      0      1      0      2
2      1      1      1  1-2-3
3      1      0      0      1
4      0      0      0

I can create the column in R and data.table using the following code:

df [, combo := apply (df == 1, 1, function(x) {
   paste (gsub("test", "", names(which(x))), collapse = "-")
}
)]

This is the closest I've come in Pandas:

def test(x):
    paste(loc[x])

df['combo'] = df.apply(test, df == 1, axis = 1)

TypeError: apply() got multiple values for argument 'axis'

Am I on the right path?

Haleemur Ali · Accepted Answer · 2018-03-05 19:12:52Z

1

df['combo'] = df.apply(lambda x: '-'.join(list(x[x == 1].index)).replace('test', ''), axis=1)

produces the following output:

In [8]: df
Out[8]:
   test1  test2  test3  combo
0      0      1      1    2-3
1      0      1      0      2
2      1      1      1  1-2-3
3      1      0      0      1
4      0      0      0

The function lambda x: '-'.join(list(x[x == 1].index)).replace('test', '') picks the index of series elements that are equal to 1. The index for rows are the column names test1, test2, test3, so after joining the list, it is necessary to replace 'test' from the resulting string with ''.

And we need to apply this function along rows, hence we pass axis=1. The default axis=0 applies the function along columns.

answered Mar 5, 2018 at 19:12

Haleemur Ali

28.6k6 gold badges67 silver badges89 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

Allen Qin · Accepted Answer · 2018-03-05 22:47:18Z

0

You can first rename column and then use apply to extract the column names and then join them.

df['combo'] = (
   df.rename(columns=lambda x: x.replace('test',''))
   .astype(bool)
   .apply(lambda x: '-'.join(x.loc[x].index), axis=1)
)

df
Out[15]: 
   test1  test2  test3  combo
1      0      1      1    2-3
2      0      1      0      2
3      1      1      1  1-2-3
4      1      0      0      1
5      0      0      0

answered Mar 5, 2018 at 22:47

Allen Qin

20k9 gold badges55 silver badges68 bronze badges

Collectives™ on Stack Overflow

How to create new column of lists in Pandas dataframe using existing column names and values?

2 Answers 2

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related