Pandas create new column with array of other columns with a non null value

Question

I have a dataframe like the following:

   List1  List2  List3  List4  List5  List6  List7  List8
0    NaN    NaN      1    NaN    NaN    NaN      1    NaN
1    NaN    NaN      1    NaN    NaN    NaN      1    NaN
2    NaN    NaN      1    NaN    1.0    NaN      1    NaN
3    NaN    NaN      1    NaN    NaN    NaN      1    NaN
4    NaN    NaN      1    NaN    1.0    NaN      1    NaN

I want to create a new column called Lists that holds, for each row, a list of the names of the other columns with a non-null value, i.e.:

                           Lists
0    ['List3', 'List7']
1    ['List3', 'List7']
2    ['List3', 'List5', 'List7']
3    ['List3', 'List7']
4    ['List3', 'List5', 'List7']

I accomplished this with an iterrows() loop, but it's not performant at all. Would appreciate any ideas here.

Erfan · Accepted Answer · 2021-08-03 19:34:56Z

1

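We can use DataFrame.dot on the boolean mask returned by notna(). Multiplying True by a string keeps the string and multiplying False by it gives an empty string, so the dot product concatenates the names of the non-null columns in each row. We then strip the trailing comma and split on commas: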

df["Lists"] = df.notna().dot(df.columns+",").str.rstrip(",").str.split(",")

   List1  List2  List3  List4  List5  List6  List7  List8                  Lists
0    NaN    NaN      1    NaN    NaN    NaN      1    NaN         [List3, List7]
1    NaN    NaN      1    NaN    NaN    NaN      1    NaN         [List3, List7]
2    NaN    NaN      1    NaN   1.00    NaN      1    NaN  [List3, List5, List7]
3    NaN    NaN      1    NaN    NaN    NaN      1    NaN         [List3, List7]
4    NaN    NaN      1    NaN   1.00    NaN      1    NaN  [List3, List5, List7]
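To make the one-liner easier to follow, here is a minimal sketch that breaks it into steps. The sample frame is rebuilt from the table in the question, so the exact values and dtypes are an assumption; the names mask, joined and lists are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample frame reconstructed from the question (an assumption about the real data)
df = pd.DataFrame(np.nan, index=range(5), columns=[f"List{i}" for i in range(1, 9)])
df[["List3", "List7"]] = 1
df.loc[[2, 4], "List5"] = 1.0

mask = df.notna()                               # True where a value is present
joined = mask.dot(df.columns + ",")             # per row: names of non-null columns, each followed by ","
lists = joined.str.rstrip(",").str.split(",")   # drop the trailing comma, then split into a list

print(lists.tolist())
```

Two things to keep in mind with this approach: a column name that itself contains a comma would be split into pieces, and a row with no non-null values comes out as [''] rather than an empty list, because splitting an empty string yields one empty element.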

1 Comment

Hassan Syyid Over a year ago

This worked perfectly with way better performance! Thanks

Andrej Kesely · Answer · 2021-08-03 19:35:17Z

1

Another version:

df["Lists"] = df.apply(lambda x: x[x.notna()].index.tolist(), axis=1)
print(df)

Prints:

   List1  List2  List3  List4  List5  List6  List7  List8                  Lists
0    NaN    NaN      1    NaN    NaN    NaN      1    NaN         [List3, List7]
1    NaN    NaN      1    NaN    NaN    NaN      1    NaN         [List3, List7]
2    NaN    NaN      1    NaN    1.0    NaN      1    NaN  [List3, List5, List7]
3    NaN    NaN      1    NaN    NaN    NaN      1    NaN         [List3, List7]
4    NaN    NaN      1    NaN    1.0    NaN      1    NaN  [List3, List5, List7]

1 Comment

Hassan Syyid Over a year ago

What's the performance on this (and in general using apply) vs the accepted answer?
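One way to answer this for your own data is to time both approaches. The sketch below is only a measurement harness under stated assumptions: the random 2000 x 8 frame, the 30% fill rate, and the helper names with_dot and with_apply are made up for illustration, and no particular timings are claimed. As a general rule, apply with axis=1 calls a Python function once per row, so its cost tends to grow with the number of rows.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: about 30% of cells are non-null
rng = np.random.default_rng(0)
cols = [f"List{i}" for i in range(1, 9)]
values = np.where(rng.random((2000, 8)) < 0.3, 1.0, np.nan)
big = pd.DataFrame(values, columns=cols)


def with_dot(frame):
    # DataFrame.dot approach from the first answer
    return frame.notna().dot(frame.columns + ",").str.rstrip(",").str.split(",")


def with_apply(frame):
    # Row-wise apply approach from this answer
    return frame.apply(lambda x: x[x.notna()].index.tolist(), axis=1)


for fn in (with_dot, with_apply):
    seconds = timeit.timeit(lambda: fn(big), number=3) / 3
    print(f"{fn.__name__}: {seconds:.4f} s per call")
```

Note that the two functions differ on rows where every value is null: the dot version returns [''] and the apply version returns [].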

ThePyGuy · Answer · 2021-08-03 19:32:01Z

0

You can use pandas.DataFrame.apply with axis=1 and a lambda containing a list comprehension that keeps, for each row, only the columns whose value is not NaN according to pd.notna():

>>> df.apply(lambda x: [c for c in df if pd.notna(x[c])], axis=1)

0           [List3, List7]
1           [List3, List7]
2    [List3, List5, List7]
3           [List3, List7]
4    [List3, List5, List7]
dtype: object
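The expression above returns a Series without storing it. A minimal sketch of assigning it to the Lists column follows, again using a sample frame rebuilt from the question (an assumption). The source_cols name is illustrative; fixing the column list up front keeps the Lists column itself out of the result if the code is run a second time.

```python
import numpy as np
import pandas as pd

# Sample frame reconstructed from the question (an assumption about the real data)
df = pd.DataFrame(np.nan, index=range(5), columns=[f"List{i}" for i in range(1, 9)])
df[["List3", "List7"]] = 1
df.loc[[2, 4], "List5"] = 1.0

# Exclude the target column so re-running does not pick up "Lists" itself
source_cols = [c for c in df.columns if c != "Lists"]

df["Lists"] = df[source_cols].apply(
    lambda row: [c for c in source_cols if pd.notna(row[c])], axis=1
)

print(df["Lists"].tolist())
```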
