1

I have a DataFrame like the one below:

       column-a         column-b      column-c
0          Nan             A              B
1           A              Nan            C
2           Nan            Nan            C
3           A              B              C

I want to create a new column, column-d, that collects all non-null values from column-a through column-c:

        column-d
0        A,B
1        A,C
2        C
3        A,B,C

Thanks!

2 Answers

5

You need to replace the string 'Nan' with np.nan, then use stack with groupby and join:

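import numpy as np

# Turn the placeholder string 'Nan' into a real missing value
df = df.replace('Nan', np.nan)
# Reshape to one value per (row, column) pair, then join the values of each row
df.stack().groupby(level=0).agg(','.join)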
Out[570]: 
0      A,B
1      A,C
2        C
3    A,B,C
dtype: object

# To store the result as a new column:
# df['column-d'] = df.stack().groupby(level=0).agg(','.join)
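
For reference, the whole approach can be put together as a self-contained sketch. The DataFrame below is rebuilt from the question's table, assuming the missing entries are the literal string 'Nan'. The explicit dropna() is a precaution that is not in the answer above: older pandas versions drop missing values inside stack() by default, but newer versions may keep them, and ','.join would then fail on a float NaN.

```python
import numpy as np
import pandas as pd

# Rebuild the question's table; 'Nan' is assumed to be a plain string here
df = pd.DataFrame({
    "column-a": ["Nan", "A", "Nan", "A"],
    "column-b": ["A", "Nan", "Nan", "B"],
    "column-c": ["B", "C", "C", "C"],
})

# Turn the placeholder string into a real missing value
df = df.replace("Nan", np.nan)

# One value per (row, column) pair, missing values removed,
# then the remaining strings of each original row joined with commas
joined = df.stack().dropna().groupby(level=0).agg(",".join)

# Assignment aligns on the row index
df["column-d"] = joined
print(df)
```

A row whose values are all missing has no entries left after stacking, so it would end up with NaN in column-d rather than an empty string.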

2 Comments

Wen, thanks for the solution. If A, B and C were replaced by the numbers 1, 2 and 3, could I sum them to get a total?
@JennyJingYu then just do df.sum(axis=1)
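
A minimal sketch of the numeric case from the comments, assuming A, B and C map to 1, 2 and 3 and that the missing entries are already real NaN values. The frame name nums and the column name total are made up for illustration. Row-wise sum skips missing values by default.

```python
import numpy as np
import pandas as pd

# Same layout as the question, with A=1, B=2, C=3 (assumed mapping)
nums = pd.DataFrame({
    "column-a": [np.nan, 1, np.nan, 1],
    "column-b": [1, np.nan, np.nan, 2],
    "column-c": [2, 3, 3, 3],
})

# axis=1 sums across the columns of each row; NaN is skipped by default
nums["total"] = nums.sum(axis=1)
print(nums)
```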
2

First, replace the 'Nan' strings with real NaN values:

df = df.replace('Nan', np.nan)

Then collect the non-null values in each row and join them:

df['column-d'] = df.apply(lambda x: ','.join(x[x.notnull()]), axis=1)
#0      A,B
#1      A,C
#2        C
#3    A,B,C
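
The same idea as a runnable sketch, again assuming the question's table with 'Nan' stored as a string. It uses row.dropna(), which is equivalent to the boolean mask x[x.notnull()] above, and restricts apply to the three source columns so that re-running it does not pick up column-d.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "column-a": ["Nan", "A", "Nan", "A"],
    "column-b": ["A", "Nan", "Nan", "B"],
    "column-c": ["B", "C", "C", "C"],
}).replace("Nan", np.nan)

source_cols = ["column-a", "column-b", "column-c"]

# For each row, drop the missing values and join what is left
df["column-d"] = df[source_cols].apply(
    lambda row: ",".join(row.dropna()), axis=1
)
print(df)
```

Unlike the stack/groupby variant, a row with no values at all yields an empty string here.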

Surprisingly, this solution is somewhat faster than the stack/groupby solution by Wen, at least for the posted dataset.
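
One way to check the timing claim on your own machine is a small timeit comparison. This is only a sketch: the helper names via_stack and via_apply are made up for illustration, the repeat count is arbitrary, and the ranking may well differ for larger frames or other pandas versions.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "column-a": ["Nan", "A", "Nan", "A"],
    "column-b": ["A", "Nan", "Nan", "B"],
    "column-c": ["B", "C", "C", "C"],
}).replace("Nan", np.nan)


def via_stack(frame):
    # stack/groupby variant; dropna() guards against NaN kept by stack()
    return frame.stack().dropna().groupby(level=0).agg(",".join)


def via_apply(frame):
    # row-wise apply variant
    return frame.apply(lambda row: ",".join(row.dropna()), axis=1)


# Total time for 200 calls of each variant
t_stack = timeit.timeit(lambda: via_stack(df), number=200)
t_apply = timeit.timeit(lambda: via_apply(df), number=200)
print(f"stack/groupby: {t_stack:.4f}s  apply: {t_apply:.4f}s")
```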
