I have a dataframe. For each value in column A, I want to join the distinct values of the other columns with commas and remove any nulls.

import pandas as pd
import numpy as np

s1 = pd.Series(['a', np.nan, 'i'])
s2 = pd.Series(['a', 'f', np.nan])
s3 = pd.Series(['a', 'e', 'i'])
s4 = pd.Series(['c', 'g', 'j'])

df = pd.DataFrame([list(s1), list(s2), list(s3), list(s4)], columns=['A', 'B', 'C'])
df


    A   B   C
0   a   d   NaN
1   a   f   NaN
2   a   e   i
3   c   g   j

Desired outcome:

    A   B       C
0   a   d,e,f   i
1   c   g       j

1 Answer

Try grouping by A and aggregating the other columns:

out = df.groupby('A',as_index=False).agg({'B':','.join,'C':'first'})
   A      B  C
0  a  d,f,e  i
1  c      g  j
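
One caveat, shown as a minimal sketch: `','.join` only accepts strings, so a NaN (which is a float) in the joined column raises a `TypeError`. The series `col` below is an assumed stand-in for group 'a' of column B as built by the code in the question; dropping the nulls first avoids the error.

```python
import numpy as np
import pandas as pd

# Assumed stand-in for column B of group 'a' in the question's code
col = pd.Series([np.nan, "f", "e"])

try:
    ",".join(col)  # NaN is a float, so str.join rejects it
    failed = False
except TypeError as exc:
    failed = True
    print(f"join failed: {exc}")

# Dropping nulls first leaves only strings to join
cleaned = ",".join(col.dropna())
print(cleaned)
```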

Update: to drop NaN and duplicate values before joining:

# Join the distinct non-null values of each group with commas
join_unique = lambda x: ','.join(x.dropna().drop_duplicates())
out = df.groupby('A', as_index=False).agg({'B': join_unique, 'C': join_unique})
out
   A      B  C
0  a  d,f,e  i
1  c      g  j
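
For reference, here is a self-contained sketch of the same idea run against the dataframe exactly as the question's code builds it (B0 is NaN and C0 is 'i'). The helper name `join_unique` is hypothetical, and sorting the values is an assumption based on the alphabetical order shown in the desired outcome; remove `sorted` to keep the order of appearance.

```python
import numpy as np
import pandas as pd

# Dataframe as constructed by the code in the question
df = pd.DataFrame(
    [["a", np.nan, "i"], ["a", "f", np.nan], ["a", "e", "i"], ["c", "g", "j"]],
    columns=["A", "B", "C"],
)


def join_unique(values: pd.Series) -> str:
    """Join the distinct non-null values of a group with commas (sorted)."""
    return ",".join(sorted(values.dropna().unique()))


# Apply the same aggregation to every column except the grouping key
out = df.groupby("A", as_index=False).agg({"B": join_unique, "C": join_unique})
print(out)
```

With this input, group 'a' yields 'e,f' for B and a single 'i' for C, so the repeated 'i' is not duplicated.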

6 Comments

Thanks! I tested it. If cell B0 in the original dataframe is NaN, the code does not seem to work.
If C0 is also 'i' instead of NaN, the result is 'i,i' rather than a merged 'i'.
Yes, I have checked the updated version. It is not merging when C0 is also 'i' instead of NaN.
That's right, I got the same results. Now C0 in the result is 'i,i', but the desired output is just 'i'.
I just changed my original dataframe in the question. The desired outcome is still the same. I am sorry for the confusion. I really appreciate your help!