
I still have not figured out the right tool for what I need to do in pandas. It probably involves groupby(), but I have not yet found a Pythonic way (or any other way) in the docs or on the web.

I have a table with data of a similar structure (30-50 columns):

ID   name  Town     s1       s2       s3       s4

21   Joe   Bonn     rd       fd       NaN      aa
21   Joe   Bonn     NaN      hg       kk       NaN
22   Ann   Oslo     jg       hg       zt       uz
29   Mya   Rome     rd       fd       NaN      aa

I would like to combine rows that share the same ID (which would be the index), merging the values in each column without duplication, so that each cell holds a kind of union of the string values.

So the result would be:

21   Joe   Bonn     rd       fd,hg    kk       aa
22   Ann   Oslo     jg       hg       zt       uz
29   Mya   Rome     rd       fd       NaN      aa

df.groupby(df.index).sum() was a guess, but it just gives one NaN next to each index.

1 Answer


You could try something like this. You need to drop the missing values before calling join, because str.join only accepts strings and NaN is a float:

df.groupby(["ID", "name", "Town"], as_index=False).agg(lambda col: ','.join(col.dropna()))

#   ID  name    Town    s1     s2    s3    s4
#0  21   Joe    Bonn    rd  fd,hg    kk    aa
#1  22   Ann    Oslo    jg     hg    zt    uz
#2  29   Mya    Rome    rd     fd          aa
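
For anyone who wants to reproduce this, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the sample rows in the question, and the helper name join_values is only illustrative (it is not part of pandas). Note that a group with no non-missing values ends up as an empty string, not NaN, because joining an empty sequence returns "".

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (assumed layout; ID is a regular column here).
df = pd.DataFrame({
    "ID": [21, 21, 22, 29],
    "name": ["Joe", "Joe", "Ann", "Mya"],
    "Town": ["Bonn", "Bonn", "Oslo", "Rome"],
    "s1": ["rd", np.nan, "jg", "rd"],
    "s2": ["fd", "hg", "hg", "fd"],
    "s3": [np.nan, "kk", "zt", np.nan],
    "s4": ["aa", np.nan, "uz", "aa"],
})


def join_values(col):
    """Join the non-missing strings of one column within one group."""
    return ",".join(col.dropna())


# Group on the identifying columns and aggregate every remaining column.
result = df.groupby(["ID", "name", "Town"], as_index=False).agg(join_values)
print(result)
```

If ID is already the index, as in the question, calling df.reset_index() first should give the same layout as assumed above.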

2 Comments

Thanks a lot! At last, this is giving results almost as I intended. I only needed to massage the lambda to avoid duplication: lambda col: ','.join(numpy.unique(col.dropna()))
Just as a side note, if you want to remove duplicates, you can also use drop_duplicates() without invoking numpy explicitly: lambda col: ','.join(col.dropna().drop_duplicates())
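
The two de-duplicating variants from these comments differ in one detail worth knowing: numpy.unique returns its result sorted, while drop_duplicates() keeps the order of first appearance. A small sketch on a made-up column shows the difference. The helper union_or_nan is a hypothetical addition, not from the answer, for cases where an all-missing group should stay NaN as in the question's desired output, not become an empty string.

```python
import numpy as np
import pandas as pd

# Made-up column with a duplicate and a missing value.
col = pd.Series(["hg", "fd", "hg", np.nan])

# numpy.unique sorts the distinct values.
sorted_union = ",".join(np.unique(col.dropna()))

# drop_duplicates keeps the order in which values first appear.
ordered_union = ",".join(col.dropna().drop_duplicates())

print(sorted_union)   # fd,hg
print(ordered_union)  # hg,fd


def union_or_nan(values):
    """Hypothetical helper: join distinct values, or return NaN if none exist."""
    unique = values.dropna().drop_duplicates()
    return ",".join(unique) if len(unique) else np.nan
```

Either function can be passed to .agg() in place of the lambda from the answer.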
