Combining values in certain rows columnwise (in pandas)

Question

I still have not figured out the right tool for what I need to do in pandas. It probably involves groupby(), but I have not yet found a Pythonic way (or any other way) in the docs or on the web.

I have a table structured like this (the real one has 30-50 columns):

ID   name  Town     s1       s2       s3       s4

21   Joe   Bonn     rd       fd       NaN      aa
21   Joe   Bonn     NaN      hg       kk       NaN
22   Ann   Oslo     jg       hg       zt       uz
29   Mya   Rome     rd       fd       NaN      aa

I would like to combine rows that share the same ID (which would be the index), merging the values in each column without duplication, so that each cell holds a kind of union of the string values.

So the result would be:

21   Joe   Bonn     rd       fd,hg    kk       aa
22   Ann   Oslo     jg       hg       zt       uz
29   Mya   Rome     rd       fd       NaN      aa

df.groupby(df.index).sum() was a guess, but it just gives one NaN next to each index.

akuiper · Accepted Answer · 2016-12-13 18:37:38Z


You could try something like this. Note that you need to drop the missing values before calling join:

df.groupby(["ID", "name", "Town"], as_index=False).agg(lambda col: ','.join(col.dropna()))

#   ID  name    Town    s1     s2    s3    s4
#0  21   Joe    Bonn    rd  fd,hg    kk    aa
#1  22   Ann    Oslo    jg     hg    zt    uz
#2  29   Mya    Rome    rd     fd          aa
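For anyone who wants to reproduce this, here is a minimal, self-contained sketch that rebuilds the sample table from the question and applies the same groupby/agg idea. In the output above, the s3 cell for ID 29 is an empty string, because joining an empty selection gives ''. The helper name join_non_missing is made up for this illustration and is not part of pandas; it returns NaN when a group has no values in a column, which is an assumption about what the asker wanted based on the expected result in the question.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [21, 21, 22, 29],
    "name": ["Joe", "Joe", "Ann", "Mya"],
    "Town": ["Bonn", "Bonn", "Oslo", "Rome"],
    "s1": ["rd", np.nan, "jg", "rd"],
    "s2": ["fd", "hg", "hg", "fd"],
    "s3": [np.nan, "kk", "zt", np.nan],
    "s4": ["aa", np.nan, "uz", "aa"],
})

def join_non_missing(col):
    """Hypothetical helper: join the non-missing strings of one group column.

    Returns NaN instead of '' when every value in the group is missing.
    """
    values = col.dropna()
    return ",".join(values) if len(values) else np.nan

# Group on the identifying columns and aggregate every remaining column.
result = df.groupby(["ID", "name", "Town"], as_index=False).agg(join_non_missing)
print(result)
```

If an empty string is acceptable for all-missing groups, the one-line lambda from the answer does the same job.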


2 Comments

Stapke Over a year ago

Thanks a lot! At last, this is giving results almost as I intended. I only needed to massage the lambda to avoid duplication: lambda col: ','.join(numpy.unique(col.dropna()))

akuiper Over a year ago

Just as a side note, if you want to remove duplicates, you can also use drop_duplicates() without invoking numpy explicitly: lambda col: ','.join(col.dropna().drop_duplicates())
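One difference between the two deduplication variants may matter in practice: numpy.unique returns its result sorted, while drop_duplicates() keeps the order of first appearance. A small sketch with made-up values (the Series below is illustrative and not taken from the question's table):

```python
import numpy as np
import pandas as pd

# Illustrative column: "hg" appears twice, and before "fd".
col = pd.Series(["hg", "fd", np.nan, "hg"])

# numpy.unique removes duplicates and sorts the remaining values.
sorted_join = ",".join(np.unique(col.dropna()))

# drop_duplicates removes duplicates but keeps first-appearance order.
ordered_join = ",".join(col.dropna().drop_duplicates())

print(sorted_join)   # fd,hg
print(ordered_join)  # hg,fd
```

Which one to prefer depends on whether the original row order of the values carries meaning.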
