Combine Pandas Data Frame where different columns are missing

Question

I have a pandas data frame that looks like this:

asset, cusip, information1, information2, ...., information_n
1x4,   43942, 45,           NaN,          ...., NaN
1x4,   43942, NaN,          "hello",      ...., NaN
1x4,   43942, NaN,          NaN,          ...., "goodbye"
...

What I want is:

asset, cusip, information1, information2, ...., information_n
1x4,   43942, 45,           "hello",      ...., "goodbye"
...

Essentially, I want to collapse all rows that share the same "asset" and "cusip" into a single row, regardless of the other fields. Within each such group, each of the columns information1...information_n has only one entry that is not NaN.

Note that some columns might be int, some strings, others floats, etc.

Vaishali · Accepted Answer · 2017-11-06 20:49:05Z

You can use groupby with first(), which returns the first non-NaN value in each column for each group. In your case that is the only non-NaN value.

df = df.groupby(['asset', 'cusip']).first().reset_index()


    asset   cusip   information1    information2    information_n
0   1x4     43942   45              "hello"         "goodbye"
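To try this end to end, here is a minimal sketch. The sample frame is an assumption modeled on the question (the second asset "2x5" and its values are made up for illustration), not the asker's real data. Note that information1 ends up as float rather than int, because the NaN placeholders force a float column.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data modeled on the question; the "2x5" group is invented
# to show that each (asset, cusip) pair collapses independently.
df = pd.DataFrame({
    "asset": ["1x4", "1x4", "1x4", "2x5", "2x5"],
    "cusip": [43942, 43942, 43942, 11111, 11111],
    "information1": [45, np.nan, np.nan, np.nan, 7],
    "information2": [np.nan, "hello", np.nan, "x", np.nan],
    "information_n": [np.nan, np.nan, "goodbye", np.nan, "y"],
})

# first() takes the first non-null value per column within each group,
# so the scattered values are merged into one row per (asset, cusip).
collapsed = df.groupby(["asset", "cusip"]).first().reset_index()
print(collapsed)
```

If a group could contain more than one non-NaN value in the same column, first() would silently keep only the earliest one, so this approach relies on the "only one entry" guarantee stated in the question.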


2 Comments

piRSquared Over a year ago

Very nice answer! Equivalently, you can pass the as_index=False parameter and forgo the reset_index: df.groupby(['asset', 'cusip'], as_index=False).first()

Vaishali Over a year ago

@piRSquared, thank you! I need to remember as_index = False. So used to using groupby and reset_index:)
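The as_index=False variant from the comments can be checked against the reset_index version with a small sketch like the following. The sample frame is again a made-up stand-in for the asker's data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data modeled on the question.
df = pd.DataFrame({
    "asset": ["1x4", "1x4", "1x4"],
    "cusip": [43942, 43942, 43942],
    "information1": [45, np.nan, np.nan],
    "information2": [np.nan, "hello", np.nan],
    "information_n": [np.nan, np.nan, "goodbye"],
})

# Variant from the answer: group, then move the keys back into columns.
with_reset = df.groupby(["asset", "cusip"]).first().reset_index()

# Variant from the comment: keep the keys as columns from the start.
without_index = df.groupby(["asset", "cusip"], as_index=False).first()

# Compare the two results.
print(with_reset.equals(without_index))
```

Both forms should give the same frame here, so the choice is mostly a matter of taste. as_index=False saves one method call.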
