
I have a pandas DataFrame that looks like this:

asset, cusip, information1, information2, ...., information_n
1x4,   43942,    45,       ,  NaN,    ,    , NaN
1x4,   43942,    NaN,      ,  "hello",     , NaN
1x4,   43942,    NaN,      ,  NaN,     , "goodbye"
...

What I want is:

asset, cusip, information1, information2, ...., information_n
1x4,   43942,    45,       , "hello",    ,    , "goodbye"
...

Essentially, I want to collapse all rows that share the same "asset" and "cusip" into a single row, regardless of what the other fields contain. Within each such group, there will be only one entry that is not NaN in each of information1...information_n.

Note that some columns might be int, some strings, others floats, etc.

1 Answer

You can use groupby with first(), which returns the first non-NaN value of each column within each group. In your case that is the only non-NaN value:

df = df.groupby(['asset', 'cusip']).first().reset_index()


    asset   cusip   information1    information2    information_n
0   1x4     43942   45              "hello"         "goodbye"
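To try this end to end, here is a minimal sketch that rebuilds a small frame shaped like the one in the question and collapses it. The three information columns and their values are taken from the question's sample. The variable name `collapsed` is only illustrative.

```python
import numpy as np
import pandas as pd

# Sample frame modelled on the question: three rows for the same
# asset/cusip pair, each carrying one non-NaN information value.
df = pd.DataFrame({
    "asset": ["1x4", "1x4", "1x4"],
    "cusip": [43942, 43942, 43942],
    "information1": [45, np.nan, np.nan],
    "information2": [np.nan, "hello", np.nan],
    "information_n": [np.nan, np.nan, "goodbye"],
})

# first() skips NaN, so each column keeps its first non-null value
# per (asset, cusip) group; reset_index() turns the keys back into columns.
collapsed = df.groupby(["asset", "cusip"]).first().reset_index()
print(collapsed)
```

Note that a numeric column that held NaN before collapsing is stored as float, so information1 comes back as 45.0 rather than the integer 45 unless you convert it afterwards.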

2 Comments

Very nice answer! Equivalently, you can pass the as_index=False parameter and forgo the reset_index: df.groupby(['asset', 'cusip'], as_index=False).first()
@piRSquared, thank you! I need to remember as_index=False. I'm so used to using groupby and reset_index :)
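The equivalence mentioned in the comments can be checked with a short sketch. The sample data is modelled on the question, and the names `via_reset` and `via_as_index` are only illustrative.

```python
import numpy as np
import pandas as pd

# Small sample frame modelled on the question's data.
df = pd.DataFrame({
    "asset": ["1x4", "1x4", "1x4"],
    "cusip": [43942, 43942, 43942],
    "information1": [45, np.nan, np.nan],
    "information2": [np.nan, "hello", np.nan],
    "information_n": [np.nan, np.nan, "goodbye"],
})

# Variant from the answer: group, take first non-null values, then reset the index.
via_reset = df.groupby(["asset", "cusip"]).first().reset_index()

# Variant from the comment: keep the keys as ordinary columns from the start.
via_as_index = df.groupby(["asset", "cusip"], as_index=False).first()

# Prints whether the two results hold the same data.
print(via_reset.equals(via_as_index))
```

Both forms leave asset and cusip as regular columns, so the choice between them is mostly a matter of taste.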
