I have the data frame

Date         CUSIP   Asset   Liability
01-01-1990     A       1        NaN
01-01-1990     A      NaN        2
02-01-1990     A       3         2
01-01-1990     B      NaN        2
01-01-1990     B       1         2

Is there any way to combine these rows so that it becomes:

Date         CUSIP   Asset   Liability
01-01-1990     A       1         2
02-01-1990     A       3         2
01-01-1990     B       1         2

The approach I came up with is to use groupby(["CUSIP", "Date"]).agg(function),

where the function I apply treats NaN as missing, so that max(nan, 3) = 3.

Is there a simpler way?

1 Answer

>>> df.groupby(['Date', 'CUSIP']).apply(lambda group: group.ffill().bfill()).drop_duplicates()
         Date CUSIP Asset  Liability
0  01-01-1990     A     1          2
2  02-01-1990     A     3          2
3  01-01-1990     B     1          2
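
As a possibly simpler alternative to the fill-then-deduplicate approach above, here is a minimal sketch using GroupBy.first(), which returns the first non-null value of each column within a group. It is not the method posted in this answer. The sample frame is rebuilt from the question, and the sort=False and as_index=False arguments are just choices made here to keep the original row order and flat columns.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question
df = pd.DataFrame({
    "Date": ["01-01-1990", "01-01-1990", "02-01-1990", "01-01-1990", "01-01-1990"],
    "CUSIP": ["A", "A", "A", "B", "B"],
    "Asset": [1, np.nan, 3, np.nan, 1],
    "Liability": [np.nan, 2, 2, 2, 2],
})

# first() picks the first non-null value per column in each (Date, CUSIP) group,
# so rows that each hold part of the data collapse into a single row
result = df.groupby(["Date", "CUSIP"], as_index=False, sort=False).first()
print(result)
```

If a group holds conflicting non-null values, first() silently keeps the earliest one, so this is only a safe shortcut when the partial rows agree with each other.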

5 Comments

Yes! The problem with implementing a max-with-NaN method is that it is stupidly slow. One needs to loop through a list and discard the NaN values. I sometimes hate the way Python treats NaN...
To check for errors in your data, you can also verify that each CUSIP appears only once on any given date. Assuming the result above is called result, then result.groupby(['Date', 'CUSIP'])['CUSIP'].count().max() should return 1.
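
The check described in this comment could be sketched as follows. The result frame here is a hypothetical stand-in for the combined output, and the duplicated() variant is an additional suggestion rather than something from the comment itself.

```python
import pandas as pd

# Hypothetical combined result: one row per (Date, CUSIP) pair
result = pd.DataFrame({
    "Date": ["01-01-1990", "02-01-1990", "01-01-1990"],
    "CUSIP": ["A", "A", "B"],
    "Asset": [1.0, 3.0, 1.0],
    "Liability": [2.0, 2.0, 2.0],
})

# Largest number of rows sharing one (Date, CUSIP) key; 1 means no duplicates
max_rows_per_key = result.groupby(["Date", "CUSIP"]).size().max()

# Alternative check without a groupby: True if any key repeats
has_duplicate_keys = result.duplicated(subset=["Date", "CUSIP"]).any()

print(max_rows_per_key, has_duplicate_keys)
```

Note that groupby drops rows whose key is null by default, so rows with a missing CUSIP are not counted by the first check.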
Thanks for the tip. I am running this, and it is very slow. Are forward fill and backward fill always this slow?
What is the output of df.info()? Are the Asset and Liability columns floats? And what is the type of the Date column? Also, drop_duplicates may be slowing it down. Try without it and see if there is an improvement. If so, and you've checked your data to ensure there are no duplicates per my comment above, you can group again instead: df.groupby(['Date', 'CUSIP']).apply(lambda group: group.ffill().bfill()).groupby(['Date', 'CUSIP']).first()
RangeIndex: 501896 entries, 0 to 501895
Data columns (total 4 columns):
Date         501896 non-null datetime64[ns]
CUSIP        501372 non-null object
Asset        386228 non-null float64
Liability    385416 non-null float64
dtypes: datetime64[ns](1), float64(2), object(1)
