I have a DataFrame that looks like this.

rnd_id  Date        A  B  C  D
1       01/01/2020  2  5  8  5
1       02/01/2020  4  4  3  9
1       04/01/2020  2  4  8  8
20      02/01/2020  3  1  2  3
20      03/01/2020  6  4  4  4
20      04/01/2020  5  4  3  9
50      01/01/2020  6  4  2  1
50      02/01/2020  8  4  3  9
50      03/01/2020  3  5  5  2
50      04/01/2020  2  3  3  1

Each rnd_id should have a row for every consecutive date in a date range. I want to identify which rows are missing. So for pd.date_range('2020-01-01', periods=4, freq='D'), it should return

rnd_id Date
1      03/01/2020
20     01/01/2020

I'm stuck: reindexing doesn't work because the same dates are repeated across different rnd_id values. Any ideas, please?
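A minimal sketch of the problem described above, assuming the reindex was attempted on Date alone. The three-row frame is a trimmed, hypothetical subset of the data, with dates already parsed:

```python
import pandas as pd

# Hypothetical three-row subset: 2020-01-02 appears under two rnd_id values.
df = pd.DataFrame({
    "rnd_id": [1, 1, 20],
    "Date": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-02"]),
    "A": [2, 4, 3],
})
s = pd.date_range("2020-01-01", periods=4, freq="D")

try:
    # With Date as the only index level, the index labels are not unique.
    df.set_index("Date").reindex(s)
    failed = False
except ValueError as exc:
    failed = True
    print("reindex failed:", exc)
```

The dates only become unique once they are combined with rnd_id, which is why a two-level index is needed.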

1 Answer


We can reindex against every combination of rnd_id and date, then keep the rows that come back as NaN:

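import pandas as pd

# Assumes df['Date'] already holds datetimes, not day-first strings
s = pd.date_range('2020-01-01', periods=4, freq='D')
d = df.set_index(['rnd_id', 'Date']).reindex(pd.MultiIndex.from_product([df.rnd_id.unique(), s]))
# axis is passed by keyword; newer pandas releases no longer accept any(1)
d[d.isnull().any(axis=1)].index.to_frame()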
                0          1
1  2020-01-03   1 2020-01-03
20 2020-01-01  20 2020-01-01
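For reference, here is a self-contained sketch that rebuilds the question's data and finds the same gaps with `MultiIndex.difference` instead of looking for NaN rows. This is an alternative technique, not the method used above. It assumes the dates are day-first strings, as the expected output in the question implies, and the names `expected`, `present`, and `missing` are only illustrative.

```python
import pandas as pd

# Sample data copied from the question (dates are day-first strings).
df = pd.DataFrame({
    "rnd_id": [1, 1, 1, 20, 20, 20, 50, 50, 50, 50],
    "Date": ["01/01/2020", "02/01/2020", "04/01/2020",
             "02/01/2020", "03/01/2020", "04/01/2020",
             "01/01/2020", "02/01/2020", "03/01/2020", "04/01/2020"],
    "A": [2, 4, 2, 3, 6, 5, 6, 8, 3, 2],
    "B": [5, 4, 4, 1, 4, 4, 4, 4, 5, 3],
    "C": [8, 3, 8, 2, 4, 3, 2, 3, 5, 3],
    "D": [5, 9, 8, 3, 4, 9, 1, 9, 2, 1],
})
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")

s = pd.date_range("2020-01-01", periods=4, freq="D")

# Every (rnd_id, Date) pair that should exist.
expected = pd.MultiIndex.from_product(
    [df["rnd_id"].unique(), s], names=["rnd_id", "Date"]
)
# Every (rnd_id, Date) pair that does exist.
present = pd.MultiIndex.from_frame(df[["rnd_id", "Date"]])

# Pairs in the full grid that are absent from the data.
missing = expected.difference(present).to_frame(index=False)
print(missing)
```

Comparing index pairs directly means a row that exists but happens to contain a NaN value is not reported as missing, which the `isnull().any(axis=1)` filter would do.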

3 Comments

Thanks, Yoben_S. I've tried that and it adds NaN for the missing dates, which is great, but d returns every row in the DataFrame, not just the rows with NaN?
@Ron d=d[d.isnull().any(axis=1)].index.to_frame() ? I just did not assign it ~
There you go - sorry, I got too excited about your solution. Works perfectly after assignment. Thank you