Identify missing date data in a Pandas DataFrame column

Question

I have a DataFrame that looks like this.

rnd_id  Date        A  B  C  D
1       01/01/2020  2  5  8  5
1       02/01/2020  4  4  3  9
1       04/01/2020  2  4  8  8
20      02/01/2020  3  1  2  3
20      03/01/2020  6  4  4  4
20      04/01/2020  5  4  3  9
50      01/01/2020  6  4  2  1
50      02/01/2020  8  4  3  9
50      03/01/2020  3  5  5  2
50      04/01/2020  2  3  3  1

Each rnd_id should have a row for every date in a given date range. I want to identify which rows are missing. For example, for pd.date_range('2020-01-01', periods=4, freq='D'), the result should be:

rnd_id Date
1      03/01/2020
20     01/01/2020

I'm stuck: reindexing doesn't work because the dates are duplicated across rnd_id values. Any ideas, please?

BENY · Accepted Answer · 2020-06-13 23:37:18Z

You can reindex on every (rnd_id, Date) combination and keep the rows that come back as NaN:

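import pandas as pd
# Date must be datetime64 to match the date_range; parse it if it holds day-first strings
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
s = pd.date_range('2020-01-01', periods=4, freq='D')
d = df.set_index(['rnd_id', 'Date']).reindex(pd.MultiIndex.from_product([df.rnd_id.unique(), s]))
# axis must be passed by keyword in pandas 2.0 and later
d[d.isnull().any(axis=1)].index.to_frame()
                0          1
1  2020-01-03   1 2020-01-03
20 2020-01-01  20 2020-01-01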
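
For reference, here is a self-contained sketch of the reindex approach above, built on the sample data from the question. The variable names (expected_dates, full_index, missing, missing_keys) are illustrative and not from the original answer, and the Date column is assumed to hold day-first strings. It also shows a possible variant that compares the index keys directly with MultiIndex.difference instead of looking for NaN rows.

```python
import pandas as pd

# Sample data from the question; Date is assumed to be day-first strings.
df = pd.DataFrame({
    "rnd_id": [1, 1, 1, 20, 20, 20, 50, 50, 50, 50],
    "Date": ["01/01/2020", "02/01/2020", "04/01/2020",
             "02/01/2020", "03/01/2020", "04/01/2020",
             "01/01/2020", "02/01/2020", "03/01/2020", "04/01/2020"],
    "A": [2, 4, 2, 3, 6, 5, 6, 8, 3, 2],
    "B": [5, 4, 4, 1, 4, 4, 4, 4, 5, 3],
    "C": [8, 3, 8, 2, 4, 3, 2, 3, 5, 3],
    "D": [5, 9, 8, 3, 4, 9, 1, 9, 2, 1],
})
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")

expected_dates = pd.date_range("2020-01-01", periods=4, freq="D")

# Every (rnd_id, Date) pair that should exist.
full_index = pd.MultiIndex.from_product(
    [df["rnd_id"].unique(), expected_dates], names=["rnd_id", "Date"]
)

# Approach from the answer: reindex, then keep the rows that contain NaN.
d = df.set_index(["rnd_id", "Date"]).reindex(full_index)
missing = d[d.isna().any(axis=1)].index.to_frame(index=False)

# Variant: compare keys only, ignoring the data columns.
present = pd.MultiIndex.from_frame(df[["rnd_id", "Date"]])
missing_keys = full_index.difference(present).to_frame(index=False)

print(missing)
print(missing_keys)
```

Note that the NaN check also flags rows that exist but already contain a NaN in A, B, C or D. The difference variant looks only at the (rnd_id, Date) keys, so it may be the safer choice if the data columns can legitimately be empty.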

3 Comments

Ron Over a year ago

Thanks, Yoben_S. I've tried that and it adds NaN for the missing dates, which is great, but d returns all values in the dataframe not the rows with NaN?

BENY Over a year ago

@Ron Try d = d[d.isnull().any(axis=1)].index.to_frame(). I just did not assign the result in my answer.

Ron Over a year ago

There you go - sorry, I got too excited about your solution. Works perfectly after assignment. Thank you
