pandas filter rows based on missing dates for a column

Question

I am trying to filter on column X and, for each of its values, find the dates that are missing from the current week.

For example, given this df (sample data; the actual df will have a whole week's data):

mean_date         column X  
2021-04-01        x_123
2021-04-01        y_324
2021-04-02        x_123
2021-04-03        x_123

For each value of column X, I need to find all the dates in the current week that have no row, i.e.

result_df:

mean_date_missing     column_X
2021-03-28            x_123
2021-03-29            x_123  
2021-03-30            x_123
2021-03-31            x_123
..
2021-03-28            y_324
2021-03-29            y_324  
2021-03-30            y_324
2021-03-31            y_324
2021-04-02            y_324
2021-04-03            y_324

This question has been answered previously: stackoverflow.com/questions/30447083/…
– a_g, Commented Apr 14, 2021 at 18:24
@a_g no, that does not use any specific column, nor does it return the data in the desired output format.
– Krunch Man, Commented Apr 14, 2021 at 18:30

cosmoem · Accepted Answer · 2021-04-14 19:03:32Z

Probably not the most elegant way to do this, but I think this could work:

First, I'd get a list of all dates of that week; let's call it weekdays. You can either write it out manually or write a function that generates it. The entries need to be in the same format as the values in mean_date. For the week in your example, you would have something like this:

weekdays = ["2021-03-28", "2021-03-29", "2021-03-30", "2021-03-31", "2021-04-01", "2021-04-02", "2021-04-03"]
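If you would rather generate the list than type it, here is a minimal sketch. It assumes weeks run Sunday through Saturday (which matches the dates in the question's expected output) and that mean_date holds strings in YYYY-MM-DD format. week_dates is a hypothetical helper name, not a pandas function.

```python
import pandas as pd


def week_dates(any_day, fmt="%Y-%m-%d"):
    """Return the seven dates (Sunday through Saturday) of the week containing any_day, as strings.

    Hypothetical helper; the Sunday week start is an assumption taken from
    the question's expected output.
    """
    day = pd.Timestamp(any_day).normalize()
    # Timestamp.weekday() is Monday=0 ... Sunday=6, so (weekday + 1) % 7
    # is the number of days since the most recent Sunday.
    start = day - pd.Timedelta(days=(day.weekday() + 1) % 7)
    return pd.date_range(start, periods=7, freq="D").strftime(fmt).tolist()


weekdays = week_dates("2021-04-01")
```

For the current week you could pass pd.Timestamp.today() instead of a fixed date.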

Next, I'd group the DataFrame by column X since you want to find the missing days for each possible value of that column respectively.

grouped = df.groupby("column X")
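To see what the grouping gives you, here is a small sketch built from the sample rows in the question (the column name "column X" is taken from the question's table). Iterating over a groupby object yields pairs of the group key and the sub-DataFrame for that key.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "mean_date": ["2021-04-01", "2021-04-01", "2021-04-02", "2021-04-03"],
    "column X": ["x_123", "y_324", "x_123", "x_123"],
})

# Collect the dates that exist for each value of "column X".
groups = {key: item["mean_date"].to_list() for key, item in df.groupby("column X")}
```

Here groups maps x_123 to its three dates and y_324 to its single date.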

Then, iterate over it to find the missing dates per group:

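import numpy as np
import pandas as pd

missing_list = []
for key, item in grouped:
    # Dates already present for this value; astype(str) keeps them comparable
    # with the strings in weekdays (assumes dates without a time part).
    existing_dates = item["mean_date"].astype(str).to_list()
    # Dates of the week that have no row for this value.
    missing_dates = np.setdiff1d(weekdays, existing_dates)
    for date in missing_dates:
        missing_list.append([date, key])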

Now, all necessary information is stored in missing_list. You just need to make a DataFrame from this:

result_df = pd.DataFrame(missing_list, columns=["mean_date_missing", "column_X"])
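As an alternative to the loop, the same result can be sketched without explicit iteration: build every (date, value) combination and keep the ones that are absent from df. Note that this is a different technique from the one above (a cross product followed by an anti-join via merge with indicator=True). It assumes the column is named "column X" as in the question, that mean_date holds YYYY-MM-DD strings, and that the week runs from 2021-03-28 to 2021-04-03.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "mean_date": ["2021-04-01", "2021-04-01", "2021-04-02", "2021-04-03"],
    "column X": ["x_123", "y_324", "x_123", "x_123"],
})

# All dates of the week, as strings in the same format as mean_date.
weekdays = pd.date_range("2021-03-28", "2021-04-03", freq="D").strftime("%Y-%m-%d")

# Every (date, value) pair that would exist if no date were missing.
full = pd.MultiIndex.from_product(
    [weekdays, df["column X"].unique()], names=["mean_date", "column X"]
).to_frame(index=False)

# Left merge with an indicator column; "left_only" marks pairs missing from df.
merged = full.merge(
    df.drop_duplicates(), on=["mean_date", "column X"], how="left", indicator=True
)

result_df = (
    merged.loc[merged["_merge"] == "left_only", ["mean_date", "column X"]]
    .rename(columns={"mean_date": "mean_date_missing", "column X": "column_X"})
    .sort_values(["column_X", "mean_date_missing"])
    .reset_index(drop=True)
)
```

On the sample data this leaves four missing dates for x_123 and six for y_324.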

edited Apr 14, 2021 at 19:03

answered Apr 14, 2021 at 18:51

cosmoem

How would I put this into the result_df?
– Krunch Man, Commented over a year ago
