I am trying to filter on column X and get all the dates that are missing from the current week.

For example, a df (sample data; the actual df will contain the whole week's data):

mean_date         column X  
2021-04-01        x_123
2021-04-01        y_324
2021-04-02        x_123
2021-04-03        x_123

For each value of column X, I need to find all the dates that are missing from the current week, i.e.

result_df:

mean_date_missing     column_X
2021-03-28            x_123
2021-03-29            x_123  
2021-03-30            x_123
2021-03-31            x_123
..
2021-03-28            y_324
2021-03-29            y_324  
2021-03-30            y_324
2021-03-31            y_324
2021-04-02            y_324
2021-04-03            y_324
  • This question has been answered previously: stackoverflow.com/questions/30447083/… Commented Apr 14, 2021 at 18:24
  • This question has been answered before: stackoverflow.com/questions/30447083/… Commented Apr 14, 2021 at 18:25
  • @a_g no, that does not use any specific column, neither does it return the data in the desired output Commented Apr 14, 2021 at 18:30

1 Answer

Probably not the most elegant way to do this, but I think this could work:

First, I'd get a list of all dates in that week; let's call it weekdays. You can either write it out by hand or write a small function that generates it. You then have something like this:

weekdays = ["2021-03-28", "2021-03-29", "2021-03-30", "2021-03-31", "2021-04-01", "2021-04-02", "2021-04-03"]
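
If you'd rather not type the dates by hand, here is a minimal sketch of such a function. It assumes the week runs Sunday to Saturday (which is what the expected output in the question suggests) and that the dates are ISO-formatted strings; week_dates is just a name I made up for illustration.

```python
from datetime import date, timedelta

def week_dates(any_day):
    """Return the 7 dates (as 'YYYY-MM-DD' strings) of the Sunday-to-Saturday
    week that contains any_day."""
    # date.weekday() gives Monday=0 ... Sunday=6,
    # so (weekday + 1) % 7 is the number of days since the last Sunday.
    days_since_sunday = (any_day.weekday() + 1) % 7
    sunday = any_day - timedelta(days=days_since_sunday)
    return [(sunday + timedelta(days=i)).isoformat() for i in range(7)]

weekdays = week_dates(date(2021, 4, 1))
```

For the current week you could pass date.today() instead of a fixed date. If your week starts on Monday, use any_day.weekday() directly as the offset.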

Next, I'd group the DataFrame by column X since you want to find the missing days for each possible value of that column respectively.

grouped = df.groupby("column X")

Then, iterate over it to find the missing dates per group:

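import numpy as np

missing_list = []
for key, item in grouped:
    # Dates already present for this value of column X
    # (assumed to be strings in the same format as weekdays)
    existing_dates = item["mean_date"].to_list()
    # Dates of the week that do not occur in this group
    missing_dates = np.setdiff1d(weekdays, existing_dates)
    for date in missing_dates:
        missing_list.append([date, key])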

Now, all necessary information is stored in missing_list. You just need to make a DataFrame from this:

result_df = pd.DataFrame(missing_list, columns=["mean_date_missing", "column_X"])
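
For reference, the same idea can also be sketched without an explicit loop: build every (date, value) pair that should exist for the week, left-merge it against the pairs that do exist, and keep only the rows found on the left side. This is an alternative to the loop above, not a replacement for it. The sample data is copied from the question, and I'm assuming mean_date holds strings in YYYY-MM-DD format.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "mean_date": ["2021-04-01", "2021-04-01", "2021-04-02", "2021-04-03"],
    "column X": ["x_123", "y_324", "x_123", "x_123"],
})

# All dates of the week, as strings in the same format as mean_date
weekdays = pd.date_range("2021-03-28", "2021-04-03").strftime("%Y-%m-%d")

# Every (date, value) combination that should exist for the week
expected = pd.MultiIndex.from_product(
    [weekdays, df["column X"].unique()],
    names=["mean_date", "column X"],
).to_frame(index=False)

# Left-merge against the combinations that actually exist;
# indicator=True adds a "_merge" column saying where each row was found
present = df[["mean_date", "column X"]].drop_duplicates()
merged = expected.merge(present, how="left", indicator=True)

# Rows found only in `expected` are the missing dates
result_df = (
    merged.loc[merged["_merge"] == "left_only", ["mean_date", "column X"]]
    .rename(columns={"mean_date": "mean_date_missing", "column X": "column_X"})
    .sort_values(["column_X", "mean_date_missing"])
    .reset_index(drop=True)
)
```

With the sample data, result_df has four missing dates for x_123 and six for y_324. If mean_date is a datetime column in your real df, drop the strftime call so both sides of the merge have the same dtype.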
1 Comment

how would I put in the result_df?
