Drop dates based on condition in python

Question

I'm trying to implement a condition where, if a date has more than 2 incorrect values (2019-05-17 and 2019-05-20 in the example below), the complete date (all of its time blocks) is removed.

Input

                     t_value  C/IC
2019-05-17 00:00:00  0        incorrect
2019-05-17 01:00:00  0        incorrect
2019-05-17 02:00:00  0        incorrect
2019-05-17 03:00:00  4        correct
2019-05-17 04:00:00  5        correct
2019-05-18 01:00:00  0        incorrect
2019-05-18 02:00:00  6        correct
2019-05-18 03:00:00  7        correct
2019-05-19 04:00:00  0        incorrect
2019-05-19 09:00:00  0        incorrect
2019-05-19 11:00:00  8        correct
2019-05-20 07:00:00  2        correct
2019-05-20 08:00:00  0        incorrect
2019-05-20 09:00:00  0        incorrect
2019-05-20 07:00:00  0        incorrect

Desired Output

                     t_value  C/IC
2019-05-18 01:00:00  0        incorrect
2019-05-18 02:00:00  6        correct
2019-05-18 03:00:00  7        correct
2019-05-19 04:00:00  0        incorrect
2019-05-19 09:00:00  0        incorrect
2019-05-19 11:00:00  8        correct

I'm not sure which time-based operation to perform to get the desired result. Thanks.

Seems like all you need is the records with a datetime between 2019-05-17 04:00:00 and 2019-05-19 11:00:00. pandas.Timestamp lets you compare dates with the simple >, <, and == operators.
– Aramakus, Commented May 18, 2020 at 4:38
Yes, in this example. But overall, I'm concerned with removing any date where the corresponding count of incorrect values is greater than 2.
– sklal, Commented May 18, 2020 at 4:46

sammywemmy · Accepted Answer · 2020-05-19 02:50:10Z

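import numpy as np
import pandas as pd
from io import StringIO

#read in data (`data` is the table from the question, as a string)
df = pd.read_csv(StringIO(data), sep=r'\s{2,}', engine='python')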

#give index a name 
df.index.name = 'Date'
#convert to datetime 
#and sort index
#usually safer to sort datetime index in Pandas
df.index = pd.to_datetime(df.index)
df = df.sort_index()

res = (df
       #group by date and c/ic
       .groupby([pd.Grouper(freq='1D',level='Date'),"C/IC"])
       .size()
       #get rows greater than 2 and incorrect
       .loc[lambda x: x>2,"incorrect"]
       #keep only the date index
       .droplevel(-1)
       .index
       #datetime information trapped here
       #and due to grouping, it is different from initial datetime
       #as such, we convert to string 
       #and build another batch of dates
       .astype(str)
       .tolist()
      )

res
['2019-05-17', '2019-05-20']

#build a numpy array of dates
idx = np.array(res, dtype='datetime64')

#exclude dates in idx and get final value
#aim is to get dates, irrespective of time

df.loc[~np.isin(df.index.date,idx)]

                     t_value    C/IC
Date        
2019-05-18 01:00:00     0   incorrect
2019-05-18 02:00:00     6   correct
2019-05-18 03:00:00     7   correct
2019-05-19 04:00:00     0   incorrect
2019-05-19 09:00:00     0   incorrect
2019-05-19 11:00:00     8   correct
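
For comparison, the same filter can probably be written without the round trip through strings, by counting the incorrect rows per calendar day and broadcasting that count back to every row. This is only a sketch and not part of the answer above: the names `rows`, `is_incorrect`, `incorrect_per_day` and `result` are made up for illustration, and the sample frame is rebuilt by hand from the question's table.

```python
import pandas as pd

# Sample data copied from the question (the repeated 07:00 timestamp
# on 2019-05-20 is kept exactly as posted).
rows = [
    ("2019-05-17 00:00:00", 0, "incorrect"),
    ("2019-05-17 01:00:00", 0, "incorrect"),
    ("2019-05-17 02:00:00", 0, "incorrect"),
    ("2019-05-17 03:00:00", 4, "correct"),
    ("2019-05-17 04:00:00", 5, "correct"),
    ("2019-05-18 01:00:00", 0, "incorrect"),
    ("2019-05-18 02:00:00", 6, "correct"),
    ("2019-05-18 03:00:00", 7, "correct"),
    ("2019-05-19 04:00:00", 0, "incorrect"),
    ("2019-05-19 09:00:00", 0, "incorrect"),
    ("2019-05-19 11:00:00", 8, "correct"),
    ("2019-05-20 07:00:00", 2, "correct"),
    ("2019-05-20 08:00:00", 0, "incorrect"),
    ("2019-05-20 09:00:00", 0, "incorrect"),
    ("2019-05-20 07:00:00", 0, "incorrect"),
]
df = pd.DataFrame(rows, columns=["Date", "t_value", "C/IC"])
df["Date"] = pd.to_datetime(df["Date"])
df = df.set_index("Date").sort_index()

# True for every row labelled "incorrect".
is_incorrect = df["C/IC"].eq("incorrect")

# Group by calendar day (timestamps truncated to midnight) and attach
# each day's incorrect count to all rows of that day.
incorrect_per_day = is_incorrect.groupby(df.index.normalize()).transform("sum")

# Keep rows whose day has at most 2 incorrect values. The mask is passed
# as a plain array so the duplicated timestamp cannot cause alignment issues.
result = df[(incorrect_per_day <= 2).to_numpy()]
print(result)
```

With the question's data this keeps the six rows for 2019-05-18 and 2019-05-19.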

xcmkz · Accepted Answer · 2020-05-18 17:36:10Z

Misunderstood the question, sorry.

Updated answer: you can find the dates to be removed by the following:

# assumes df has a DatetimeIndex (e.g. after pd.to_datetime on the index)
df['_date'] = df.index.date
incorrect_df = df[df['C/IC'] == 'incorrect']
# number of incorrect rows per date
incorrect_count = incorrect_df.groupby('_date')['C/IC'].count()
dates_to_remove = set(incorrect_count[incorrect_count > 2].index)
# using set to make the later step more efficient if the df is long

Then mask the dataframe accordingly:

mask = [x not in dates_to_remove for x in df['_date']]
res = df[mask]
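
If this comes up more than once, the two steps could be wrapped in a small helper. The sketch below is one possible way to do that, not something from the original answer: the function name `drop_days_over_threshold` and its parameters are hypothetical, it assumes the frame has a `DatetimeIndex`, and the sample frame is a trimmed copy of the question's first two days.

```python
import numpy as np
import pandas as pd


def drop_days_over_threshold(df, label_col="C/IC", bad_label="incorrect", threshold=2):
    """Drop all rows of any calendar day with more than `threshold` bad labels.

    Hypothetical helper; assumes `df` has a DatetimeIndex.
    """
    dates = df.index.date  # array of datetime.date, one per row
    is_bad = (df[label_col] == bad_label).to_numpy()

    # Count bad rows per date, then collect the dates over the threshold.
    counts = pd.Series(dates[is_bad]).value_counts()
    dates_to_remove = set(counts[counts > threshold].index)

    # Set membership keeps the per-row check cheap on long frames.
    keep = np.fromiter((d not in dates_to_remove for d in dates),
                       dtype=bool, count=len(dates))
    return df[keep]


# Trimmed sample: the first two days from the question.
index = pd.to_datetime([
    "2019-05-17 00:00:00", "2019-05-17 01:00:00", "2019-05-17 02:00:00",
    "2019-05-17 03:00:00", "2019-05-17 04:00:00",
    "2019-05-18 01:00:00", "2019-05-18 02:00:00", "2019-05-18 03:00:00",
])
df = pd.DataFrame(
    {"t_value": [0, 0, 0, 4, 5, 0, 6, 7],
     "C/IC": ["incorrect"] * 3 + ["correct"] * 2 + ["incorrect"] + ["correct"] * 2},
    index=index,
)

result = drop_days_over_threshold(df)
print(result)
```

Unlike the snippet above, this version does not add a `_date` column to the caller's frame, which may or may not matter for your use.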

– xcmkz, answered May 18, 2020 at 5:04, edited May 18, 2020 at 17:36

Thanks for responding. I don't think this would remove the date with all the time blocks.
– sklal, Commented over a year ago
Yeah, sorry, I missed that. You can use df.index.date first to take the dates only and save them to a separate column. The answer is now updated.
– xcmkz, Commented over a year ago
