
I have a huge time-series DataFrame (about 100,000,000 rows) and I need to filter its rows by a set of conditions. Each condition is stored in a row of another DataFrame, which has about 2000 rows, one condition per row. Toy example:

df = pd.DataFrame({'val': [1, 3, 2, 4, 3, 1, 2, 3],
                   'date': ['2015-03-12', '2015-04-12', '2015-05-13', '2016-03-12',
                            '2016-04-07', '2016-05-12', '2017-01-11', '2017-03-20']})
df_condition = pd.DataFrame({'val': [2, 3], 'date': ['2015-07-13', '2016-04-08']})

The condition is: remove all rows in df where val appears earlier than the corresponding date in df_condition:

df = df[~((df['val'] == 2) & (df['date'] < '2015-07-13'))]
df = df[~((df['val'] == 3) & (df['date'] < '2016-04-08'))]

and so on, for about 2000 conditions.

I currently use the loop below, but it takes too long (about 5 hours). Is there a faster method?

vals = df_condition.val.values
dates = df_condition.date.values
for i in range(len(df_condition)):
    df = df[~((df.val == vals[i]) & (df.date < dates[i]))]

1 Answer


I believe you can create a list of boolean masks, one per condition, and then reduce them in a single step. Build each mask to flag the rows that the condition removes, combine them with logical_or, and negate the result:

masks = [(df.val == x.val) & (df.date < x.date) for x in df_condition.itertuples()]
print (masks)
[0    False
1    False
2     True
3    False
4    False
5    False
6    False
7    False
dtype: bool, 0    False
1     True
2    False
3    False
4     True
5    False
6    False
7    False
dtype: bool]

df1 = df[~np.logical_or.reduce(masks)]
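For reference, here is a self-contained sketch of this mask-and-reduce idea on the toy data. Each mask flags the rows a single condition removes, so the masks are combined with a single logical_or and negated to keep everything that no condition removed:

```python
import numpy as np
import pandas as pd

# Toy data from the question, with dates parsed as timestamps.
df = pd.DataFrame({
    'val': [1, 3, 2, 4, 3, 1, 2, 3],
    'date': pd.to_datetime(['2015-03-12', '2015-04-12', '2015-05-13',
                            '2016-03-12', '2016-04-07', '2016-05-12',
                            '2017-01-11', '2017-03-20']),
})
df_condition = pd.DataFrame({
    'val': [2, 3],
    'date': pd.to_datetime(['2015-07-13', '2016-04-08']),
})

# One boolean "remove this row" mask per condition row.
masks = [(df.val == x.val) & (df.date < x.date)
         for x in df_condition.itertuples()]

# OR-reduce the removal masks once, then keep the complement.
df1 = df[~np.logical_or.reduce(masks)]
print(df1)
```

This visits the big frame once per condition to build the masks but filters it only once, instead of reassigning a shrinking copy of df inside the loop 2000 times.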

1 Comment

Thanks for such an elegant and fast way!
