Slicing using multiple conditions with date

Question

I have a DataFrame similar to this one below:

    Dt_Customer Recency
0   2012-09-04  58
1   2014-03-08  94
2   2013-08-21  26
3   2014-02-10  26
4   2014-01-19  94

I want to filter it on a 'Recency' condition and get the row with the latest date, which would return this:

    Dt_Customer Recency
1   2014-03-08  94

I've tried this:

df.loc[df['Recency'] == 94 | df['Dt_Customer'].max()]

But I've got this error:

TypeError: unsupported operand type(s) for |: 'int' and 'str'

Could you guys enlighten me? I'm still learning these pandas features, so any help would be appreciated. The original DataFrame is bigger than this.

Thanks

@Johnny That I understand, but is there a way to make this kind of slice?
– Gustavo Rottgering, Commented May 11, 2020 at 2:30
Compare the date column to the maximum date: df.loc[(df['Recency'] == 94) | (df['Dt_Customer'] == df['Dt_Customer'].max())]. That gives you an OR scenario. It looks as though you are after an AND, in which case you should swap the | with &. The parentheses ensure each condition is evaluated separately.
– sammywemmy, Commented May 11, 2020 at 2:31
@sammywemmy I should have used the '&' operator, as that is what I'm looking for. However, with the changes you suggested, it returns a df without any data in it, only the columns. Any thoughts?
– Gustavo Rottgering, Commented May 11, 2020 at 2:40
Did you include the parentheses? I edited my comment: df.loc[(df['Recency'] == 94) | (df['Dt_Customer'] == df['Dt_Customer'].max())]
– sammywemmy, Commented May 11, 2020 at 2:43
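
For reference, the TypeError in the question looks like an operator precedence issue: in Python, | binds more tightly than ==, so without parentheses 94 | df['Dt_Customer'].max() is evaluated first. The sketch below uses the sample data from the question and assumes Dt_Customer holds strings, as the error message suggests. It also shows one possible reason the & version can come back empty on the full data: the overall latest date may belong to a row with a different Recency. The extra row (Recency 10, dated 2014-06-01) is hypothetical and only there to reproduce that situation.

```python
import pandas as pd

df = pd.DataFrame({
    "Dt_Customer": ["2012-09-04", "2014-03-08", "2013-08-21", "2014-02-10", "2014-01-19"],
    "Recency": [58, 94, 26, 26, 94],
})

# Without parentheses, `94 | <max date string>` runs first and raises TypeError.
precedence_error = None
try:
    df["Recency"] == 94 | df["Dt_Customer"].max()
except TypeError as exc:
    precedence_error = str(exc)

# Parenthesised AND: works here because the overall latest date has Recency 94.
both = df.loc[(df["Recency"] == 94) & (df["Dt_Customer"] == df["Dt_Customer"].max())]

# Hypothetical extra row: a later date that belongs to a different Recency.
extra = pd.DataFrame({"Dt_Customer": ["2014-06-01"], "Recency": [10]})
bigger = pd.concat([df, extra], ignore_index=True)

# The same AND filter is now empty: no row has Recency 94 and the global max date.
empty = bigger.loc[
    (bigger["Recency"] == 94) & (bigger["Dt_Customer"] == bigger["Dt_Customer"].max())
]

# Taking the max inside the Recency == 94 subset avoids that.
subset = bigger.loc[bigger["Recency"] == 94]
latest = subset.loc[subset["Dt_Customer"] == subset["Dt_Customer"].max()]
print(latest)
```

If the real data behaves like `bigger`, that would explain getting only the column headers back.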

Umar.H · Accepted Answer · 2020-05-11 03:16:14Z

1

IIUC, you can use sort_values and .drop_duplicates with .loc to get your desired DataFrame.

We sort by date and keep the last row for each Recency value, which is the one with the max date, then select the desired Recency.

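# sort by date, keep the latest row per Recency, then select Recency == 94
# (the lambda builds the mask from the already-filtered frame, not the original df)
df2 = df.sort_values('Dt_Customer')\
                     .drop_duplicates(subset=['Recency'], keep='last')\
                     .loc[lambda d: d['Recency'].eq(94)]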

print(df2)

  Dt_Customer  Recency
1  2014-03-08       94

or you could use groupby

df.groupby(['Recency'],as_index=False)['Dt_Customer'].max()\
                                     .query('Recency == 94')

   Recency Dt_Customer
2       94  2014-03-08

or you could chain a boolean filter with a .query

df[df['Recency'] == 94].query('Dt_Customer == Dt_Customer.max()')

  Dt_Customer  Recency
1  2014-03-08       94
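
All three variants rely on Dt_Customer comparing in chronological order. The error message in the question mentions 'str', so the column may be stored as text. Strings in YYYY-MM-DD form happen to sort correctly, but other date formats would not. A minimal sketch, assuming the column is text in that format, that converts it with pd.to_datetime first and then applies the same kind of filter:

```python
import pandas as pd

df = pd.DataFrame({
    "Dt_Customer": ["2012-09-04", "2014-03-08", "2013-08-21", "2014-02-10", "2014-01-19"],
    "Recency": [58, 94, 26, 26, 94],
})

# Assumption: the dates are text in YYYY-MM-DD form, so the format is given explicitly.
df["Dt_Customer"] = pd.to_datetime(df["Dt_Customer"], format="%Y-%m-%d")

# Filter on Recency first, then keep the row(s) with the latest date in that subset.
subset = df[df["Recency"] == 94]
result = subset[subset["Dt_Customer"] == subset["Dt_Customer"].max()]
print(result)
```

With a real datetime column the comparison no longer depends on how the dates were written.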

edited May 11, 2020 at 3:16

answered May 11, 2020 at 3:05


4 Comments

Carlos P Ceballos Over a year ago

Sorting values is not scalable; it will work fine on small DataFrames but will consume loads of memory on larger ones.

Umar.H Over a year ago

@CarlosPCeballos perhaps, but the question didn't specify any performance constraints. Anyway, I've added a few variants that will run faster.

Gustavo Rottgering Over a year ago

@Datanovice I thought about groupby, but I didn't know that I could use query together with it. As Carlos mentioned, memory consumption is not a problem for now, but it is nice to know both methods for when I need to work with larger data sets. Thanks

Umar.H Over a year ago

@GustavoRottgering no problemo :) happy coding.
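
On the scalability point raised in these comments: one way to avoid sorting altogether is to filter first and then locate the latest date with Series.idxmax, which only scans the filtered rows. This is a sketch of an alternative that was not proposed in the thread, shown on the question's sample data with the dates already converted to datetimes. It returns a single row even if several rows share the latest date, and it would fail if no row matches the Recency filter.

```python
import pandas as pd

df = pd.DataFrame({
    "Dt_Customer": pd.to_datetime(
        ["2012-09-04", "2014-03-08", "2013-08-21", "2014-02-10", "2014-01-19"]
    ),
    "Recency": [58, 94, 26, 26, 94],
})

# Boolean mask for the Recency of interest.
mask = df["Recency"].eq(94)

# Index label of the latest date among the matching rows (no sort involved).
latest_label = df.loc[mask, "Dt_Customer"].idxmax()

# Wrap the label in a list so the result is a one-row DataFrame, not a Series.
result = df.loc[[latest_label]]
print(result)
```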

Carlos P Ceballos · Accepted Answer · 2020-05-11 03:01:53Z

1

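It's easier to understand and more readable if you do it in two steps, and it should be just as fast:

# step 1: keep only the rows with the desired Recency
df = df.loc[df['Recency'] == 94]
# step 2: of those, keep the row(s) with the latest date
df = df.loc[df['Dt_Customer'] == df['Dt_Customer'].max()]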

answered May 11, 2020 at 3:01
