
I have a DataFrame similar to this one below:

    Dt_Customer Recency
0   2012-09-04  58
1   2014-03-08  94
2   2013-08-21  26
3   2014-02-10  26
4   2014-01-19  94

I want to filter it on a 'Recency' condition and get the row with the latest date, which would return this:

    Dt_Customer Recency
1   2014-03-08  94

I've tried this:

df.loc[df['Recency'] == 94 | df['Dt_Customer'].max()]

But I've got this error:

TypeError: unsupported operand type(s) for |: 'int' and 'str'

Could you guys enlighten me? I'm still learning these pandas features, so any help would be appreciated. The original DataFrame is bigger than this.

Thanks

6
  • You're comparing a string to an integer. Commented May 11, 2020 at 2:23
  • @Johnny That I understand, but is there a way to make this kind of slice? Commented May 11, 2020 at 2:30
  • Compare the date to the maximum date: df.loc[(df['Recency'] == 94) | (df['Dt_Customer'] == df['Dt_Customer'].max())]. That gives you an OR scenario. It looks as though you are after AND, in which case you should swap the | with &. The parentheses ensure each condition is evaluated separately. Commented May 11, 2020 at 2:31
  • @sammywemmy I should have used the '&' operator, as that is what I'm looking for. However, with the changes you suggested, it returns a df without any data in it, only the columns. Any thoughts? Commented May 11, 2020 at 2:40
  • Did you include the parentheses? I edited my comment: df.loc[(df['Recency'] == 94) | (df['Dt_Customer'] == df['Dt_Customer'].max())] Commented May 11, 2020 at 2:43
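
For reference, here is a minimal sketch of what the comments above describe. It assumes the sample frame from the question, rebuilt by hand with Dt_Customer kept as ISO-format strings (which sort in date order); the variable names are illustrative only. In Python, | binds tighter than ==, so the unparenthesized attempt evaluates 94 | df['Dt_Customer'].max() first, which is an int combined with a str. A plausible reason the & version came back empty on the full data (an assumption, since that data isn't shown) is that the overall latest date belongs to a row whose Recency is not 94, so the maximum has to be taken within the Recency == 94 rows.

```python
import pandas as pd

# Sample data copied from the question (dates kept as ISO-format strings).
df = pd.DataFrame({
    "Dt_Customer": ["2012-09-04", "2014-03-08", "2013-08-21",
                    "2014-02-10", "2014-01-19"],
    "Recency": [58, 94, 26, 26, 94],
})

# The original attempt: `|` is evaluated before `==`, so Python tries
# 94 | '2014-03-08' and raises the TypeError from the question.
try:
    df.loc[df["Recency"] == 94 | df["Dt_Customer"].max()]
except TypeError as exc:
    error_message = str(exc)

# AND version: each condition is built separately, and the maximum date
# is computed only over the rows that satisfy the Recency condition.
is_94 = df["Recency"] == 94
latest_for_94 = df.loc[is_94, "Dt_Customer"].max()
result = df.loc[is_94 & (df["Dt_Customer"] == latest_for_94)]
print(result)
```

Computing the maximum on the filtered rows keeps the two conditions consistent even when the overall latest date has a different Recency.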

2 Answers 2

1

IIUC, you can use sort_values and .drop_duplicates with .loc to get your desired DataFrame.

We sort by date and keep the last row for each Recency value, which holds the latest date for that Recency, then select the Recency we want.

df2 = df.sort_values('Dt_Customer')\
                     .drop_duplicates(subset=['Recency'],keep='last')\
                     .loc[df['Recency'].eq(94)]

print(df2)

  Dt_Customer  Recency
1  2014-03-08       94

or you could use groupby

df.groupby(['Recency'],as_index=False)['Dt_Customer'].max()\
                                     .query('Recency == 94')

   Recency Dt_Customer
2       94  2014-03-08

or you could chain a boolean filter with a .query

df[df['Recency'] == 94].query('Dt_Customer == Dt_Customer.max()')

  Dt_Customer  Recency
1  2014-03-08       94
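
Another option that avoids sorting, shown here only as a hedged sketch and not as one of the variants above: convert the column with pd.to_datetime and use Series.idxmax on the filtered rows. The frame is the question's sample data rebuilt by hand, and the names latest_label and latest_row are illustrative assumptions.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Dt_Customer": ["2012-09-04", "2014-03-08", "2013-08-21",
                    "2014-02-10", "2014-01-19"],
    "Recency": [58, 94, 26, 26, 94],
})

# Work with real datetimes rather than strings.
df["Dt_Customer"] = pd.to_datetime(df["Dt_Customer"])

# Index label of the latest date among the rows where Recency is 94.
latest_label = df.loc[df["Recency"] == 94, "Dt_Customer"].idxmax()

# Passing a list of labels keeps the result a one-row DataFrame.
latest_row = df.loc[[latest_label]]
print(latest_row)
```

Note that idxmax returns only the first matching label, so if several rows share the latest date this keeps just one of them, unlike the equality-based filters.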

4 Comments

Sorting values is not scalable; it will work fine on small DataFrames but will consume a lot of memory on larger ones.
@CarlosPCeballos perhaps, but the question didn't specify any performance constraints. Anyway, I've added a few variants that will run faster.
@Datanovice I thought about groupby, but I didn't know I could use query together with it. As Carlos mentioned, memory consumption is not a problem for now, but it's nice to know both methods for when I need to work with larger data sets. Thanks
@GustavoRottgering no problemo :) happy coding.
1

It's easier to understand and more readable if you do it in two steps, and it should be just as fast:

df = df.loc[df['Recency'] == 94]
df.loc[df['Dt_Customer'] == df['Dt_Customer'].max()]
