
I have just read about, and got very excited over, these newly found optimization functions for my Pandas-related needs. According to this book:

The DataFrame.eval() method allows much more succinct evaluation of expressions with the columns:

result3 = df.eval('(A + B) / (C - 1)') 
np.allclose(result1, result3)

True

Now to my example:

My dataframe contains around 42000 records and 28 columns, two of which, Date and Heure, are strings.

My goal is to concatenate both columns into one, which I can easily do with this piece of code: df_exade_light["Date"]+df_exade_light["Heure"]. Running %timeit on it returns

6.07 ms ± 219 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

But for some reason df.eval('Date + Heure') raises:

RecursionError: maximum recursion depth exceeded

What's more, I applied the solution found in this thread to raise the allowed stack depth, but the kernel just crashes.

What's the reason for this? Am I doing something wrong?


The problem can be reproduced with this code:

import pandas as pd

df = pd.DataFrame({'A': ['X','Y'],
                   'B': ['U','V']})

df.eval('A+B')

1 Answer


The problem in your reproducible example is that your columns contain strings. In the link you give about High-Performance Pandas: eval() and query(), all the examples use floats (or ints).

One way to make it work with your example is to use python as the engine:

df.eval('A+B',engine='python')
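As a quick check, here is a minimal sketch that reuses the two-column frame from the question and compares the result of engine='python' with ordinary Series concatenation. The variable names plain and via_eval are only illustrative, and the behaviour may differ between pandas versions.

```python
import pandas as pd

# Same toy frame as in the question: two string columns.
df = pd.DataFrame({'A': ['X', 'Y'],
                   'B': ['U', 'V']})

# Ordinary element-wise string concatenation.
plain = df['A'] + df['B']

# Same expression through eval, forcing the pure-Python engine
# so the expression is not routed through numexpr.
via_eval = df.eval('A + B', engine='python')

# Compare the two results element by element.
print(list(plain) == list(via_eval))
```

Both paths should give the same concatenated values; the only difference is which machinery evaluates the expression.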

By default, the engine used in eval is 'numexpr' according to the documentation, and this engine uses the library of the same name, NumExpr, which is a fast numerical expression evaluator for NumPy. Although the previous link presents an example with strings, it does not use the + operation. df.eval('A==B') works, as do the other comparison operators, but df.eval('A+B') does not. You can find more information there, but for strings, support seems limited apart from using engine='python'.

Going back to your original problem with date and time data, I am not sure you can find a solution with the default engine (see here for the supported datatypes).
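For the original goal of joining Date and Heure, plain Series operations avoid eval altogether. The sketch below assumes both columns hold strings, as stated in the question; the sample values and the space separator are made up for illustration and are not taken from the real data.

```python
import pandas as pd

# Hypothetical sample rows; the real frame has about 42000 records.
df_exade_light = pd.DataFrame({
    'Date': ['01/02/2020', '03/04/2020'],
    'Heure': ['10:15:00', '23:59:59'],
})

# Series.str.cat joins two string columns element-wise;
# sep=' ' (an assumption) puts a space between date and time.
combined = df_exade_light['Date'].str.cat(df_exade_light['Heure'], sep=' ')

print(combined.tolist())
```

Without a separator, df_exade_light["Date"] + df_exade_light["Heure"] from the question does the same job.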


1 Comment

Thank you :) I did a %timeit df_exade_light.eval('Date+Heure',engine='python') and the result is 6.94 ms ± 309 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) so indeed it isn't faster
