
I have used pandas and sqlite to perform multi-conditional searches on a dataframe, for example:

   name  age  height
0  john   18     178
1   jen   25     168

age > 20 & height < 170 & height > 150

I am wondering if numpy can do the same thing, and if so, whether it will be faster than pandas and sqlite.
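
For reference, this is roughly how I do it today (a sketch; the in-memory sqlite database and the table name people are only for illustration):

import sqlite3
import pandas as pd

df = pd.DataFrame({'name': ['john', 'jen'], 'age': [18, 25], 'height': [178, 168]})

# pandas: one query string with several conditions
print(df.query('age > 20 & height < 170 & height > 150'))

# sqlite: the same filter expressed as SQL on an illustrative in-memory table
con = sqlite3.connect(':memory:')
df.to_sql('people', con, index=False)
print(con.execute('SELECT * FROM people WHERE age > 20 AND height < 170 AND height > 150').fetchall())
con.close()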

Thanks

1 Answer


Yes, numpy can do the same thing, and it may be faster than pandas:

import pandas as pd

df = pd.DataFrame({'name': {0: 'john', 1: 'jen'},
                   'age': {0: 18, 1: 25},
                   'height': {0: 178, 1: 168}})
print((df['age'] > 20) & (df['height'] < 170) & (df['height'] > 150))
0    False
1     True
dtype: bool
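
If you want the matching rows rather than just the boolean mask, index the dataframe with the mask:

mask = (df['age'] > 20) & (df['height'] < 170) & (df['height'] > 150)
print(df[mask])
  name  age  height
1  jen   25     168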


m = df.values.T  # Note the transposition
print((m[1] > 20) & (m[2] < 170) & (m[2] > 150))
[False  True]
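
Because the dataframe mixes strings and integers, df.values produces an object-dtype array; a variation is to convert only the numeric columns, which keeps a proper integer dtype:

age, height = df[['age', 'height']].to_numpy().T  # numeric columns only -> integer array
print((age > 20) & (height < 170) & (height > 150))
[False  True]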

Performance

>>> %timeit (df['age'] > 20) & (df['height'] < 170) & (df['height'] > 150)
392 µs ± 1.87 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

>>> %timeit (m[1] > 20) & (m[2] < 170) & (m[2] > 150)
6.69 µs ± 12.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

3 Comments

Thanks! I suppose 'df.values.T' is using numpy? I thought it would be something like 'numpy.' ...
Is numpy.where() similar to pandas.query()?
df.values.T converts your dataframe to an ndarray and transposes the result. np.where and pd.query are not the same; refer to the documentation.
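
To sketch the difference, reusing df and m from the answer above: df.query filters rows with a boolean expression string, while np.where either returns the indices where a condition holds or makes an element-wise choice between two values:

import numpy as np

# pandas.query: returns the rows matching the expression
print(df.query('age > 20 & height < 170 & height > 150'))

# numpy.where, one argument: indices where the condition is True
mask = (m[1] > 20) & (m[2] < 170) & (m[2] > 150)
print(np.where(mask))                       # (array([1]),)

# numpy.where, three arguments: element-wise choice between two values
print(np.where(mask, 'match', 'no match'))  # ['no match' 'match']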
