
I have used pandas and sqlite to perform multi-conditional searches on a dataframe, for example:

   name  age  height
0  john   18     178
1   jen   25     168

age > 20 & height < 170 & height > 150

I am wondering if numpy can do the same thing, and if so, whether it will be faster than pandas and sqlite.
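
For reference, this is roughly how I do it today (a sketch; the in-memory sqlite database and the table name people are only for illustration):

import sqlite3
import pandas as pd

df = pd.DataFrame({'name': ['john', 'jen'], 'age': [18, 25], 'height': [178, 168]})

# pandas: one query string with several conditions
print(df.query('age > 20 & height < 170 & height > 150'))

# sqlite: the same filter expressed as SQL on an illustrative in-memory table
con = sqlite3.connect(':memory:')
df.to_sql('people', con, index=False)
print(con.execute('SELECT * FROM people WHERE age > 20 AND height < 170 AND height > 150').fetchall())
con.close()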

Thanks

1 Answer


Yes, numpy can do the same thing, and it may be faster than pandas:

import pandas as pd

df = pd.DataFrame({'name': {0: 'john', 1: 'jen'},
                   'age': {0: 18, 1: 25},
                   'height': {0: 178, 1: 168}})
print((df['age'] > 20) & (df['height'] < 170) & (df['height'] > 150))
0    False
1     True
dtype: bool
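
If you want the matching rows rather than just the boolean mask, index the dataframe with the mask:

mask = (df['age'] > 20) & (df['height'] < 170) & (df['height'] > 150)
print(df[mask])
  name  age  height
1  jen   25     168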


m = df.values.T  # Note the transposition
print((m[1] > 20) & (m[2] < 170) & (m[2] > 150))
[False  True]
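
Because the dataframe mixes strings and integers, df.values produces an object-dtype array; a variation is to convert only the numeric columns, which keeps a proper integer dtype:

age, height = df[['age', 'height']].to_numpy().T  # numeric columns only -> integer array
print((age > 20) & (height < 170) & (height > 150))
[False  True]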

Performance

>>> %timeit (df['age'] > 20) & (df['height'] < 170) & (df['height'] > 150)
392 µs ± 1.87 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

>>> %timeit (m[1] > 20) & (m[2] < 170) & (m[2] > 150)
6.69 µs ± 12.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

3 Comments

Thanks! I suppose 'df.values.T' is using numpy? I thought it would be something like 'numpy.' ...
Is numpy.where() similar to pandas.query()?
df.values.T converts your dataframe to an ndarray and transposes the result. np.where and pd.query are not the same; refer to the documentation.
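
To sketch the difference, reusing df and m from the answer above: df.query filters rows with a boolean expression string, while np.where either returns the indices where a condition holds or makes an element-wise choice between two values:

import numpy as np

# pandas.query: returns the rows matching the expression
print(df.query('age > 20 & height < 170 & height > 150'))

# numpy.where, one argument: indices where the condition is True
mask = (m[1] > 20) & (m[2] < 170) & (m[2] > 150)
print(np.where(mask))                       # (array([1]),)

# numpy.where, three arguments: element-wise choice between two values
print(np.where(mask, 'match', 'no match'))  # ['no match' 'match']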
