Is there a way to use a function for Boolean indexing in a pandas DataFrame?

For example:

import pandas


def filter_func(v) -> bool:
    return v == 'asd'


def main():
    df_test = pandas.DataFrame(
        [
            ['sd'], ['asd'], ['sdf']
        ],
        columns=["col-a"]
    )
    #### ERROR: This next line calls filter_func with all contents of column 'col-a'
    result = df_test[df_test['col-a'] == filter_func(df_test['col-a'])]


if __name__ == '__main__':
    main()

In the example above, I want to keep only the rows for which filter_func returns True. The result should be a DataFrame with a single row, but instead I get an empty DataFrame.

I understand that instead of executing filter_func for each row it is executed only once.

Is there a way to call it for each row?

Should I use apply or map for Series in this case?

Or is there any other way?

1 Answer

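  • filter_func(df_test['col-a']) already returns a Boolean Series, because == is applied element-wise to the whole column. Use that Series directly as the mask: df_test[filter_func(df_test['col-a'])], not df_test[df_test['col-a'] == filter_func(df_test['col-a'])].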
  • pandas: Boolean Indexing
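
Applied to the DataFrame from the question, the fix might look like the sketch below. It reuses the question's df_test and filter_func; the names mask and result are only illustrative.

```python
import pandas as pd


def filter_func(v):
    # With a Series argument, == is evaluated element-wise,
    # so this returns a Boolean Series rather than a single bool.
    return v == 'asd'


df_test = pd.DataFrame([['sd'], ['asd'], ['sdf']], columns=['col-a'])

# Build the Boolean mask once, then index the DataFrame with it.
mask = filter_func(df_test['col-a'])
result = df_test[mask]

print(result)
```

The extra comparison in the question checks each string in col-a against that Boolean mask, which is why it produced an empty DataFrame.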
from datetime import datetime  # needed for datetime.today() below
import pandas as pd
import numpy as np
import random

# sample data
np.random.seed(365)
random.seed(365)
rows = 1100
data = {'a': np.random.randint(10, size=(rows)),
        'groups': [random.choice(['1-5', '6-25', '26-100', '100-500', '500-1000', '>1000']) for _ in range(rows)],
        'treatment': [random.choice(['Yes', 'No']) for _ in range(rows)],
        'date': pd.bdate_range(datetime.today(), freq='h', periods=rows).tolist()}
df = pd.DataFrame(data)

# display(df.head())
   a  groups treatment                date
0  2   >1000       Yes 2020-10-06 00:00:00
1  4  26-100        No 2020-10-06 01:00:00
2  1   >1000       Yes 2020-10-06 02:00:00
3  5    6-25       Yes 2020-10-06 03:00:00
4  2  26-100        No 2020-10-06 04:00:00

# filter function
def filter_func(v) -> bool:
    return v == '26-100'


# call function
filtered = df[filter_func(df.groups)]

# display(filtered)
    a  groups treatment                date
1   4  26-100        No 2020-10-06 01:00:00
4   2  26-100        No 2020-10-06 04:00:00
21  2  26-100       Yes 2020-10-06 21:00:00
24  9  26-100       Yes 2020-10-07 00:00:00
32  5  26-100        No 2020-10-07 08:00:00
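
On the apply/map part of the question: if the function only works on one value at a time (for example, it uses if statements or len), Series.map or Series.apply can call it per element and build the mask. A minimal sketch, where the scalar function is_three_letter_a_word is a hypothetical example and not part of the original post:

```python
import pandas as pd


def is_three_letter_a_word(v: str) -> bool:
    # Hypothetical scalar-only check: works on one string, not on a Series.
    return len(v) == 3 and v.startswith('a')


df_test = pd.DataFrame([['sd'], ['asd'], ['sdf']], columns=['col-a'])

# map calls the function once per element; astype(bool) makes sure
# the result is a proper Boolean mask.
mask = df_test['col-a'].map(is_three_letter_a_word).astype(bool)
result = df_test[mask]

# Series.apply gives the same mask for a plain Python function.
same_mask = df_test['col-a'].apply(is_three_letter_a_word).astype(bool)

print(result)
```

When the check can be written with vectorized operators such as ==, calling the function on the whole column as shown above is usually faster than map or apply, since it avoids a Python-level call per row.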