48

I want to filter a DataFrame with a more complex function that depends on several values in each row.

Is there a way to filter DataFrame rows with a boolean function, the way you can with, e.g., the ES6 filter function?

Extremely simplified example to illustrate the problem:

import pandas as pd

def filter_fn(row):
    if row['Name'] == 'Alisa' and row['Age'] > 24:
        return False

    return row

d = {
    'Name': ['Alisa', 'Bobby', 'jodha', 'jack', 'raghu', 'Cathrine',
             'Alisa', 'Bobby', 'kumar', 'Alisa', 'Alex', 'Cathrine'],
    'Age': [26, 24, 23, 22, 23, 24, 26, 24, 22, 23, 24, 24],
    'Score': [85, 63, 55, 74, 31, 77, 85, 63, 42, 62, 89, 77]}

df = pd.DataFrame(d, columns=['Name', 'Age', 'Score'])

# broadcast=True was removed in pandas 1.0; result_type='broadcast' is its replacement
df = df.apply(filter_fn, axis=1, result_type='broadcast')

I found an approach using apply(), but with a boolean function it only returns rows filled with True/False, which is expected.

My workaround is to return the row itself when the function's result would be True, and False otherwise. But this requires an additional filtering step afterwards:

        Name    Age  Score
0      False  False  False
1      Bobby     24     63
2      jodha     23     55
3       jack     22     74
4      raghu     23     31
5   Cathrine     24     77
6      False  False  False
7      Bobby     24     63
8      kumar     22     42
9      Alisa     23     62
10      Alex     24     89
11  Cathrine     24     77

2 Answers

55

I think a function is unnecessary here. Boolean indexing is simpler and, above all, faster:

m = (df['Name'] == 'Alisa') & (df['Age'] > 24)
print(m)
0      True
1     False
2     False
3     False
4     False
5     False
6      True
7     False
8     False
9     False
10    False
11    False
dtype: bool

# invert the mask with ~ to keep the rows that do not match
df1 = df[~m]
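
As a minimal, self-contained sketch of the boolean-indexing approach on the question's data: the names kept_labels and df1_renumbered are only illustrative, and the reset_index step is an optional extra that the question does not ask for.

```python
import pandas as pd

d = {
    'Name': ['Alisa', 'Bobby', 'jodha', 'jack', 'raghu', 'Cathrine',
             'Alisa', 'Bobby', 'kumar', 'Alisa', 'Alex', 'Cathrine'],
    'Age': [26, 24, 23, 22, 23, 24, 26, 24, 22, 23, 24, 24],
    'Score': [85, 63, 55, 74, 31, 77, 85, 63, 42, 62, 89, 77],
}
df = pd.DataFrame(d, columns=['Name', 'Age', 'Score'])

# True for the rows that should be dropped
m = (df['Name'] == 'Alisa') & (df['Age'] > 24)

# Keep everything else; the original index labels are preserved
df1 = df[~m]
kept_labels = df1.index.tolist()

# Optionally renumber the remaining rows from 0 (illustrative extra step)
df1_renumbered = df1.reset_index(drop=True)
```

Note that the parentheses around each comparison are required, because & binds more tightly than == and >.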

For more complicated filtering, you can use a function, which must return a boolean value for each row:

def filter_fn(row):
    # True keeps the row, False drops it
    return not (row['Name'] == 'Alisa' and row['Age'] > 24)

df = pd.DataFrame(d, columns=['Name', 'Age', 'Score'])
m = df.apply(filter_fn, axis=1)
print(m)
0     False
1      True
2      True
3      True
4      True
5      True
6     False
7      True
8      True
9      True
10     True
11     True
dtype: bool

df1 = df[m]
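
If you want something closer to the ES6 filter call from the question, the pattern above can be wrapped in a small helper. This is only a sketch: filter_rows and keep_row are hypothetical names, not part of pandas, and the three-row frame is a shortened stand-in for the question's data.

```python
import pandas as pd

def filter_rows(frame, predicate):
    """Hypothetical helper: keep the rows for which predicate(row) is truthy."""
    # apply(..., axis=1) calls predicate once per row and collects the results
    mask = frame.apply(predicate, axis=1).astype(bool)
    return frame[mask]

df = pd.DataFrame({
    'Name': ['Alisa', 'Bobby', 'Alisa'],
    'Age': [26, 24, 23],
    'Score': [85, 63, 62],
})

def keep_row(row):
    # Drop rows where Name is 'Alisa' and Age is above 24
    return not (row['Name'] == 'Alisa' and row['Age'] > 24)

filtered = filter_rows(df, keep_row)

# The vectorized mask selects the same rows without a Python call per row
vectorized = df[~((df['Name'] == 'Alisa') & (df['Age'] > 24))]
```

Both give the same rows here; the row-wise version is mainly worth it when the condition cannot be expressed with column operations.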

5

A very readable way to filter DataFrames is query():

df.query("not (Name == 'Alisa' and Age > 24)")

# or push the negation into the condition (by De Morgan's laws)
df.query("Name != 'Alisa' or Age <= 24")

Another way is to pass a callable that returns a boolean mask to loc:

df.loc[lambda x: ~((x['Name'] == 'Alisa') & (x['Age'] > 24))]
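
A short sketch to check that the variants above select the same rows, on a shortened stand-in for the question's data. It also shows that query() can reference local Python variables with an @ prefix; the variable names name and max_age are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alisa', 'Bobby', 'Alisa', 'Alex'],
    'Age': [26, 24, 23, 24],
    'Score': [85, 63, 62, 89],
})

# The three variants from this answer
by_query = df.query("not (Name == 'Alisa' and Age > 24)")
by_de_morgan = df.query("Name != 'Alisa' or Age <= 24")
by_loc = df.loc[lambda x: ~((x['Name'] == 'Alisa') & (x['Age'] > 24))]

# Local variables can be used inside the query string via @
name, max_age = 'Alisa', 24
by_query_vars = df.query("not (Name == @name and Age > @max_age)")
```

The @ form is handy when the threshold or name comes from elsewhere in the program rather than being hard-coded in the string.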
