I have a method which takes a pandas dataframe as an input:

def dfColumnFilter(df, columnFilter, columnName):
    ''' Returns a filtered DataFrame

    Keyword arguments: 
    df           :  DataFrame in which to apply the filter
    columnFilter :  The list of which to filter by
    columnName   :  The DataFrame column to apply the columnFilter to '''

    # Keep the rows whose value in columnName matches any entry of columnFilter
    return df[df[columnName].isin(columnFilter)]

The question is: how do I make this work for n columns?

2 Answers

You can use *args to accept any number of (column, value) pairs:

def filter_df(df, *args):
    for k, v in args:
        df = df[df[k] == v]
    return df

It can be used like this:

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 1, 1], 'b': [1, 3, 3, 3]})

>>> filter_df(df, ('a', 1), ('b', 3))
   a  b
2  1  3
3  1  3

Note

In theory, you could use **kwargs, which would have a more pleasing usage:

filter_df(df, a=1, b=3)

but with plain keyword syntax this only works for columns whose names are valid Python identifiers.
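
A minimal sketch of that **kwargs variant, under the hypothetical name filter_df_kwargs (it is not part of the original answer). Unpacking a dict with ** is one way around the identifier restriction, since the keys only have to be strings:

```python
import pandas as pd


def filter_df_kwargs(df, **conditions):
    # Hypothetical helper: each keyword is a column name and each value is
    # the value that column must equal.
    for column, value in conditions.items():
        df = df[df[column] == value]
    return df


df = pd.DataFrame({'a': [1, 2, 1, 1], 'first one': [1, 3, 3, 3]})

# Plain keyword syntax works for identifier-like column names.
by_keyword = filter_df_kwargs(df, a=1)

# Dict unpacking also covers names such as 'first one'.
by_dict = filter_df_kwargs(df, **{'a': 1, 'first one': 3})
```

One limitation of this sketch: a column that is literally named df would clash with the first parameter and raise a TypeError.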

Edit

See the comment below by @Goyo for a point about a better implementation.

1 Comment

I think you can use dictionary unpacking for names that are not valid identifiers:
df = pd.DataFrame({'first one': [1, 2, 1, 1], 'second one': [1, 3, 3, 3]})
filter_df(df, **{'first one': 1, 'second one': 2})

You can combine boolean masks, as below:

filtered_df = df[(df[column1]=='foo') & (df[column2]=='bar')]

You can chain further conditions with &, as long as each comparison is wrapped in parentheses.
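
For n columns the same idea can be written without typing each condition by hand. The sketch below is only one possible approach: the name filter_by_conditions and the choice of a dict as input are assumptions for illustration. It builds one boolean mask per column and combines them with & through functools.reduce:

```python
from functools import reduce
import operator

import pandas as pd


def filter_by_conditions(df, conditions):
    # Hypothetical helper: `conditions` maps a column name to the value
    # that column must equal.
    if not conditions:
        return df  # nothing to filter on
    masks = [df[column] == value for column, value in conditions.items()]
    # operator.and_ is the function form of &, applied pairwise to the masks.
    combined = reduce(operator.and_, masks)
    return df[combined]


df = pd.DataFrame({'a': [1, 2, 1, 1], 'b': [1, 3, 3, 3]})
filtered_df = filter_by_conditions(df, {'a': 1, 'b': 3})
```

Compared with filtering in a loop, this indexes the DataFrame only once, after all the masks have been combined.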
