0

The goal is to filter a DataFrame on a dynamic number of columns with their respective individual values. To achieve this, I've created a filter mask from a dictionary which I should be able to use each time.

However this filter mask becomes a string and therefore provides a 'KeyError'. Some example of how my logic works.

import pandas as pd

# Create a list of dictionaries with the data for each row
data = [{'col1': 1, 'col2': 'a', 'col3': True, 'col4': 1.0},
        {'col1': 2, 'col2': 'b', 'col3': False, 'col4': 2.0},
        {'col1': 1, 'col2': 'c', 'col3': True, 'col4': 3.0},
        {'col1': 2, 'col2': 'd', 'col3': False, 'col4': 4.0},
        {'col1': 1, 'col2': 'e', 'col3': True, 'col4': 5.0}]
df = pd.DataFrame(data)

filter_dict = {'col1': 1, 'col3': True,}

def create_filter_query_for_df(filter_dict):
    query = ""
    for i, (column, values) in enumerate(filter_dict.items()):
        if i > 0:
            query += " & "
        if isinstance(values,float) or isinstance(values,int):
            query += f"(data['{column}'] == {values})"
        else:
            query += f"(data['{column}'] == '{values}')"
    return query

df[create_filter_query_for_df(filter_dict)]

Result is:

KeyError: "(data['col1'] == 1) & (data['col3'] == True)"

The issue is that the create_filter_query_for_df() will return a string and it should be boolean variable. If you would make the mask as following:

mask1 = "(data['col1'] == 1) & (data['col3'] == True)" # the same error is returned;

# However if you format as below, it provides a success
mask2 = (data['col1'] == 1) & (data['col3'] == True)

The type of mask1 will be str. The type of mask2 will be boolean.

However, I can't use bool(mask1) because then I can't use it anymore as filter condition. I'm quite stuck so need some help here.

Apologies if I took a completely wrong approach in trying to get to the filter, it seemed quite a suitable solution to me.

Thanks in advance!

1 Answer 1

1

The result of filtering based on mask2 is as follows:

mask2 = (df['col1'] == 1) & (df['col3'] == True)
df[mask2]

   col1 col2  col3  col4
0     1    a  True   1.0
2     1    c  True   3.0
4     1    e  True   5.0

To reach the same result with a string, we can use df.query like so:

df.query('(col1 == 1) & (col3 == True)')

   col1 col2  col3  col4
0     1    a  True   1.0
2     1    c  True   3.0
4     1    e  True   5.0

Note that the required syntax is actually a bit different. So, let's simplify your function to end up with the string that we need:

def create_filter_query_for_df(filter_dict):
    list_pairs = [f"({col} == {val})" for col, val in filter_dict.items()]
    query = ' & '.join(list_pairs)
    
    # '(col1 == 1) & (col3 == True)'
    
    return query

df.query(create_filter_query_for_df(filter_dict))

   col1 col2  col3  col4
0     1    a  True   1.0
2     1    c  True   3.0
4     1    e  True   5.0

Alternative approach

Incidentially, if you are only using the & operator, another way to approach this problem could be as follows:

  • Use a list comprehension to create two pd.Series and use them as input for pd.concat with axis parameter set to 1.
  • Chain df.all with axis parameter again set to 1 to evaluate if all values for each row in the resulting temporary df equal True).
  • The result is a single pd.Series with booleans that we can use to filter the df.
my_mask = (pd.concat([df[k].eq(v) for k, v in filter_dict.items()], 
                     axis=1)
           .all(axis=1))

df[my_mask]

   col1 col2  col3  col4
0     1    a  True   1.0
2     1    c  True   3.0
4     1    e  True   5.0

Of course, this approach may not be ideal (or: function at all) if your actual requirements are a bit more complex.

Sign up to request clarification or add additional context in comments.

1 Comment

You are a king! thank you for the explanation. I made the following adjustment, since the value could be string or int/float '''list_pairs = [f"({col} == {repr(val)})" for col, val in filter_dict.items()]'''' that did the trick correctly for me :).

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.