The goal is to filter a DataFrame on a dynamic number of columns, each with its own value. To achieve this, I build a filter mask from a dictionary, which I should be able to reuse each time.
However, this filter mask ends up being a string and therefore raises a 'KeyError'. Here is an example of how my logic works.
import pandas as pd

# Create a list of dictionaries with the data for each row
data = [{'col1': 1, 'col2': 'a', 'col3': True, 'col4': 1.0},
        {'col1': 2, 'col2': 'b', 'col3': False, 'col4': 2.0},
        {'col1': 1, 'col2': 'c', 'col3': True, 'col4': 3.0},
        {'col1': 2, 'col2': 'd', 'col3': False, 'col4': 4.0},
        {'col1': 1, 'col2': 'e', 'col3': True, 'col4': 5.0}]
df = pd.DataFrame(data)

filter_dict = {'col1': 1, 'col3': True}

def create_filter_query_for_df(filter_dict):
    query = ""
    for i, (column, values) in enumerate(filter_dict.items()):
        if i > 0:
            query += " & "
        if isinstance(values, (float, int)):
            query += f"(data['{column}'] == {values})"
        else:
            query += f"(data['{column}'] == '{values}')"
    return query

df[create_filter_query_for_df(filter_dict)]
Result is:
KeyError: "(data['col1'] == 1) & (data['col3'] == True)"
The issue is that create_filter_query_for_df() returns a string, whereas it should return a boolean mask. If you build the mask as follows:
mask1 = "(data['col1'] == 1) & (data['col3'] == True)"  # the same error is returned
# However, if you write it as below, it succeeds
mask2 = (df['col1'] == 1) & (df['col3'] == True)
The type of mask1 is str. mask2 is a boolean Series.
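For comparison, here is a minimal sketch of building a mask like mask2 directly from the dictionary, without going through a string. The helper name build_mask is hypothetical; it assumes every key in filter_dict is a column of df and that all conditions are equality checks combined with AND.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': [1, 2, 1, 2, 1],
    'col2': ['a', 'b', 'c', 'd', 'e'],
    'col3': [True, False, True, False, True],
    'col4': [1.0, 2.0, 3.0, 4.0, 5.0],
})
filter_dict = {'col1': 1, 'col3': True}

def build_mask(df, filter_dict):
    """Hypothetical helper: AND together one equality test per dictionary entry."""
    # Start with a mask that keeps every row
    mask = pd.Series(True, index=df.index)
    for column, value in filter_dict.items():
        # Each comparison yields a boolean Series aligned on df.index
        mask &= df[column] == value
    return mask

mask = build_mask(df, filter_dict)
filtered = df[mask]
```

Because the result is a boolean Series rather than a str, it can be passed straight to df[...].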
However, I can't use bool(mask1), because the result can no longer be used as a filter condition. I'm quite stuck and would appreciate some help.
Apologies if I took a completely wrong approach to building the filter; it seemed like a suitable solution to me.
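If the string form is to be kept, one possible direction is to let pandas evaluate it as an expression with DataFrame.query instead of indexing with it. This is only a sketch: the helper name build_query is hypothetical, and it assumes the column names are valid Python identifiers and the values are plain numbers, booleans or strings.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': [1, 2, 1, 2, 1],
    'col2': ['a', 'b', 'c', 'd', 'e'],
    'col3': [True, False, True, False, True],
    'col4': [1.0, 2.0, 3.0, 4.0, 5.0],
})
filter_dict = {'col1': 1, 'col3': True}

def build_query(filter_dict):
    """Hypothetical helper: build an expression string for DataFrame.query."""
    # !r uses repr(), so strings get quoted while numbers and booleans stay literals
    return " & ".join(
        f"({column} == {value!r})" for column, value in filter_dict.items()
    )

query = build_query(filter_dict)
result = df.query(query)
```

Here the column names appear bare in the string (col1 rather than data['col1']), since query resolves them against the DataFrame it is called on.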
Thanks in advance!