
I have a question about pandas. I want to filter a DataFrame dynamically, based on a URL query string.

For example, given a CSV file with Name, Age, and Gender columns and this URL:

http://example.com/filter?Name=Sam&Age=21&Gender=male

the hardcoded filter would be:

filtered_data = data[
    (data['Name'] == 'Sam') &
    (data['Age'] == 21) &
    (data['Gender'] == 'male')
]

I don't want to hardcode the filter keys as above, because the CSV file can change at any time and have different column headers. Any suggestions?

3 Answers


The easiest way to create this filter dynamically is probably to use np.all.

For example:

import numpy as np

query = {'Name': 'Sam', 'Age': 21, 'Gender': 'male'}
filters = [data[k] == v for k, v in query.items()]
filtered_data = data[np.all(filters, axis=0)]
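
If the filter comes straight from the URL, the `query` dict does not have to be typed by hand. Here is a rough sketch using the standard library's `urllib.parse`. The helper name `filter_by_url` and the sample DataFrame are made up for illustration, and it assumes every query-string key is a column in `data` (an unknown key raises a `KeyError`). Since query-string values always arrive as strings, the sketch compares each column as a string:

```python
from urllib.parse import urlparse, parse_qsl

import numpy as np
import pandas as pd


def filter_by_url(data, url):
    """Keep rows of `data` that match every key=value pair in the URL query string."""
    # parse_qsl returns a list of (key, value) pairs; every value is a string
    pairs = parse_qsl(urlparse(url).query)
    if not pairs:
        return data
    # Compare as strings so that '21' matches an integer Age column
    filters = [data[key].astype(str) == value for key, value in pairs]
    return data[np.all(filters, axis=0)]


# Hypothetical sample data standing in for the CSV
data = pd.DataFrame({
    'Name': ['Sam', 'Ben', 'Sam'],
    'Age': [21, 21, 30],
    'Gender': ['male', 'male', 'male'],
})
filtered_data = filter_by_url(
    data, 'http://example.com/filter?Name=Sam&Age=21&Gender=male'
)
```

With this sample data only the first row (Sam, 21, male) should survive the filter.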


Use df.query. For example:

import pandas as pd

df = pd.read_csv(url)  # url: location of the CSV file
conditions = "Name == 'Sam' and Age == 21 and Gender == 'male'"
filtered_data = df.query(conditions)

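You can build the conditions string dynamically using string formatting, like this:

# `values` holds one filter value per column, in column order.
# {!r} quotes string values ('Sam') and leaves numbers (21) bare.
conditions = " and ".join("{} == {!r}".format(col, val)
                          for col, val in zip(df.columns, values))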
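
To make that concrete, here is a minimal sketch that builds the expression from a dict rather than from every column. The helper `build_conditions` and the sample DataFrame are hypothetical. It wraps column names in backticks (which `df.query` accepts in reasonably recent pandas versions) so that names containing spaces still work, and it uses `repr()` to quote string values:

```python
import pandas as pd


def build_conditions(query):
    """Turn {'Name': 'Sam', 'Age': 21} into a df.query expression string."""
    # Backticks allow column names with spaces; {!r} quotes strings but not numbers
    return " and ".join(
        "`{}` == {!r}".format(col, val) for col, val in query.items()
    )


# Hypothetical sample data standing in for the CSV
df = pd.DataFrame({
    'Name': ['Sam', 'Ben'],
    'Age': [21, 21],
    'Gender': ['male', 'male'],
})
conditions = build_conditions({'Name': 'Sam', 'Age': 21, 'Gender': 'male'})
filtered_data = df.query(conditions)
```

Since `df.query` evaluates the string as an expression, it is probably wise to check that each key really is in `df.columns` before building the string from user-supplied URL parameters.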


Typically, your web framework will return the arguments in a dict-like structure. Let's say your args are like this:

args = {
    'Name': ['Sam'],
    'Age': ['21'],         # Note that Age is a string
    'Gender': ['male']
}

You can filter your dataset successively like this:

for key, values in args.items():
    data = data[data[key].isin(values)]

However, this is unlikely to match any rows for Age, which may have been loaded as an integer column. In that case, you could load every column as strings via pd.read_csv(filename, dtype=object), or convert to string before comparison:

for key, values in args.items():
    data = data[data[key].astype(str).isin(values)]

Incidentally, this will also match multiple values. For example, take the URL http://example.com/filter?Name=Sam&Name=Ben&Age=21&Gender=male -- which leads to the structure:

args = {
    'Name': ['Sam', 'Ben'],    # There are 2 names
    'Age': ['21'],
    'Gender': ['male']
}

In this case, both Ben and Sam will be matched, since we're using .isin to match.
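
If you are not inside a web framework, the same dict-of-lists structure can be produced with the standard library's `parse_qs`. This is only a sketch: the sample DataFrame is invented, and skipping parameters that are not column names is my own assumption about the desired behaviour, not something the question specifies:

```python
from urllib.parse import urlparse, parse_qs

import pandas as pd

url = 'http://example.com/filter?Name=Sam&Name=Ben&Age=21&Gender=male'
# parse_qs groups repeated keys into lists of strings
args = parse_qs(urlparse(url).query)

# Hypothetical sample data standing in for the CSV
data = pd.DataFrame({
    'Name': ['Sam', 'Ben', 'Amy'],
    'Age': [21, 21, 21],
    'Gender': ['male', 'male', 'female'],
})

for key, values in args.items():
    if key in data.columns:  # assumption: ignore parameters that are not columns
        data = data[data[key].astype(str).isin(values)]
```

Here `args` has the same shape as the structure above, and both the Sam and Ben rows remain after filtering.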

5 Comments

One note: either read all data as strings, or know the type of each column (e.g. Age).
Is there any reason why you converted your answer to a community wiki?
Saved my time @S Anand :) Thank you. Filters are working as expected
@EdChum -- just in case anyone wanted to contribute further to the answer. Any guidelines I should follow on when to make something a community wiki?
Community wikis are really meant for something else: meta.stackexchange.com/questions/55888/… Others can suggest edits to improve your answer, but in my opinion that is not what community wiki is for.
