
Hi, I want to filter a DataFrame dynamically, using conditions passed in as command-line arguments.

This is my idea so far:

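import sys

import pandas as pd

tr = pd.read_csv("sales.csv")

def filtr(*arg2):
    # Pseudocode, not valid Python: the idea is to splice each column,
    # operator, value and "&" from the command line into the filter
    fltr = tr.loc[(tr[arg2[0]] arg2[1] arg2[2]) arg2[3] ....]
    print(fltr)

filtr(*sys.argv[1:])

# usage: python test.py "Unit Cost" "==" 4 & .......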

I had the idea of treating (tr[arg2[0]] arg2[1] arg2[2]) as the body of a single condition and iterating over it, but I don't know how.
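One way to make the "one condition as a body, then iterate" idea concrete without building code strings: map each comparison symbol to a function from Python's operator module, build one boolean mask per condition, and combine the masks with `&`. This is a minimal sketch; `OPS` and `filter_rows` are hypothetical names, and it assumes each value already has the right type (a number for numeric columns, a string for text columns).

```python
import functools
import operator

import pandas as pd

# Hypothetical mapping from the symbols a user might type to Python operator functions
OPS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
}


def filter_rows(df, conditions):
    """Return the rows of df that satisfy every (column, symbol, value) condition."""
    # One boolean Series per condition, e.g. df["Unit Cost"] > 500
    masks = [OPS[symbol](df[column], value) for column, symbol, value in conditions]
    if not masks:
        return df  # no conditions: keep every row
    # Element-wise AND of all masks, like writing (m1) & (m2) & ...
    combined = functools.reduce(operator.and_, masks)
    return df.loc[combined]


# A few rows taken from the question's sample data
tr = pd.DataFrame({
    "Region": ["Sub-Saharan Africa", "Europe", "Middle East and North Africa", "Sub-Saharan Africa"],
    "Unit Price": [651.21, 47.45, 154.06, 668.27],
    "Unit Cost": [524.96, 31.79, 90.93, 502.54],
})

result = filter_rows(tr, [("Unit Cost", ">", 500), ("Region", "==", "Sub-Saharan Africa")])
print(result)
```

Because each mask is an ordinary boolean Series, adding another condition is just another tuple in the list; no string is ever evaluated.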

Edit: example data (the first 20 rows, from print(tr.head(20).to_dict())):

{'Region': {0: 'Sub-Saharan Africa', 1: 'Europe', 2: 'Middle East and North Africa', 3: 'Sub-Saharan Africa', 4: 'Europe', 5: 'Sub-Saharan Africa', 6: 'Asia', 7: 'Asia', 8: 'Sub-Saharan Africa', 9: 'Central America and the Caribbean', 10: 'Sub-Saharan Africa', 11: 'Europe', 12: 'Europe', 13: 'Asia', 14: 'Middle East and North Africa', 15: 'Australia and Oceania', 16: 'Central America and the Caribbean', 17: 'Europe', 18: 'Middle East and North Africa', 19: 'Europe'}, 'Country': {0: 'Chad', 1: 'Latvia', 2: 'Pakistan', 3: 'Democratic Republic of the Congo', 4: 'Czech Republic', 5: 'South Africa', 6: 'Laos', 7: 'China', 8: 'Eritrea', 9: 'Haiti', 10: 'Zambia', 11: 'Bosnia and Herzegovina', 12: 'Germany', 13: 'India', 14: 'Algeria', 15: 'Palau', 16: 'Cuba', 17: 'Vatican City', 18: 'Lebanon', 19: 'Lithuania'}, 'Item Type': {0: 'Office Supplies', 1: 'Beverages', 2: 'Vegetables', 3: 'Household', 4: 'Beverages', 5: 'Beverages', 6: 'Vegetables', 7: 'Baby Food', 8: 'Meat', 9: 'Office Supplies', 10: 'Cereal', 11: 'Baby Food', 12: 'Office Supplies', 13: 'Household', 14: 'Clothes', 15: 'Snacks', 16: 'Beverages', 17: 'Beverages', 18: 'Personal Care', 19: 'Snacks'}, 'Sales Channel': {0: 'Online', 1: 'Online', 2: 'Offline', 3: 'Online', 4: 'Online', 5: 'Offline', 6: 'Online', 7: 'Online', 8: 'Online', 9: 'Online', 10: 'Offline', 11: 'Offline', 12: 'Online', 13: 'Online', 14: 'Offline', 15: 'Offline', 16: 'Online', 17: 'Online', 18: 'Offline', 19: 'Offline'}, 'Order Priority': {0: 'L', 1: 'C', 2: 'C', 3: 'C', 4: 'C', 5: 'H', 6: 'L', 7: 'C', 8: 'L', 9: 'C', 10: 'M', 11: 'M', 12: 'C', 13: 'C', 14: 'C', 15: 'L', 16: 'H', 17: 'L', 18: 'H', 19: 'H'}, 'Order Date': {0: '1/27/2011', 1: '12/28/2015', 2: '1/13/2011', 3: '9/11/2012', 4: '10/27/2015', 5: '7/10/2012', 6: '2/20/2011', 7: '4/10/2017', 8: '11/21/2014', 9: '7/4/2015', 10: '7/26/2016', 11: '10/20/2012', 12: '2/22/2015', 13: '8/27/2016', 14: '6/21/2011', 15: '9/19/2013', 16: '11/15/2015', 17: '4/6/2015', 18: '4/12/2010', 19: 
'9/26/2011'}, 'Order ID': {0: 292494523, 1: 361825549, 2: 141515767, 3: 500364005, 4: 127481591, 5: 482292354, 6: 844532620, 7: 564251220, 8: 411809480, 9: 327881228, 10: 773452794, 11: 479823005, 12: 498603188, 13: 151717174, 14: 181401288, 15: 500204360, 16: 640987718, 17: 206925189, 18: 221503102, 19: 878520286}, 'Ship Date': {0: '2/12/2011', 1: '1/23/2016', 2: '2/1/2011', 3: '10/6/2012', 4: '12/5/2015', 5: '8/21/2012', 6: '3/20/2011', 7: '5/12/2017', 8: '1/10/2015', 9: '7/20/2015', 10: '8/24/2016', 11: '11/15/2012', 12: '2/27/2015', 13: '9/2/2016', 14: '7/21/2011', 15: '10/4/2013', 16: '11/30/2015', 17: '4/27/2015', 18: '5/19/2010', 19: '10/2/2011'}, 'Units Sold': {0: 4484, 1: 1075, 2: 6515, 3: 7683, 4: 3491, 5: 9880, 6: 4825, 7: 3330, 8: 2431, 9: 6197, 10: 724, 11: 9145, 12: 6618, 13: 5338, 14: 9527, 15: 441, 16: 1365, 17: 2617, 18: 6545, 19: 2530}, 'Unit Price': {0: 651.21, 1: 47.45, 2: 154.06, 3: 668.27, 4: 47.45, 5: 47.45, 6: 154.06, 7: 255.28, 8: 421.89, 9: 651.21, 10: 205.7, 11: 255.28, 12: 651.21, 13: 668.27, 14: 109.28, 15: 152.58, 16: 47.45, 17: 47.45, 18: 81.73, 19: 152.58}, 'Unit Cost': {0: 524.96, 1: 31.79, 2: 90.93, 3: 502.54, 4: 31.79, 5: 31.79, 6: 90.93, 7: 159.42, 8: 364.69, 9: 524.96, 10: 117.11, 11: 159.42, 12: 524.96, 13: 502.54, 14: 35.84, 15: 97.44, 16: 31.79, 17: 31.79, 18: 56.67, 19: 97.44}, 'Total Revenue': {0: 2920025.64, 1: 51008.75, 2: 1003700.9, 3: 5134318.41, 4: 165647.95, 5: 468806.0, 6: 743339.5, 7: 850082.4, 8: 1025614.59, 9: 4035548.37, 10: 148926.8, 11: 2334535.6, 12: 4309707.78, 13: 3567225.26, 14: 1041110.56, 15: 67287.78, 16: 64769.25, 17: 124176.65, 18: 534922.85, 19: 386027.4}, 'Total Cost': {0: 2353920.64, 1: 34174.25, 2: 592408.95, 3: 3861014.82, 4: 110978.89, 5: 314085.2, 6: 438737.25, 7: 530868.6, 8: 886561.39, 9: 3253177.12, 10: 84787.64, 11: 1457895.9, 12: 3474185.28, 13: 2682558.52, 14: 341447.68, 15: 42971.04, 16: 43393.35, 17: 83194.43, 18: 370905.15, 19: 246523.2}, 'Total Profit': {0: 566105.0, 1: 16834.5, 2: 
411291.95, 3: 1273303.59, 4: 54669.06, 5: 154720.8, 6: 304602.25, 7: 319213.8, 8: 139053.2, 9: 782371.25, 10: 64139.16, 11: 876639.7, 12: 835522.5, 13: 884666.74, 14: 699662.88, 15: 24316.74, 16: 21375.9, 17: 40982.22, 18: 164017.7, 19: 139504.2}}
  • Please share a reproducible example. Commented Aug 1, 2022 at 9:21
  • @SalvatoreDanieleBianco I don't know how, since it depends on the source file, but for me fltr = tr.loc[(tr["Unit Cost"] > 500)] followed by print(fltr) returns a DataFrame with all the rows where Unit Cost is greater than 500. My goal is to make this dynamic, so that the column name, the comparison operator and the value to compare against are all submitted by the user instead of being hard-coded as here. I hope I explained it well, and thank you. Commented Aug 1, 2022 at 9:30
  • Please, after reading it, run print(tr.head(20).to_dict()) and attach the result to your question, so that other users can replicate your data :) Commented Aug 1, 2022 at 9:46
  • @SalvatoreDanieleBianco That's the problem: they aren't working. But here is an example: fltr = tr.loc[(tr["Unit Cost"] > 500)]. This line works if you try it, and my goal is to make "Unit Cost" arg2[0], > arg2[1] and 500 arg2[2], and to repeat this with "&" to add another condition. Commented Aug 1, 2022 at 10:04
  • @SalvatoreDanieleBianco print(arg2) prints the arguments as a tuple like this: ('Unit Cost', '>', '4') Commented Aug 1, 2022 at 13:40
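As that tuple shows, sys.argv delivers every argument as a string, so the value has to be converted before it can be compared with a numeric column, and the "&" separators have to be split out. A minimal sketch, with hypothetical helper names `parse_conditions` and `convert_value`, assuming that any value which parses as a float is meant as a number:

```python
def convert_value(text):
    """Return text as a float if it looks like a number, otherwise unchanged (e.g. 'Europe')."""
    try:
        return float(text)
    except ValueError:
        return text


def parse_conditions(args):
    """Split e.g. ['Unit Cost', '>', '500', '&', 'Region', '==', 'Europe'] into
    [('Unit Cost', '>', 500.0), ('Region', '==', 'Europe')]."""
    args = list(args)
    if not args:
        return []
    conditions = []
    current = []
    # A trailing "&" closes the last group, so every group is handled the same way
    for word in args + ["&"]:
        if word == "&":
            if len(current) != 3:
                raise ValueError("expected 'column symbol value', got %r" % (current,))
            column, symbol, value = current
            conditions.append((column, symbol, convert_value(value)))
            current = []
        else:
            current.append(word)
    return conditions


print(parse_conditions(["Unit Cost", ">", "500", "&", "Region", "==", "Europe"]))
# [('Unit Cost', '>', 500.0), ('Region', '==', 'Europe')]
```

In a script this would be called as parse_conditions(sys.argv[1:]). On the command line, quote the operators and the separator, e.g. python test.py "Unit Cost" ">" 500 "&" "Region" "==" "Europe": in common shells an unquoted & runs the command in the background, and an unquoted > or < redirects output or input.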

3 Answers


How about this?

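import pandas as pd

# Named filter_df so it does not shadow Python's built-in filter()
def filter_df(df, conditions):
    # Keep only the rows where each column is greater than its threshold
    for key, value in conditions.items():
        df = df[df[key] > value]

    return df

Invoke using

df = filter_df(df, {"Unit Cost": 500, "Unit Price": 500})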

Result:

print(df.shape)
(5,14)

Note: this approach only works when every condition uses >. If you need other comparison operators, you will need a different approach.
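A minimal sketch of one such approach that keeps this answer's dictionary style but stores an operator with each value, then dispatches by name to pandas' Series comparison methods (gt, ge, lt, le, eq, ne). `METHOD_NAMES` and `filter_by_ops` are hypothetical names:

```python
import pandas as pd

# Hypothetical mapping from each symbol to the pandas Series comparison method with the same meaning
METHOD_NAMES = {">": "gt", ">=": "ge", "<": "lt", "<=": "le", "==": "eq", "!=": "ne"}


def filter_by_ops(df, conditions):
    """conditions maps column -> (symbol, value), e.g. {"Unit Cost": (">", 500)}."""
    for column, (symbol, value) in conditions.items():
        compare = getattr(df[column], METHOD_NAMES[symbol])  # e.g. df["Unit Cost"].gt
        df = df[compare(value)]  # narrow the frame one condition at a time
    return df


df = pd.DataFrame({
    "Unit Price": [651.21, 47.45, 154.06, 668.27],
    "Unit Cost": [524.96, 31.79, 90.93, 502.54],
})

result = filter_by_ops(df, {"Unit Cost": (">", 500), "Unit Price": ("<", 660)})
print(result)
```

Because a dict holds each column only once, a range on a single column (for example 100 < Unit Cost < 500) would need a list of (column, symbol, value) tuples instead.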


Just use eval(); here is the code:

import pandas as pd

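def filter_df(df, args_list):
    # Build one string such as "(df['COL_A']<=4)" per [column, symbol, value] triple
    constraints = []
    for a in args_list:
        col = a[0]
        symbol = a[1]
        value = a[2]
        # df['...'] rather than df.<name>, so column names with spaces (e.g. "Unit Cost") also work
        constraint = "(df['{}']{}{})".format(col, symbol, value)
        constraints.append(constraint)

    # Join the conditions with & and evaluate the whole expression as one boolean mask
    filter_str = "&".join(constraints)

    return df[eval(filter_str)]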

data = {
    "COL_A": [1,2,3,2,4,6],
    "COL_B": [1,10,100,20,20,40],
    "COL_C": ["aaa", "bbb", "zzz", "xxx", "xxx", "xxx"]
}
df = pd.DataFrame(data)

args_list = [["COL_A", "<=", "4"], ["COL_C", "==", "'xxx'"]]

df2 = filter_df(df, args_list)

This is df: all six rows defined in data.

After filtering with COL_A <= 4 & COL_C == 'xxx', df2 keeps only the rows with index 3 and 4 (COL_A = 2 and 4, COL_C = 'xxx').
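pandas can also parse this kind of condition string itself: DataFrame.query accepts an expression such as COL_A <= 4 & COL_C == 'xxx', and column names containing spaces can be wrapped in backticks (pandas 0.25 or later). A sketch reusing this answer's args_list; the exact join format is an assumption about how the triples are written:

```python
import pandas as pd

data = {
    "COL_A": [1, 2, 3, 2, 4, 6],
    "COL_B": [1, 10, 100, 20, 20, 40],
    "COL_C": ["aaa", "bbb", "zzz", "xxx", "xxx", "xxx"],
}
df = pd.DataFrame(data)

args_list = [["COL_A", "<=", "4"], ["COL_C", "==", "'xxx'"]]

# Build "(`COL_A` <= 4) & (`COL_C` == 'xxx')"; backticks also allow names like `Unit Cost`
expr = " & ".join("(`{}` {} {})".format(col, symbol, value) for col, symbol, value in args_list)
df2 = df.query(expr)
print(df2)
```

query still evaluates user-supplied text, but through pandas' own expression parser, which supports only a subset of Python syntax, rather than through Python's full eval().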

1 Comment

Thank you, after some iterations I got it working for my use case:

import sys

import more_itertools as mit
import pandas as pd

tr = pd.read_csv("sales.csv")

def filter_df(df, args_list):
    constraints = []
    for a in args_list:
        col = a[0]
        symbol = a[1]
        value = a[2]
        constraint = "(df['{}']{}{})".format(col, symbol, value)
        constraints.append(constraint)
    filter_str = "&".join(constraints)
    return df[eval(filter_str)]

# Split the arguments into [column, symbol, value] groups at every word containing "&"
args_list = list(mit.split_at(sys.argv[1:], pred=lambda x: set(x) & {"&"}))
df2 = filter_df(tr, args_list)
print(df2)
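import sys

import pandas as pd

tr = pd.read_csv("sales.csv")

def filter_df(arg2):
    # arg2 is (column, symbol, value), e.g. ('Unit Cost', '>', '500')
    # float() rather than int() so decimal values such as 524.96 also work
    if arg2[1] == ">":
        return tr.loc[(tr[arg2[0]] > float(arg2[2]))]
    elif arg2[1] == "<":
        return tr.loc[(tr[arg2[0]] < float(arg2[2]))]
    elif arg2[1] in ("=", "=="):
        return tr.loc[(tr[arg2[0]] == float(arg2[2]))]
    else:
        raise ValueError("invalid comparison: %s" % arg2[1])

arg2 = tuple(sys.argv[1:4])
print(filter_df(arg2))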

Now if, for example, arg2 = ('Unit Cost', '>', '500'), the function returns only the rows with Unit Cost > 500.

If you want to pass multiple conditions it gets more complicated; my suggestion is to apply them step by step, one at a time.
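The step-by-step suggestion can be sketched as a loop that feeds each condition to a single-condition filter, with each call narrowing the output of the previous one; the result matches combining the conditions with & in one .loc call. `filter_step` is a hypothetical generalization of filter_df above that takes the frame and the condition as parameters, and the small frame reuses values from the question's sample data:

```python
import pandas as pd


def filter_step(df, column, symbol, value):
    """Apply one condition, using the same if/elif dispatch as filter_df above."""
    if symbol == ">":
        return df.loc[df[column] > value]
    elif symbol == "<":
        return df.loc[df[column] < value]
    elif symbol in ("=", "=="):
        return df.loc[df[column] == value]
    else:
        raise ValueError("invalid comparison: %s" % symbol)


tr = pd.DataFrame({
    "Unit Price": [651.21, 47.45, 154.06, 668.27, 47.45],
    "Unit Cost": [524.96, 31.79, 90.93, 502.54, 31.79],
})

conditions = [("Unit Cost", ">", 30), ("Unit Price", "<", 200)]

# Pass the conditions separately: each call filters the result of the previous one
result = tr
for column, symbol, value in conditions:
    result = filter_step(result, column, symbol, value)

# Same rows as combining both conditions with & in a single .loc call
combined = tr.loc[(tr["Unit Cost"] > 30) & (tr["Unit Price"] < 200)]
print(result.equals(combined))
```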
