I have a dataset that consists of categorical and numerical columns, for instance a salary dataset:
columns: ['job', 'country_origin', 'age', 'salary', 'degree','marital_status']
It has four categorical columns and two numerical columns, and I want to use three aggregate functions:
cat_col = ['job', 'country_origin','degree','marital_status']
num_col = [ 'age', 'salary']
aggregate_function = ['avg','max','sum']
Currently, my Python code uses raw SQL queries. My objective is to get the group-by query results for all combinations of the lists above:
my query: "SELECT cat_col[0], aggregate_function[0](num_col[0]) from DB where marital_status = 'married' group by cat_col[0]"
So the queries are:
q1 = select job, avg(age) from DB where marital_status='married' group by job
q2 = select job, avg(salary) from DB where marital_status='married' group by job
etc.
I used a for loop to get the results for all combinations.
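Roughly, the loop can be sketched as below. This is a minimal illustration rather than the exact code: the helper name build_queries and the table name DB are placeholders, and the loop order (categorical column, then aggregate function, then numerical column) is an assumption.

```python
from itertools import product

cat_col = ['job', 'country_origin', 'degree', 'marital_status']
num_col = ['age', 'salary']
aggregate_function = ['avg', 'max', 'sum']


def build_queries(table='DB'):
    """Build one GROUP BY query string per (categorical, aggregate, numerical) combination.

    build_queries and the default table name are hypothetical placeholders.
    """
    queries = []
    for cat, agg, num in product(cat_col, aggregate_function, num_col):
        queries.append(
            f"select {cat}, {agg}({num}) from {table} "
            f"where marital_status='married' group by {cat}"
        )
    return queries


queries = build_queries()
```

With four categorical columns, three aggregate functions, and two numerical columns, this produces 4 x 3 x 2 = 24 query strings, starting with q1 and q2 above.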
My problem is that I want to convert these queries to pandas. I've spent a couple of hours on it but could not solve it, since pandas has a different way of querying data.
Sample dataframe:
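import pandas as pd

# Built from a list of rows: wrapping the rows in np.array would turn every value into a string
df2 = pd.DataFrame([['programmer', 'US', 28, 4000, 'master', 'unmarried'],
                    ['data scientist', 'UK', 30, 5000, 'PhD', 'unmarried'],
                    ['manager', 'US', 48, 9000, 'master', 'married']],
                   # a flat list of names; a nested list would create a MultiIndex
                   columns=['job', 'country_origin', 'age', 'salary', 'degree', 'marital_status'])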
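To make the target concrete, here is a tentative sketch of what a pandas version of the loop might look like. It rests on a few assumptions: the frame is built with numeric age and salary columns, SQL avg is mapped to the pandas aggregation name 'mean', and the names sql_to_pandas, married, and results are made up for illustration. It is one possible translation, not necessarily the best one.

```python
from itertools import product

import pandas as pd

# Sample frame with numeric age/salary columns and flat column names (assumed)
df2 = pd.DataFrame(
    [['programmer', 'US', 28, 4000, 'master', 'unmarried'],
     ['data scientist', 'UK', 30, 5000, 'PhD', 'unmarried'],
     ['manager', 'US', 48, 9000, 'master', 'married']],
    columns=['job', 'country_origin', 'age', 'salary', 'degree', 'marital_status'])

cat_col = ['job', 'country_origin', 'degree', 'marital_status']
num_col = ['age', 'salary']
# SQL 'avg' corresponds to pandas 'mean'; 'max' and 'sum' keep their names
sql_to_pandas = {'avg': 'mean', 'max': 'max', 'sum': 'sum'}

# Equivalent of: where marital_status = 'married'
married = df2[df2['marital_status'] == 'married']

# One Series per combination, keyed by (categorical, SQL aggregate, numerical)
results = {}
for cat, (sql_name, pd_name), num in product(cat_col, sql_to_pandas.items(), num_col):
    # Equivalent of: select cat, agg(num) ... group by cat
    results[(cat, sql_name, num)] = married.groupby(cat)[num].agg(pd_name)
```

For example, results[('job', 'avg', 'age')] would hold the pandas counterpart of q1: a Series indexed by job with the mean age of the married rows.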