0

I have a problem where I have a bunch of people whose summary has been aggregated to form a table/pd DataFrame. Scores need to be computed for such people depending on their profession and group. Each profession can have different number of groups and is dynamic.

There are 3 pd DFs-

# DF containing summary aggregation for different people-
summary_data = pd.read_csv("Sample_Summary.csv")

# DF containing threshold according to which scores need to be computed-
threshold_data = pd.read_csv("Sample_Threshold.csv")

# DF containing weightage for different columns-
weightage_data = pd.read_csv("Sample_Weightage.csv")

summary_data.shape, threshold_data.shape
# ((12, 6), (8, 6))

weightage_data.shape
# (8, 6)

The flow for the score computation is as follows: for each person in 'summary_data', do: iterate through all columns in 'summary_data' and check the corresponding columns in 'threshold_data' whether the value is greater than or equal to the value in 'threshold_data'. If yes, add the score for that violated column in 'weightage_data'.

summary_data.loc[:, :]
'''
   Profession  Group  Salary  Number_Houses  Number_Cars  Number_Cellphones
0      Doctor      1  300000              2            1                  3
1      Doctor      2  150000              3            2                  1
2      Doctor      3  260000              2            2                  2
3      Doctor      1  120000              1            3                  3
4    Engineer      1   85000              1            1                  1
5    Engineer      2  125000              2            3                  4
6    Engineer      2   60000              3            0                  1
7    Engineer      1   90000              3            3                  3
8    Business      1  266000              2            3                  5
9    Business      2  301000              3            5                  8
10   Business      3  385000              4            1                  1
11   Business      1  183000              2            3                  4
'''

threshold_data.loc[:, :]
'''
  Profession  Group  Salary  Number_Houses  Number_Cars  Number_Cellphones
0     Doctor      1  180000              2            3                  3
1     Doctor      2  120000              3            3                  2
2     Doctor      3  200000              2            3                  3
3   Engineer      1   90000              2            3                  4
4   Engineer      2  120000              3            2                  2
5   Business      1  200000              2            4                  4
6   Business      2  300000              3            6                  6
7   Business      3  400000              5            1                  1
'''

weightage_data.loc[:, :]
'''
  Profession  Group  Salary  Number_Houses  Number_Cars  Number_Cellphones
0     Doctor      1       4              6            5                  2
1     Doctor      2       4              5            5                  2
2     Doctor      3       4              5            5                  1
3   Engineer      1       3              5            3                  1
4   Engineer      2       4              5            3                  2
5   Business      1       5              5            3                  1
6   Business      2       5              5            4                  2
7   Business      3       5              6            4                  2
'''

For example, the two people belonging to 'Doctor' profession and Group = 1,

summary_data[(summary_data['Profession'] == 'Doctor') & (summary_data['Group'] == 1)]
'''
  Profession  Group  Salary  Number_Houses  Number_Cars  Number_Cellphones
0     Doctor      1  300000              2            1                  3
3     Doctor      1  120000              1            3                  3
'''

The corresponding 'threshold_data' & 'weightage_data' are:

threshold_data[(threshold_data['Profession'] == 'Doctor') & (threshold_data['Group'] == 1)]
'''
  Profession  Group  Salary  Number_Houses  Number_Cars  Number_Cellphones
0     Doctor      1  180000              2            3                  3
'''

weightage_data[(weightage_data['Profession'] == 'Doctor') & (weightage_data['Group'] == 1)]
'''
  Profession  Group  Salary  Number_Houses  Number_Cars  Number_Cellphones
0     Doctor      1       4              6            5                  2
'''

The first Doctor has Salary=300000 which is greater than the threshold value=180000, so it crosses the threshold. So, the corresponding weight of 4 (from 'weightage_data') is added, Number_of_houses=2 is greater than or equal to threshold value=2. so the weight of 6 is added to 4. Number_of_Cars=1 is less than threshold value=3, so its not considered. And finally, Number_of_Cellphones=3 is greater than or equal to threshold, so the weight=2 is added. The final score = 4+6+2 = 12.

Similarly, for second Doctor, the score = 5 ('Number_of_Cars' col is violated) + 2 ('Number_of_Cellphones' col is violated)= 7

How can I write code for this without using inefficient and non-pythonic for loops?

1 Answer 1

1

This should do the job without any loop:

# define grouping dolumns
grp_vars = ['Profession', 'Group']

# same structure to all data frames by merge
threshold_data_big = pd.merge(summary_data[grp_vars], threshold_data, how='left', on=grp_vars)
weightage_data_big = pd.merge(summary_data[grp_vars], weightage_data, how='left', on=grp_vars)

# compare threshold to acutal data
above_threshold = threshold_data_big.drop(grp_vars, axis=1) <= summary_data.drop(grp_vars, axis=1)

# select those weight values that are above the threshold and calcualte rowsum
summary_data['final_score'] = weightage_data_big.drop(grp_vars, axis=1)[above_threshold].sum(axis=1)
summary_data

enter image description here

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.