Implementing functions with dataframes in python

Question

I have been stuck on this problem for several days.

I have this function :

def cal_score(research, citations, teaching, international, income):
     return .3 **research + .3 **citations + .3 **teaching +.075 **international + .025 **income

where “research”, “citations”, “teaching”, “international” and “income” are columns of the dataset. I want to add a new column in the dataset whose values should be calculated based on the function mentioned above. I tried different procedures but none worked.

Example : If we have a row as below

university_name  Indian Institute of Technology Bombay


teaching  43.8

international  14.3

research  24.2

citations  8,327

income   14.9

Total Score Ranking

Then the total score should be calculated as

Total Score =  .3 **research + .3 **citations + .3 **teaching +.075 **international + .025 **income.

This should apply for all the rows in the dataset.

Can anyone please help me implement this? I have been stuck on it for quite some time now. :-(

Indian_univ.head(10).to_dict()

{'citations': {510: 38.799999999999997,
  832: 39.0,
  856: 45.600000000000001,
  959: 45.799999999999997,
  1232: 84.700000000000003,
  1360: 38.5,
  1361: 41.799999999999997,
  1362: 35.299999999999997,
  1363: 53.600000000000001,
  1679: 51.600000000000001},
 'country': {510: 'India',
  832: 'India',
  856: 'India',
  959: 'India',
  1232: 'India',
  1360: 'India',
  1361: 'India',
  1362: 'India',
  1363: 'India',
  1679: 'India'},
 'female_male_ratio': {510: '16 : 84',
  832: '15 : 85',
  856: '16 : 84',
  959: '17 : 83',
  1232: '46 : 54',
  1360: '18 : 82',
  1361: '13 : 87',
  1362: '15 : 85',
  1363: '17 : 83',
  1679: '19 : 81'},
 'income': {510: '24.2',
  832: '72.4',
  856: '52.7',
  959: '70.4',
  1232: '28.4',
  1360: '-',
  1361: '42.4',
  1362: '-',
  1363: '64.8',
  1679: '37.9'},
 'international': {510: '14.3',
  832: '16.1',
  856: '19.9',
  959: '15.6',
  1232: '29.3',
  1360: '15.3',
  1361: '17.3',
  1362: '14.7',
  1363: '15.6',
  1679: '18.2'},
 'international_students': {510: '1%',
  832: '0%',
  856: '1%',
  959: '1%',
  1232: '1%',
  1360: '1%',
  1361: '0%',
  1362: '0%',
  1363: '1%',
  1679: '1%'},
 'num_students': {510: '8,327',
  832: '9,928',
  856: '8,327',
  959: '8,061',
  1232: '16,691',
  1360: '8,371',
  1361: '6,167',
  1362: '9,928',
  1363: '8,061',
  1679: '3,318'},
 'research': {510: 15.699999999999999,
  832: 45.299999999999997,
  856: 33.100000000000001,
  959: 13.699999999999999,
  1232: 14.0,
  1360: 23.0,
  1361: 25.199999999999999,
  1362: 30.0,
  1363: 12.300000000000001,
  1679: 39.5},
 'student_staff_ratio': {510: 14.9,
  832: 17.5,
  856: 14.9,
  959: 18.699999999999999,
  1232: 23.899999999999999,
  1360: 17.300000000000001,
  1361: 12.199999999999999,
  1362: 17.5,
  1363: 18.699999999999999,
  1679: 8.1999999999999993},
 'teaching': {510: 43.799999999999997,
  832: 44.200000000000003,
  856: 47.299999999999997,
  959: 30.399999999999999,
  1232: 25.800000000000001,
  1360: 33.799999999999997,
  1361: 31.300000000000001,
  1362: 39.299999999999997,
  1363: 25.100000000000001,
  1679: 32.600000000000001},
 'total_score': {510: 29.489999999999995,
  832: 38.549999999999997,
  856: 37.799999999999997,
  959: 26.969999999999999,
  1232: 37.350000000000001,
  1360: 28.589999999999996,
  1361: 29.489999999999998,
  1362: 31.379999999999995,
  1363: 27.299999999999997,
  1679: 37.109999999999999},
 'university_name': {510: 'Indian Institute of Technology Bombay',
  832: 'Indian Institute of Technology Kharagpur',
  856: 'Indian Institute of Technology Bombay',
  959: 'Indian Institute of Technology Roorkee',
  1232: 'Panjab University',
  1360: 'Indian Institute of Technology Delhi',
  1361: 'Indian Institute of Technology Kanpur',
  1362: 'Indian Institute of Technology Kharagpur',
  1363: 'Indian Institute of Technology Roorkee',
  1679: 'Indian Institute of Science'},
 'world_rank': {510: '301-350',
  832: '226-250',
  856: '251-275',
  959: '351-400',
  1232: '226-250',
  1360: '351-400',
  1361: '351-400',
  1362: '351-400',
  1363: '351-400',
  1679: '276-300'},
 'year': {510: 2012,
  832: 2013,
  856: 2013,
  959: 2013,
  1232: 2014,
  1360: 2014,
  1361: 2014,
  1362: 2014,
  1363: 2014,
  1679: 2015}}

Please post your actual DataFrame. It is often helpful to also post df.head(20).to_dict() so people can play around with your data.
– juanpa.arrivillaga, Commented Aug 23, 2016 at 4:58
Hi. I have added a screenshot of the data. Kindly have a look.
– scooby, Commented Aug 23, 2016 at 7:02
Do not post a picture. Post the output of df.head().to_dict()
– juanpa.arrivillaga, Commented Aug 23, 2016 at 7:07
Edited the post, but it's looking clumsy. :( Update: teaching, research, and citations are float64, while total_score, international, and income are object. I am able to calculate the score using only the float64 fields, which means I need to convert the remaining required fields from object to float64. That should solve the problem.
– scooby, Commented Aug 23, 2016 at 7:12
I made it look nicer :). Yes, you need the columns to have numeric dtype, or else you won't be able to use them for computations! That should be straightforward enough.
– juanpa.arrivillaga, Commented Aug 23, 2016 at 7:20

jezrael · Accepted Answer · 2016-08-23 07:30:39Z

I think you can use:

# parentheses let the expression span several lines
df['Total Score'] = (.3 **df.research +
                     .3 **df.citations +
                     .3 **df.teaching +
                     .075 **df.international +
                     .025 **df.income)

If you need to apply a function row by row instead, which is often much slower:

def cal_score(x):
     # x is a single row (a Series) when apply is called with axis=1
     return (.3 **x.research +
             .3 **x.citations +
             .3 **x.teaching +
             .075 **x.international +
             .025 **x.income)

df['Total Score'] = df.apply(cal_score, axis=1)

EDIT with data:

First clean the num_students and income columns with replace, then convert the columns to float with astype:
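As a minimal sketch of that conversion step, assuming the columns arrive as strings with '-' marking a missing value, as in the sample data. The small frame below is made up for illustration. `pd.to_numeric` with `errors='coerce'` is shown as an alternative to replacing '-' with '0': it turns unparseable entries into NaN rather than zero, which may or may not be what you want for the score.

```python
import pandas as pd

# Hypothetical miniature of the question's data: numbers stored as strings.
df = pd.DataFrame({
    'income': ['24.2', '-', '42.4'],
    'international': ['14.3', '15.3', '17.3'],
    'num_students': ['8,327', '8,371', '6,167'],
})

# Check the dtypes first: string columns are reported as object
# (or as a string dtype in newer pandas versions).
print(df.dtypes)

# Strip the thousands separator, then convert to integers.
df['num_students'] = df['num_students'].str.replace(',', '', regex=False).astype(int)

# errors='coerce' maps anything unparseable (here '-') to NaN.
for col in ['income', 'international']:
    df[col] = pd.to_numeric(df[col], errors='coerce')

print(df.dtypes)
```

Whether a missing income should count as 0 or as NaN depends on how the score is meant to be interpreted; with NaN, any arithmetic on that row also yields NaN unless it is filled first.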

EDIT2, using the sample data:

import pandas as pd

df = pd.DataFrame({'citations': {510: 38.799999999999997, 832: 39.0, 856: 45.600000000000001, 959: 45.799999999999997, 1232: 84.700000000000003, 1360: 38.5, 1361: 41.799999999999997, 1362: 35.299999999999997, 1363: 53.600000000000001, 1679: 51.600000000000001}, 'country': {510: 'India', 832: 'India', 856: 'India', 959: 'India', 1232: 'India', 1360: 'India', 1361: 'India', 1362: 'India', 1363: 'India', 1679: 'India'}, 'female_male_ratio': {510: '16 : 84', 832: '15 : 85', 856: '16 : 84', 959: '17 : 83', 1232: '46 : 54', 1360: '18 : 82', 1361: '13 : 87', 1362: '15 : 85', 1363: '17 : 83', 1679: '19 : 81'}, 'income': {510: '24.2', 832: '72.4', 856: '52.7', 959: '70.4', 1232: '28.4', 1360: '-', 1361: '42.4', 1362: '-', 1363: '64.8', 1679: '37.9'}, 'international': {510: '14.3', 832: '16.1', 856: '19.9', 959: '15.6', 1232: '29.3', 1360: '15.3', 1361: '17.3', 1362: '14.7', 1363: '15.6', 1679: '18.2'}, 'international_students': {510: '1%', 832: '0%', 856: '1%', 959: '1%', 1232: '1%', 1360: '1%', 1361: '0%', 1362: '0%', 1363: '1%', 1679: '1%'}, 'num_students': {510: '8,327', 832: '9,928', 856: '8,327', 959: '8,061', 1232: '16,691', 1360: '8,371', 1361: '6,167', 1362: '9,928', 1363: '8,061', 1679: '3,318'}, 'research': {510: 15.699999999999999, 832: 45.299999999999997, 856: 33.100000000000001, 959: 13.699999999999999, 1232: 14.0, 1360: 23.0, 1361: 25.199999999999999, 1362: 30.0, 1363: 12.300000000000001, 1679: 39.5}, 'student_staff_ratio': {510: 14.9, 832: 17.5, 856: 14.9, 959: 18.699999999999999, 1232: 23.899999999999999, 1360: 17.300000000000001, 1361: 12.199999999999999, 1362: 17.5, 1363: 18.699999999999999, 1679: 8.1999999999999993}, 'teaching': {510: 43.799999999999997, 832: 44.200000000000003, 856: 47.299999999999997, 959: 30.399999999999999, 1232: 25.800000000000001, 1360: 33.799999999999997, 1361: 31.300000000000001, 1362: 39.299999999999997, 1363: 25.100000000000001, 1679: 32.600000000000001}, 'total_score': {510: 29.489999999999995, 832: 38.549999999999997, 856: 
37.799999999999997, 959: 26.969999999999999, 1232: 37.350000000000001, 1360: 28.589999999999996, 1361: 29.489999999999998, 1362: 31.379999999999995, 1363: 27.299999999999997, 1679: 37.109999999999999}, 'university_name': {510: 'Indian Institute of Technology Bombay', 832: 'Indian Institute of Technology Kharagpur', 856: 'Indian Institute of Technology Bombay', 959: 'Indian Institute of Technology Roorkee', 1232: 'Panjab University', 1360: 'Indian Institute of Technology Delhi', 1361: 'Indian Institute of Technology Kanpur', 1362: 'Indian Institute of Technology Kharagpur', 1363: 'Indian Institute of Technology Roorkee', 1679: 'Indian Institute of Science'}, 'world_rank': {510: '301-350', 832: '226-250', 856: '251-275', 959: '351-400', 1232: '226-250', 1360: '351-400', 1361: '351-400', 1362: '351-400', 1363: '351-400', 1679: '276-300'}, 'year': {510: 2012, 832: 2013, 856: 2013, 959: 2013, 1232: 2014, 1360: 2014, 1361: 2014, 1362: 2014, 1363: 2014, 1679: 2015}})

# remove the thousands separator
df['num_students'] = df.num_students.str.replace(',', '')
# replace the '-' placeholder with '0'
df['income'] = df['income'].str.replace('-', '0')

# convert the score columns to float
cols = ['teaching', 'international', 'research', 'citations', 'income']
df[cols] = df[cols].astype(float)

# parentheses let the expression span several lines
df['Total Score'] = (.3 **df.research +
                     .3 **df.citations +
                     .3 **df.teaching +
                     .075 **df.international +
                     .025 **df.income)

print(df)

      citations country female_male_ratio  income  international  \
510        38.8   India           16 : 84    24.2           14.3   
832        39.0   India           15 : 85    72.4           16.1   
856        45.6   India           16 : 84    52.7           19.9   
959        45.8   India           17 : 83    70.4           15.6   
1232       84.7   India           46 : 54    28.4           29.3   
1360       38.5   India           18 : 82     0.0           15.3   
1361       41.8   India           13 : 87    42.4           17.3   
1362       35.3   India           15 : 85     0.0           14.7   
1363       53.6   India           17 : 83    64.8           15.6   
1679       51.6   India           19 : 81    37.9           18.2   

     international_students num_students  research  student_staff_ratio  \
510                      1%         8327      15.7                 14.9   
832                      0%         9928      45.3                 17.5   
856                      1%         8327      33.1                 14.9   
959                      1%         8061      13.7                 18.7   
1232                     1%        16691      14.0                 23.9   
1360                     1%         8371      23.0                 17.3   
1361                     0%         6167      25.2                 12.2   
1362                     0%         9928      30.0                 17.5   
1363                     1%         8061      12.3                 18.7   
1679                     1%         3318      39.5                  8.2   

      teaching  total_score                           university_name  \
510       43.8        29.49     Indian Institute of Technology Bombay   
832       44.2        38.55  Indian Institute of Technology Kharagpur   
856       47.3        37.80     Indian Institute of Technology Bombay   
959       30.4        26.97    Indian Institute of Technology Roorkee   
1232      25.8        37.35                         Panjab University   
1360      33.8        28.59      Indian Institute of Technology Delhi   
1361      31.3        29.49     Indian Institute of Technology Kanpur   
1362      39.3        31.38  Indian Institute of Technology Kharagpur   
1363      25.1        27.30    Indian Institute of Technology Roorkee   
1679      32.6        37.11               Indian Institute of Science   

     world_rank  year   Total Score  
510     301-350  2012  6.177371e-09  
832     226-250  2013  7.776087e-19  
856     251-275  2013  4.928529e-18  
959     351-400  2013  6.863746e-08  
1232    226-250  2014  4.782972e-08  
1360    351-400  2014  1.000000e+00  
1361    351-400  2014  6.664022e-14  
1362    351-400  2014  1.000000e+00  
1363    351-400  2014  3.703322e-07  
1679    276-300  2015  9.003721e-18
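
Note that `**` is Python's exponentiation operator, so the expression above raises each weight to the power of the column value. That is why the printed Total Score values are tiny, and exactly 1.0 where income is 0. If a weighted sum was intended, which the existing total_score column suggests (for row 510, 0.3 * (15.7 + 38.8 + 43.8) = 29.49), the operator would be `*`. A sketch under that assumption, using three rows from the sample data; the names `weights` and `weighted_score` are illustrative choices, not part of the original post:

```python
import pandas as pd

# Three rows copied from the sample data, already converted to float.
df = pd.DataFrame({
    'research': [15.7, 45.3, 33.1],
    'citations': [38.8, 39.0, 45.6],
    'teaching': [43.8, 44.2, 47.3],
    'international': [14.3, 16.1, 19.9],
    'income': [24.2, 72.4, 52.7],
}, index=[510, 832, 856])

# Weights taken from the question's formula.
weights = pd.Series({
    'research': 0.3, 'citations': 0.3, 'teaching': 0.3,
    'international': 0.075, 'income': 0.025,
})

# Multiply each column by its weight (aligned on column names),
# then sum across each row.
df['weighted_score'] = (df[weights.index] * weights).sum(axis=1)
print(df['weighted_score'])
```

Keeping the weights in a Series means the formula lives in one place and the column list cannot drift out of sync with it.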

Hi jezrael, thanks for your response. I followed your approach, but it gives me the following error: TypeError: unsupported operand type(s) for ** or pow(): 'float' and 'str'. The error is raised on this line: df['Total Score'] = .3 **df.research + .3 **df.citations + .3 **df.teaching + .075 **df.international + .025 **df.income
What does df.dtypes show before df['Total Score'] = ...? It looks like some columns are strings, not floats. That is not a problem, though; I can help you.
OK, I see some inconsistencies: teaching, research, and citations are float64, while total_score, international, and income are object.
Update: I have added a screenshot of the data. Also, I am able to calculate the score using only the fields with dtype float64, which means I need to convert the remaining required fields from object to float64. That should solve the problem.
There is one problem: the income column contains '-' values, but you need floats. Is it possible to convert '-' to 0?

juanpa.arrivillaga · Accepted Answer · 2016-08-23 05:03:35Z

Here is the most straightforward way:

df.assign(TotalScore=.3 **df.research + .3 **df.citations + .3 **df.teaching +.075 **df.international + .025 **df.income)
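
One thing to keep in mind: `DataFrame.assign` returns a new DataFrame and leaves the original unchanged, so the result needs to be bound to a name. A minimal sketch with a hypothetical two-row frame; the expression is a shortened placeholder rather than the full scoring formula, and the name `scored` is an illustrative choice:

```python
import pandas as pd

# Hypothetical two-row frame; values taken from the sample data.
df = pd.DataFrame({'research': [15.7, 45.3], 'citations': [38.8, 39.0]})

# assign() does not modify df in place; it returns a copy with the extra column.
# The expression here is a placeholder, not the full scoring formula.
scored = df.assign(TotalScore=df.research + df.citations)

print('TotalScore' in df.columns)      # the original frame is unchanged
print('TotalScore' in scored.columns)  # the returned frame has the new column
```

Writing `df = df.assign(...)` instead rebinds the original name to the result.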
