Let's say I have a dataframe:
                 Pop_By_Area  CensusPop
ID
100010401001000         77.0         77
100010401001001        294.0        294
100010401001002         20.0         20
100010401001003         91.0         91
100010401001004         53.0         53
I want to create a function that compares two column values in a row and returns a value for a new column that is the difference between the two columns:
def pop_compare(row):
    pop_by_area_sum = row.Pop_By_Area
    census_pop_avg = float(row.CensusPop)
    diff = 0
    if (pop_by_area_sum != census_pop_avg):
        diff = abs(int(pop_by_area_sum - census_pop_avg))
    return diff

cb_pop_sum['Difference'] = cb_pop_sum.apply(pop_compare, axis=1)
This works fine, but the column names are hard-coded in the function:
                 Pop_By_Area  CensusPop  Difference
ID
100010401001000         77.0         77           0
100010401001001        294.0        294           0
100010401001002         20.0         20           0
100010401001003         91.0         91           0
100010401001004         53.0         53           0
Now, suppose I want to use a similar function to compare any 2 columns in a larger data frame to return the difference. I'd need to add parameters for the comparison columns to the function in addition to row.
def pop_compare2(row, colA, colB):
    valA = row.colA
    valB = row.colB
    diff = 0
    if (valA != valB):
        diff = abs(int(valA - valB))
    return diff
This doesn't work when I run the following:
c_A = "Pop_By_Area"
c_B = "CensusPop"
cb_pop_sum['Difference2'] = cb_pop_sum.apply(pop_compare2(colA=c_A, colB=c_B), axis=1)
cb_pop_sum.head()
It throws the error TypeError: pop_compare2() missing 1 required positional argument: 'row'. What am I doing wrong here?
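For reference, here is a minimal sketch of one way the call might be restructured. The reasoning is that pop_compare2(colA=c_A, colB=c_B) calls the function immediately, without a row, before apply ever runs, which is where the TypeError comes from. Passing the function object itself and letting apply forward the extra keyword arguments avoids that. The sketch also assumes that row.colA needs to become row[colA], since attribute access looks for a column literally named "colA". The sample frame is rebuilt from the data shown above, and the Difference3 column name is made up for illustration.

```python
import pandas as pd

# Sample data rebuilt from the frame shown in the question.
cb_pop_sum = pd.DataFrame(
    {
        "Pop_By_Area": [77.0, 294.0, 20.0, 91.0, 53.0],
        "CensusPop": [77, 294, 20, 91, 53],
    },
    index=pd.Index(
        [
            100010401001000,
            100010401001001,
            100010401001002,
            100010401001003,
            100010401001004,
        ],
        name="ID",
    ),
)


def pop_compare2(row, colA, colB):
    # row.colA would look up a column literally named "colA";
    # bracket indexing uses the value held in the variable instead.
    valA = row[colA]
    valB = row[colB]
    return abs(int(valA - valB))


c_A = "Pop_By_Area"
c_B = "CensusPop"

# Pass the function itself (no parentheses). DataFrame.apply forwards
# any extra keyword arguments to the function on each call.
cb_pop_sum["Difference2"] = cb_pop_sum.apply(
    pop_compare2, axis=1, colA=c_A, colB=c_B
)

# Alternative (hypothetical column name): a vectorized version that
# skips apply entirely and should give the same values.
cb_pop_sum["Difference3"] = (cb_pop_sum[c_A] - cb_pop_sum[c_B]).abs().astype(int)

print(cb_pop_sum.head())
```

The same arguments could also be passed positionally with args=(c_A, c_B). For a plain column difference like this one, the vectorized form is likely the simpler and faster option on a large frame.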