Adding parameters to an applied dataframe function

Question

Let's say I have a dataframe:

                 Pop_By_Area    CensusPop
 ID         
 100010401001000    77.0        77           
 100010401001001    294.0       294 
 100010401001002    20.0        20
 100010401001003    91.0        91  
 100010401001004    53.0        53

I want to create a function that compares two column values on a row and returns a value for a new column that's the difference between the two columns:

def pop_compare(row):
    pop_by_area_sum = row.Pop_By_Area
    census_pop_avg = float(row.CensusPop)
    diff = 0
    if (pop_by_area_sum != census_pop_avg):
        diff = abs(int(pop_by_area_sum - census_pop_avg))
    return diff

cb_pop_sum['Difference'] = cb_pop_sum.apply(pop_compare, axis=1)

No problem; this works fine, but I have to hard-code the specific column names:

                  Pop_By_Area CensusPop Difference
 ID         
 100010401001000    77.0        77        0   
 100010401001001    294.0       294       0
 100010401001002    20.0        20        0
 100010401001003    91.0        91        0
 100010401001004    53.0        53        0

Now, suppose I want to use a similar function to compare any two columns in a larger dataframe and return the difference. I'd need to add parameters for the comparison columns to the function, in addition to row.

def pop_compare2(row, colA, colB):
    valA = row.colA
    valB = row.colB
    diff = 0
    if (valA != valB):
        diff = abs(int(valA - valB))
    return diff

This doesn't work when I run the following:

c_A = "Pop_By_Area"
c_B = "CensusPop"
cb_pop_sum['Difference2'] = cb_pop_sum.apply(pop_compare2(colA=c_A, colB=c_B), axis=1)
cb_pop_sum.head()

It throws the error TypeError: pop_compare2() missing 1 required positional argument: 'row'. What am I doing wrong here?

SultanOrazbayev · Accepted Answer · 2021-03-02 19:27:55Z

Maybe I misunderstood your question, but this should work:

from io import StringIO
csv = StringIO("""
 ID                 Pop_By_Area    CensusPop      
 100010401001000    77.0        77           
 100010401001001    294.0       294 
 100010401001002    20.0        20
 100010401001003    91.0        91  
 100010401001004    53.0        53 
""")

import pandas as pd
df = pd.read_csv(csv, sep=r'\s+')  # raw string avoids an invalid-escape warning for \s
df['Difference'] = df['Pop_By_Area'] - df['CensusPop']

def custom_func(subdf):
    x,y = subdf.values
    return x**3-y/123

df['Difference2'] = df[['Pop_By_Area', 'CensusPop']].apply(custom_func, axis=1)
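If the goal is to keep the column names as parameters, the following sketch may be closer to what the question asks. The error comes from `pop_compare2(colA=c_A, colB=c_B)` being called immediately, before `apply` ever gets a chance to supply `row`. `apply` expects the function object itself, and it forwards any extra keyword arguments (or an `args=` tuple) to that function. Note also that `row.colA` looks for a column literally named "colA", so the lookup has to be `row[colA]`. The frame below is rebuilt inline from the five rows in the question; the name `df` and the output column names are only illustrative.

```python
import pandas as pd

# Same five rows as in the question, rebuilt inline for a self-contained example.
df = pd.DataFrame(
    {
        "Pop_By_Area": [77.0, 294.0, 20.0, 91.0, 53.0],
        "CensusPop": [77, 294, 20, 91, 53],
    },
    index=pd.Index(
        [
            100010401001000,
            100010401001001,
            100010401001002,
            100010401001003,
            100010401001004,
        ],
        name="ID",
    ),
)


def pop_compare2(row, colA, colB):
    # Index with the *values* of colA/colB; row.colA would look up
    # an attribute literally called "colA".
    valA = row[colA]
    valB = row[colB]
    return abs(int(valA - valB))


c_A = "Pop_By_Area"
c_B = "CensusPop"

# Pass the function itself; apply forwards the extra keyword arguments to it.
df["Difference2"] = df.apply(pop_compare2, axis=1, colA=c_A, colB=c_B)

# Equivalent form: extra positional arguments go through args=.
df["Difference3"] = df.apply(pop_compare2, axis=1, args=(c_A, c_B))

# For a plain absolute difference, a vectorized version avoids apply entirely.
df["Difference4"] = (df[c_A] - df[c_B]).abs().astype(int)

print(df)
```

For a simple difference like this one, the vectorized last line is usually the better choice on large frames; the `apply` forms are mainly useful when the per-row logic cannot easily be expressed with column operations.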
