Let's say I have a dataframe:

                 Pop_By_Area  CensusPop
ID
100010401001000         77.0         77
100010401001001        294.0        294
100010401001002         20.0         20
100010401001003         91.0         91
100010401001004         53.0         53

I want to create a function that compares two column values in a row and returns their absolute difference as the value of a new column:

def pop_compare(row):
    pop_by_area_sum = row.Pop_By_Area
    census_pop_avg = float(row.CensusPop)
    diff = 0
    if (pop_by_area_sum != census_pop_avg):
        diff = abs(int(pop_by_area_sum - census_pop_avg))
    return diff

cb_pop_sum['Difference'] = cb_pop_sum.apply(pop_compare, axis=1)

This works fine, but the column names are hard-coded in the function:

                 Pop_By_Area  CensusPop  Difference
ID
100010401001000         77.0         77           0
100010401001001        294.0        294           0
100010401001002         20.0         20           0
100010401001003         91.0         91           0
100010401001004         53.0         53           0
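For comparison, here is a minimal self-contained sketch, assuming pandas is installed. It rebuilds the sample frame from the table above (with ID as the index) and computes the same Difference column with vectorized column arithmetic instead of a row-wise apply.

```python
import pandas as pd

# Sample data copied from the table above, with ID as the index.
cb_pop_sum = pd.DataFrame(
    {
        "Pop_By_Area": [77.0, 294.0, 20.0, 91.0, 53.0],
        "CensusPop": [77, 294, 20, 91, 53],
    },
    index=pd.Index(
        [100010401001000, 100010401001001, 100010401001002,
         100010401001003, 100010401001004],
        name="ID",
    ),
)

# Whole-column arithmetic: absolute difference, then truncate to int.
# No Python function is called per row.
cb_pop_sum["Difference"] = (
    (cb_pop_sum["Pop_By_Area"] - cb_pop_sum["CensusPop"]).abs().astype(int)
)
print(cb_pop_sum)
```

Taking the absolute value and truncating toward zero can be done in either order, so for finite values this gives the same numbers as pop_compare.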

Now suppose I want a similar function that compares any two columns in a larger dataframe and returns their difference. I'd need to add parameters for the two column names to the function, in addition to row:

def pop_compare2(row, colA, colB):
    valA = row.colA
    valB = row.colB
    diff = 0
    if (valA != valB):
        diff = abs(int(valA - valB))
    return diff

This doesn't work when I run the following:

c_A = "Pop_By_Area"
c_B = "CensusPop"
cb_pop_sum['Difference2'] = cb_pop_sum.apply(pop_compare2(colA=c_A, colB=c_B), axis=1)
cb_pop_sum.head()

It throws the error TypeError: pop_compare2() missing 1 required positional argument: 'row'. What am I doing wrong here?

1 Answer

Maybe I misunderstood your question, but this should work:

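from io import StringIO

import pandas as pd

# Sample data from the question; ID is read as an ordinary column here.
csv = StringIO("""
ID                 Pop_By_Area    CensusPop
100010401001000    77.0        77
100010401001001    294.0       294
100010401001002    20.0        20
100010401001003    91.0        91
100010401001004    53.0        53
""")

# Raw string so "\s" is passed to the regex, not treated as a string escape.
df = pd.read_csv(csv, sep=r'\s+')

# Plain column-wise difference; no apply needed.
df['Difference'] = df['Pop_By_Area'] - df['CensusPop']

def custom_func(subdf):
    # subdf is one row (a Series) holding only the two selected columns.
    x, y = subdf.values
    return x**3 - y/123  # arbitrary example formula

# Select the two columns first, then apply row-wise (axis=1).
df['Difference2'] = df[['Pop_By_Area', 'CensusPop']].apply(custom_func, axis=1)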
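To address the TypeError directly: in pop_compare2(colA=c_A, colB=c_B) the function is called immediately, before apply runs and without a row, hence "missing 1 required positional argument: 'row'". DataFrame.apply expects the function object itself; extra arguments can be passed through its args= tuple, or as keyword arguments, which apply forwards to the function. A second bug is waiting after that: row.colA looks for a column literally named "colA", so the name stored in the variable has to be looked up with row[colA]. A minimal sketch, assuming pandas and rebuilding only the two sample columns:

```python
import pandas as pd

cb_pop_sum = pd.DataFrame(
    {"Pop_By_Area": [77.0, 294.0, 20.0, 91.0, 53.0],
     "CensusPop": [77, 294, 20, 91, 53]}
)

def pop_compare2(row, colA, colB):
    # row[colA] looks up the column whose *name* is stored in colA;
    # row.colA would look for a column literally called "colA".
    valA = row[colA]
    valB = row[colB]
    # Equal values already give 0, so no separate if-check is needed.
    return abs(int(valA - valB))

c_A = "Pop_By_Area"
c_B = "CensusPop"

# Pass the function itself; extra positional arguments go in args=...
cb_pop_sum["Difference2"] = cb_pop_sum.apply(pop_compare2, axis=1, args=(c_A, c_B))

# ...or as keyword arguments, which apply forwards to pop_compare2.
cb_pop_sum["Difference3"] = cb_pop_sum.apply(pop_compare2, axis=1, colA=c_A, colB=c_B)

# A lambda that fixes the column names works as well.
cb_pop_sum["Difference4"] = cb_pop_sum.apply(lambda row: pop_compare2(row, c_A, c_B), axis=1)
print(cb_pop_sum)
```

When using the keyword form, the parameter names must not clash with apply's own parameters (such as axis, raw, or args).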