How to add new columns into a new dataframe using output of single function call having multiple arguments and multiple return fields?

Question

This is a continuation of the post below:

How to add/insert output of a function call that returns multiple fields, as new columns into Pandas dataframe?

If a function takes two different arguments and returns multiple fields, how can I use apply(), or some other approach, to add all of the returned fields as columns in a new pandas DataFrame?

Sample code:

    from pandas import DataFrame
    People_List = [['Jon','Smith',21],['Mark','Brown',38],['Maria','Lee',42],['Jill','Jones',28],['Jack','Ford',55]]
    df1 = DataFrame (People_List,columns=['First_Name','Last_Name','Age'])
    Address_List = [['Jon','Chicago'],['Mark','SFO'],['Maria','Chicago'],['Jill','Chicago'],['Jack','Chicago']]
    df2 = DataFrame(Address_List,columns=['First_Name', 'City'])
    print (df1, df2)

      First_Name Last_Name  Age
    0        Jon     Smith   21
    1       Mark     Brown   38
    2      Maria       Lee   42
    3       Jill     Jones   28
    4       Jack      Ford   55  
 
      First_Name     City
    0        Jon  Chicago
    1       Mark      SFO
    2      Maria  Chicago
    3       Jill  Chicago
    4       Jack  Chicago

    def getTitleBirthYear(df1, df2):
        if 'Maria' in df1.First_Name:
            title='Ms'
        else:
            title='Mr'
        current_year = int('2020')
        birth_year=''
        age = df1.Age
        birth_year = current_year - age
        if 'Chicago' in df2.City:
            state='IL'
        else:
            state='Other'
        return title,birth_year,state
        #return {'title':title,'birth_year':birth_year, 'state':state}

    getTitleBirthYear(df1,df2)

      title  birth_year  state
    0    Mr        1999     IL
    1    Mr        1982  Other
    2    Ms        1978     IL
    3    Mr        1992     IL
    4    Mr        1965     IL

    df = DataFrame.merge(df1,df2,on='First_Name',how='inner')
    print(df)

      First_Name Last_Name  Age     City
    0        Jon     Smith   21  Chicago
    1       Mark     Brown   38      SFO
    2      Maria       Lee   42  Chicago
    3       Jill     Jones   28  Chicago
    4       Jack      Ford   55  Chicago

    df['title', 'birth_year', 'state'] = pd.DataFrame(df.apply(getTitleBirthYear,axis=1).tolist())

However, I get the following error: TypeError: ("getTitleBirthYear() missing 1 required positional argument: 'df2'", 'occurred at index 0')

Final expected output:

      First_Name Last_Name  Age     City title  birth_year  state
    0        Jon     Smith   21  Chicago    Mr        1999     IL
    1       Mark     Brown   38      SFO    Mr        1982  Other
    2      Maria       Lee   42  Chicago    Ms        1978     IL
    3       Jill     Jones   28  Chicago    Mr        1992     IL
    4       Jack      Ford   55  Chicago    Mr        1965     IL

jezrael · Accepted Answer · 2020-12-03 09:15:55Z

I think you need numpy.where, together with Series.rsub to subtract from the right side, instead of your function:

    import numpy as np

    df = df1.merge(df2,on='First_Name')
    df['title'] = np.where(df['First_Name'].eq('Maria'), 'Ms', 'Mr')
    df['birth_year'] = df['Age'].rsub(2020)
    df['state'] = np.where(df['City'].eq('Chicago'), 'IL', 'Other')
    print (df)

      First_Name Last_Name  Age     City title  birth_year  state
    0        Jon     Smith   21  Chicago    Mr        1999     IL
    1       Mark     Brown   38      SFO    Mr        1982  Other
    2      Maria       Lee   42  Chicago    Ms        1978     IL
    3       Jill     Jones   28  Chicago    Mr        1992     IL
    4       Jack      Ford   55  Chicago    Mr        1965     IL
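
As a small aside on Series.rsub: it is the reversed form of subtraction, so `df['Age'].rsub(2020)` should give the same values as writing `2020 - df['Age']`. A minimal sketch, using a throwaway `ages` Series (an illustrative name, not from the answer) that mirrors the Age column of the sample data:

```python
import pandas as pd

# Throwaway Series mirroring the Age column from the question
ages = pd.Series([21, 38, 42, 28, 55], name='Age')

# rsub(2020) computes 2020 - ages, i.e. subtraction with reversed operands
via_rsub = ages.rsub(2020)

# The plain operator form is expected to give the same result
via_operator = 2020 - ages

print(via_rsub.tolist())              # [1999, 1982, 1978, 1992, 1965]
print(via_rsub.equals(via_operator))  # True
```

Which spelling to use is mostly a matter of taste; the method form is convenient inside a method chain.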

Your own method also works with three changes: pass result_type='expand' to DataFrame.apply, assign to a list of columns, ['title', 'birth_year', 'state'] (note the added []), and change the function to test with == instead of in.

But that solution is slower and more complicated, so it is better to use the first one.

    def getTitleBirthYear(x):
        if x.First_Name == 'Maria' :
            title='Ms'
        else:
            title='Mr'
        current_year = int('2020')
        birth_year=''
        age = x.Age
        birth_year = current_year - age
        if x.City == 'Chicago':
            state='IL'
        else:
            state='Other'
        return title,birth_year,state

    df = df1.merge(df2,on='First_Name')

    df[['title', 'birth_year', 'state']] = df.apply(getTitleBirthYear,
                                                    axis=1,
                                                    result_type='expand')
    print (df)

      First_Name Last_Name  Age     City title  birth_year  state
    0        Jon     Smith   21  Chicago    Mr        1999     IL
    1       Mark     Brown   38      SFO    Mr        1982  Other
    2      Maria       Lee   42  Chicago    Ms        1978     IL
    3       Jill     Jones   28  Chicago    Mr        1992     IL
    4       Jack      Ford   55  Chicago    Mr        1965     IL
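
The change from `in` to `==` matters because, when `in` is applied to a whole pandas Series (as in the question's `'Maria' in df1.First_Name`), it tests the index labels rather than the stored values. A minimal sketch of that behaviour, using a throwaway `names` Series that is not part of the original code:

```python
import pandas as pd

names = pd.Series(['Jon', 'Mark', 'Maria'])

# `in` looks at the index labels (0, 1, 2), not at the stored strings
print('Maria' in names)           # False
print(0 in names)                 # True

# To test the values, compare element-wise or check a plain list
print(names.eq('Maria').any())    # True
print('Maria' in names.tolist())  # True
```

Inside the row-wise function, `x.First_Name` is a single string, so a plain `==` comparison does what was intended.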

Kenan · Accepted Answer · 2020-12-03 16:24:14Z

Your function doesn't need two input arguments if you merge the DataFrames beforehand:

    df = df1.merge(df2,on='First_Name')

    def getTitleBirthYear(x):
        if x.First_Name == 'Maria' :
            title='Ms'
        else:
            title='Mr'
        current_year = int('2020')
        birth_year=''
        age = x.Age
        birth_year = current_year - age
        if x.City == 'Chicago':
            state='IL'
        else:
            state='Other'
        return title,birth_year,state

However, as stated by @jezrael, this approach is much slower; read more here.

    df[['title', 'birth_year', 'state']] = df.apply(getTitleBirthYear, axis=1, result_type='expand')

      First_Name Last_Name  Age     City title  birth_year  state
    0        Jon     Smith   21  Chicago    Mr        1999     IL
    1       Mark     Brown   38      SFO    Mr        1982  Other
    2      Maria       Lee   42  Chicago    Ms        1978     IL
    3       Jill     Jones   28  Chicago    Mr        1992     IL
    4       Jack      Ford   55  Chicago    Mr        1965     IL
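
To check that the row-wise apply and the vectorized numpy.where version produce the same columns, and to get a rough feel for the speed difference, one could run something like the sketch below. The helper names `row_wise` and `vectorized`, the condensed `get_title_birth_year`, and the repeat count of 20 are illustrative assumptions rather than anything from the answers. Timings depend on the machine and pandas version, so none are shown.

```python
import timeit

import numpy as np
import pandas as pd

# Sample data from the question, merged on First_Name
df1 = pd.DataFrame(
    [['Jon', 'Smith', 21], ['Mark', 'Brown', 38], ['Maria', 'Lee', 42],
     ['Jill', 'Jones', 28], ['Jack', 'Ford', 55]],
    columns=['First_Name', 'Last_Name', 'Age'])
df2 = pd.DataFrame(
    [['Jon', 'Chicago'], ['Mark', 'SFO'], ['Maria', 'Chicago'],
     ['Jill', 'Chicago'], ['Jack', 'Chicago']],
    columns=['First_Name', 'City'])
merged = df1.merge(df2, on='First_Name')


def get_title_birth_year(x):
    # x is one row of the merged frame
    title = 'Ms' if x.First_Name == 'Maria' else 'Mr'
    state = 'IL' if x.City == 'Chicago' else 'Other'
    return title, 2020 - x.Age, state


def row_wise(df):
    # One Python function call per row
    out = df.copy()
    out[['title', 'birth_year', 'state']] = out.apply(
        get_title_birth_year, axis=1, result_type='expand')
    return out


def vectorized(df):
    # Whole-column operations, no per-row Python call
    out = df.copy()
    out['title'] = np.where(out['First_Name'].eq('Maria'), 'Ms', 'Mr')
    out['birth_year'] = out['Age'].rsub(2020)
    out['state'] = np.where(out['City'].eq('Chicago'), 'IL', 'Other')
    return out


a, b = row_wise(merged), vectorized(merged)
new_cols = ['title', 'birth_year', 'state']
same = all(a[c].tolist() == b[c].tolist() for c in new_cols)
print(same)  # True: both approaches fill the same values

# Rough timing only; five rows is too few to show a realistic gap
print('apply:     ', timeit.timeit(lambda: row_wise(merged), number=20))
print('vectorized:', timeit.timeit(lambda: vectorized(merged), number=20))
```

On a five-row frame the difference is unlikely to matter; the gap would be expected to grow with the number of rows, since apply with axis=1 calls the Python function once per row.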
