This is a continuation of the post below:

How to add/insert output of a function call that returns multiple fields, as new columns into Pandas dataframe?

If a function returns multiple fields computed from two different arguments, how can I use apply() to add them all as new columns in a pandas DataFrame?

Sample code:

    import pandas as pd
    from pandas import DataFrame
    People_List = [['Jon','Smith',21],['Mark','Brown',38],['Maria','Lee',42],['Jill','Jones',28],['Jack','Ford',55]]
    df1 = DataFrame (People_List,columns=['First_Name','Last_Name','Age'])
    Address_List = [['Jon','Chicago'],['Mark','SFO'],['Maria','Chicago'],['Jill','Chicago'],['Jack','Chicago']]
    df2 = DataFrame(Address_List,columns=['First_Name', 'City'])
    print (df1, df2)
      First_Name Last_Name  Age
    0        Jon     Smith   21
    1       Mark     Brown   38
    2      Maria       Lee   42
    3       Jill     Jones   28
    4       Jack      Ford   55  
 
      First_Name     City
    0        Jon  Chicago
    1       Mark      SFO
    2      Maria  Chicago
    3       Jill  Chicago
    4       Jack  Chicago
def getTitleBirthYear(df1, df2):
    if 'Maria' in df1.First_Name:
        title='Ms'
    else:
        title='Mr' 
    current_year = int('2020')
    birth_year=''
    age = df1.Age
    birth_year = current_year - age
    if 'Chicago' in df2.City:
        state='IL'
    else:
        state='Other'
    return title,birth_year,state
    #return {'title':title,'birth_year':birth_year, 'state':state}

getTitleBirthYear(df1,df2)

  title birth_year state
0 Mr    1999       IL
1 Mr    1982       Other
2 Ms    1978       IL
3 Mr    1992       IL
4 Mr    1965       IL

df = DataFrame.merge(df1,df2,on='First_Name',how='inner')
print(df)
      First_Name Last_Name  Age     City
    0        Jon     Smith   21  Chicago
    1       Mark     Brown   38      SFO
    2      Maria       Lee   42  Chicago
    3       Jill     Jones   28  Chicago
    4       Jack      Ford   55  Chicago
df['title', 'birth_year', 'state'] = pd.DataFrame(df.apply(getTitleBirthYear,axis=1).tolist())

However, I get the following error: TypeError: ("getTitleBirthYear() missing 1 required positional argument: 'df2'", 'occurred at index 0')

Final expected output:

  First_Name Last_Name  Age     City title  birth_year  state
0        Jon     Smith   21  Chicago    Mr        1999     IL
1       Mark     Brown   38      SFO    Mr        1982  Other
2      Maria       Lee   42  Chicago    Ms        1978     IL
3       Jill     Jones   28  Chicago    Mr        1992     IL
4       Jack      Ford   55  Chicago    Mr        1965     IL

2 Answers

I think you need numpy.where, together with Series.rsub to subtract from the right side, instead of your function:

import numpy as np

df = df1.merge(df2,on='First_Name')
df['title'] = np.where(df['First_Name'].eq('Maria'), 'Ms', 'Mr')
df['birth_year'] = df['Age'].rsub(2020)
df['state'] = np.where(df['City'].eq('Chicago'), 'IL', 'Other')
print (df)
  First_Name Last_Name  Age     City title  birth_year  state
0        Jon     Smith   21  Chicago    Mr        1999     IL
1       Mark     Brown   38      SFO    Mr        1982  Other
2      Maria       Lee   42  Chicago    Ms        1978     IL
3       Jill     Jones   28  Chicago    Mr        1992     IL
4       Jack      Ford   55  Chicago    Mr        1965     IL
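
As a small sketch of the two building blocks used above (the sample values are copied from the question, the variable names are only illustrative): `rsub` is reverse subtraction, so `ages.rsub(2020)` should match `2020 - ages`, and `np.where` chooses element-wise between two values based on a boolean mask.

```python
import numpy as np
import pandas as pd

ages = pd.Series([21, 38, 42], name="Age")

# rsub is "reverse subtraction": the Series ends up on the right-hand side.
birth_year_rsub = ages.rsub(2020)   # computes 2020 - ages
birth_year_plain = 2020 - ages      # same result, arguably easier to read

# np.where picks "Ms" where the mask is True and "Mr" elsewhere.
names = pd.Series(["Jon", "Mark", "Maria"])
titles = np.where(names.eq("Maria"), "Ms", "Mr")
```

Either spelling of the subtraction works; `rsub` is mostly convenient when chaining methods.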

To keep your approach, pass result_type='expand' to DataFrame.apply, assign to a list of columns, df[['title', 'birth_year', 'state']] (note the added []), and change the function to test with == instead of in.

But this solution is slower and more complicated, so the first one is better.

def getTitleBirthYear(x):
    if x.First_Name == 'Maria' :
        title='Ms'
    else:
        title='Mr' 
    current_year = int('2020')
    birth_year=''
    age = x.Age
    birth_year = current_year - age
    if x.City == 'Chicago':
        state='IL'
    else:
        state='Other'
    return title,birth_year,state

    
df = df1.merge(df2,on='First_Name')
    
df[['title', 'birth_year', 'state']] = df.apply(getTitleBirthYear,
                                                axis=1, 
                                                result_type='expand')
print (df)
  First_Name Last_Name  Age     City title  birth_year  state
0        Jon     Smith   21  Chicago    Mr        1999     IL
1       Mark     Brown   38      SFO    Mr        1982  Other
2      Maria       Lee   42  Chicago    Ms        1978     IL
3       Jill     Jones   28  Chicago    Mr        1992     IL
4       Jack      Ford   55  Chicago    Mr        1965     IL
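
The switch from `in` to `==` matters because, as far as I know, membership tests on a Series look at its index labels rather than its values, so `'Maria' in df1.First_Name` in the original function is False even though Maria is in the column. A minimal sketch, using a made-up three-element Series:

```python
import pandas as pd

first_names = pd.Series(["Jon", "Mark", "Maria"])

# Membership on a Series checks the index labels (0, 1, 2), not the values.
in_labels = "Maria" in first_names              # False
label_hit = 2 in first_names                    # True, because 2 is an index label

# To test the values, use the underlying array or a vectorized comparison.
in_values = "Maria" in first_names.to_numpy()   # True
mask = first_names == "Maria"                   # element-wise boolean Series
```

Inside a row-wise apply, `x.First_Name` is a single string, so `==` there is a plain scalar comparison.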

Your function doesn't need two input arguments if you merge the DataFrames beforehand:

df = df1.merge(df2,on='First_Name')

def getTitleBirthYear(x):
    if x.First_Name == 'Maria' :
        title='Ms'
    else:
        title='Mr' 
    current_year = int('2020')
    birth_year=''
    age = x.Age
    birth_year = current_year - age
    if x.City == 'Chicago':
        state='IL'
    else:
        state='Other'
    return title,birth_year,state

However, as stated by @jezrael, this approach is much slower; read more here.

df[['title', 'birth_year', 'state']] = df.apply(getTitleBirthYear, axis=1, result_type='expand')

  First_Name Last_Name  Age     City title  birth_year  state
0        Jon     Smith   21  Chicago    Mr        1999     IL
1       Mark     Brown   38      SFO    Mr        1982  Other
2      Maria       Lee   42  Chicago    Ms        1978     IL
3       Jill     Jones   28  Chicago    Mr        1992     IL
4       Jack      Ford   55  Chicago    Mr        1965     IL
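
For context on the original TypeError: with axis=1, apply calls the function with one positional argument, the current row as a Series, so a two-parameter function is left without its second argument. The following is a rough variant rather than the method from either answer. It assumes a hypothetical function name, `title_birth_year_state`, a hypothetical `current_year` keyword, and a shortened copy of the sample data. It returns a named Series, so the new column names come from the function itself.

```python
import pandas as pd

df1 = pd.DataFrame(
    [["Jon", "Smith", 21], ["Mark", "Brown", 38], ["Maria", "Lee", 42]],
    columns=["First_Name", "Last_Name", "Age"],
)
df2 = pd.DataFrame(
    [["Jon", "Chicago"], ["Mark", "SFO"], ["Maria", "Chicago"]],
    columns=["First_Name", "City"],
)
df = df1.merge(df2, on="First_Name")


def title_birth_year_state(row, current_year=2020):
    # `row` is a single row of the merged frame, passed in as a Series.
    return pd.Series(
        {
            "title": "Ms" if row["First_Name"] == "Maria" else "Mr",
            "birth_year": current_year - row["Age"],
            "state": "IL" if row["City"] == "Chicago" else "Other",
        }
    )


# Extra keyword arguments given to apply are forwarded to the function.
new_cols = df.apply(title_birth_year_state, axis=1, current_year=2020)
result = df.join(new_cols)
```

This is still a row-by-row loop, so the same performance caveat applies; the vectorized numpy.where version remains the better choice for large frames.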
