1

How do I add the output of a function call that returns multiple fields as new columns in a pandas DataFrame?

Sample code & data:

from pandas import DataFrame
People_List = [['Jon','Smith',21],['Mark','Brown',38],['Maria','Lee',42],['Jill','Jones',28],['Jack','Ford',55]]
df = DataFrame (People_List,columns=['First_Name','Last_Name','Age'])
print (df)


  First_Name Last_Name  Age
0        Jon     Smith   21
1       Mark     Brown   38
2      Maria       Lee   42
3       Jill     Jones   28
4       Jack      Ford   55


def getTitleBirthYear(df):
    if 'Maria' in df.First_Name:
        title='Ms'
    else:
        title='Mr' 
    current_year = int('2020')
    birth_year=''
    age = df.Age
    birth_year = current_year - age
    return title,birth_year

getTitleBirthYear(df)

  title  birth_year
0    Mr        1999
1    Mr        1982
2    Ms        1978
3    Mr        1992
4    Mr        1965

Final expected output:

  First_Name Last_Name  Age title  birth_year
0        Jon     Smith   21    Mr        1999
1       Mark     Brown   38    Mr        1982
2      Maria       Lee   42    Ms        1978
3       Jill     Jones   28    Mr        1992
4       Jack      Ford   55    Mr        1965

Please suggest. Thanks!

1
  • Do you need a function for that logic? You could do it faster in pandas. Commented Dec 2, 2020 at 15:21

2 Answers

2

Although you can use apply, it is best to use vectorized functions (see "When should I (not) want to use pandas apply() in my code?"). Your logic can be simplified as follows:

import numpy as np
import pandas as pd
print(df.assign(title=np.where(df["First_Name"].eq("Maria"), "Ms", "Mr"),
                birth_year=pd.Timestamp.now().year - df["Age"]))  # use 2020 - df["Age"] to reproduce the output below

  First_Name Last_Name  Age title  birth_year
0        Jon     Smith   21    Mr        1999
1       Mark     Brown   38    Mr        1982
2      Maria       Lee   42    Ms        1978
3       Jill     Jones   28    Mr        1992
4       Jack      Ford   55    Mr        1965
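
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same vectorized approach. It assumes a fixed reference year of 2020 (the hypothetical constant `REFERENCE_YEAR`) so that the result matches the table above regardless of when it is run; the sample data is copied from the question.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
people = [["Jon", "Smith", 21], ["Mark", "Brown", 38], ["Maria", "Lee", 42],
          ["Jill", "Jones", 28], ["Jack", "Ford", 55]]
df = pd.DataFrame(people, columns=["First_Name", "Last_Name", "Age"])

# Assumption: a fixed year, so the output is reproducible.
REFERENCE_YEAR = 2020

# assign() returns a new DataFrame and leaves df itself unchanged.
result = df.assign(
    title=np.where(df["First_Name"].eq("Maria"), "Ms", "Mr"),
    birth_year=REFERENCE_YEAR - df["Age"],
)
print(result)
```

Note that `assign` does not modify `df` in place, so write `df = df.assign(...)` if the new columns should be kept on the original name.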

1

Here are two ways to apply the function and create the new columns:

import pandas as pd
df[['title', 'birth_year']] = pd.DataFrame(df.apply(getTitleBirthYear, axis=1).tolist())

df[['title', 'birth_year']] = df.apply(getTitleBirthYear, axis=1, result_type='expand')

  First_Name Last_Name  Age title  birth_year
0        Jon     Smith   21    Mr        1999
1       Mark     Brown   38    Mr        1982
2      Maria       Lee   42    Ms        1978
3       Jill     Jones   28    Mr        1992
4       Jack      Ford   55    Mr        1965
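
As a rough, self-contained sketch of the second variant: with `axis=1`, `apply` passes each row to the function as a Series, so the function below works on a single row. The name `get_title_birth_year`, its `row` and `current_year` parameters, and the exact equality check on the first name are illustrative assumptions, not the asker's original code.

```python
import pandas as pd

# Sample data copied from the question.
people = [["Jon", "Smith", 21], ["Mark", "Brown", 38], ["Maria", "Lee", 42],
          ["Jill", "Jones", 28], ["Jack", "Ford", 55]]
df = pd.DataFrame(people, columns=["First_Name", "Last_Name", "Age"])


def get_title_birth_year(row, current_year=2020):
    """Return (title, birth_year) for one row, passed in as a pandas Series."""
    # Hypothetical variant of the question's logic, using an exact match.
    title = "Ms" if row["First_Name"] == "Maria" else "Mr"
    return title, current_year - row["Age"]


# result_type="expand" turns each returned tuple into separate columns (0 and 1).
expanded = df.apply(get_title_birth_year, axis=1, result_type="expand")
expanded.columns = ["title", "birth_year"]

# Join on the index so rows stay aligned even if df has a non-default index.
out = df.join(expanded)
print(out)
```

Naming the expanded columns and joining on the index is one way to avoid depending on row order; assigning straight into `df[['title', 'birth_year']]` as shown above is shorter.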

4 Comments

You're welcome! Does that completely answer your question?
To take this question further: what if my sample function takes two different DataFrames as arguments, for example getTitleBirthYear(df1, df2)? Can you please show how to use apply() in that case? The same statement as above with two arguments gives the error: getTitleBirthYear() missing 1 required positional argument: 'df2'", 'occurred at index 0'
This sounds like a more involved question. Can you create a new post for it? Also, will you need the output appended to df1 or df2? Which DataFrame is apply called on?
I posted a follow-up question here: link
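
On the "missing 1 required positional argument" error mentioned in the comments: one possible approach, sketched here under assumptions, is to pass the second DataFrame through the `args` parameter of `DataFrame.apply`, which forwards extra positional arguments to the function after the row. The lookup frame `titles`, the function name, and its parameters are all hypothetical; the real follow-up question may need something different.

```python
import pandas as pd

# Sample data copied from the question.
people = [["Jon", "Smith", 21], ["Mark", "Brown", 38], ["Maria", "Lee", 42],
          ["Jill", "Jones", 28], ["Jack", "Ford", 55]]
df1 = pd.DataFrame(people, columns=["First_Name", "Last_Name", "Age"])

# Hypothetical second frame: titles for names that should not default to "Mr".
titles = pd.DataFrame({"First_Name": ["Maria"], "title": ["Ms"]})


def get_title_birth_year(row, lookup, current_year=2020):
    """Return (title, birth_year) for one row, using a second frame as a lookup."""
    match = lookup.loc[lookup["First_Name"] == row["First_Name"], "title"]
    title = match.iloc[0] if not match.empty else "Mr"
    return title, current_year - row["Age"]


# args=(titles,) supplies the second positional argument for every row.
expanded = df1.apply(get_title_birth_year, axis=1, args=(titles,),
                     result_type="expand")
expanded.columns = ["title", "birth_year"]
out = df1.join(expanded)
print(out)
```

For a lookup like this one, a merge on `First_Name` would probably be faster than a row-wise `apply`; the sketch only illustrates how the extra argument reaches the function.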
