How to add/insert output of a function call that returns multiple fields, as new columns into Pandas dataframe?

Question

How can I add the output of a function that returns multiple fields as new columns in a pandas DataFrame?

Sample code & data:

import pandas as pd

People_List = [['Jon', 'Smith', 21], ['Mark', 'Brown', 38], ['Maria', 'Lee', 42], ['Jill', 'Jones', 28], ['Jack', 'Ford', 55]]
df = pd.DataFrame(People_List, columns=['First_Name', 'Last_Name', 'Age'])
print(df)


  First_Name Last_Name  Age
0        Jon     Smith   21
1       Mark     Brown   38
2      Maria       Lee   42
3       Jill     Jones   28
4       Jack      Ford   55


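def getTitleBirthYear(row):
    # row is a single row of df (a Series), e.g. as passed by df.apply(..., axis=1)
    if row['First_Name'] == 'Maria':  # exact match; `in` on a string is a substring test
        title = 'Ms'
    else:
        title = 'Mr'
    current_year = 2020
    birth_year = current_year - row['Age']
    return title, birth_year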

Calling getTitleBirthYear for each row should give:

  title  birth_year
0    Mr        1999
1    Mr        1982
2    Ms        1978
3    Mr        1992
4    Mr        1965

Final expected output:

  First_Name Last_Name  Age title  birth_year
0        Jon     Smith   21    Mr        1999
1       Mark     Brown   38    Mr        1982
2      Maria       Lee   42    Ms        1978
3       Jill     Jones   28    Mr        1992
4       Jack      Ford   55    Mr        1965

Please suggest. Thanks!

Do you need a function for that logic? You could do it faster in pandas. – Kenan, commented Dec 2, 2020 at 15:21

Henry Yik · Accepted Answer · 2020-12-02 15:42:52Z

2

Although you can use apply, it is better to use vectorized functions (see "When should I (not) want to use pandas apply() in my code?"). Your logic can be simplified as follows:

import numpy as np
import pandas as pd
print(df.assign(title=np.where(df["First_Name"].eq("Maria"), "Ms", "Mr"),
                birth_year=2020 - df["Age"]))  # or pd.Timestamp.now().year - df["Age"]

  First_Name Last_Name  Age title  birth_year
0        Jon     Smith   21    Mr        1999
1       Mark     Brown   38    Mr        1982
2      Maria       Lee   42    Ms        1978
3       Jill     Jones   28    Mr        1992
4       Jack      Ford   55    Mr        1965

answered Dec 2, 2020 at 15:42
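Note that DataFrame.assign returns a new DataFrame rather than modifying df in place, so the print above only displays the result. A minimal sketch that keeps the new columns by assigning the result back, using the question's sample data and assuming a fixed reference year of 2020 so that the output matches the one shown:

```python
import numpy as np
import pandas as pd

People_List = [['Jon', 'Smith', 21], ['Mark', 'Brown', 38], ['Maria', 'Lee', 42],
               ['Jill', 'Jones', 28], ['Jack', 'Ford', 55]]
df = pd.DataFrame(People_List, columns=['First_Name', 'Last_Name', 'Age'])

CURRENT_YEAR = 2020  # assumed reference year; pd.Timestamp.now().year gives the real one

# assign() returns a new DataFrame, so rebind df to keep the added columns
df = df.assign(
    title=np.where(df["First_Name"].eq("Maria"), "Ms", "Mr"),  # vectorized if/else
    birth_year=CURRENT_YEAR - df["Age"],                          # element-wise subtraction
)
print(df)
```

Both new columns are computed on whole columns at once, with no per-row Python function call.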


Kenan · Accepted Answer · 2020-12-03 16:26:09Z

1

Here are two ways to apply the function and create the new columns:

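import pandas as pd
# Way 1: turn the Series of tuples returned by apply into a DataFrame aligned on df's index
df[['title', 'birth_year']] = pd.DataFrame(df.apply(getTitleBirthYear, axis=1).tolist(), index=df.index)

# Way 2: let apply expand each returned tuple into columns
df[['title', 'birth_year']] = df.apply(getTitleBirthYear, axis=1, result_type='expand')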

  First_Name Last_Name  Age title  birth_year
0        Jon     Smith   21    Mr        1999
1       Mark     Brown   38    Mr        1982
2      Maria       Lee   42    Ms        1978
3       Jill     Jones   28    Mr        1992
4       Jack      Ford   55    Mr        1965

edited Dec 3, 2020 at 16:26

answered Dec 2, 2020 at 15:22
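Both apply variants can be checked end to end. This is a self-contained sketch; the function is a simplified version of the question's getTitleBirthYear (the exact == 'Maria' comparison and the fixed year 2020 are assumptions), and it assumes a pandas version that supports result_type in DataFrame.apply (0.23 or later):

```python
import pandas as pd

People_List = [['Jon', 'Smith', 21], ['Mark', 'Brown', 38], ['Maria', 'Lee', 42],
               ['Jill', 'Jones', 28], ['Jack', 'Ford', 55]]
df = pd.DataFrame(People_List, columns=['First_Name', 'Last_Name', 'Age'])


def getTitleBirthYear(row):
    """Return (title, birth_year) for a single row (a Series)."""
    title = 'Ms' if row['First_Name'] == 'Maria' else 'Mr'
    birth_year = 2020 - row['Age']  # assumed reference year
    return title, birth_year


# Way 1: apply returns a Series of tuples; build a DataFrame from the list of tuples.
# index=way1.index keeps rows aligned even if df has a non-default index.
way1 = df.copy()
way1[['title', 'birth_year']] = pd.DataFrame(
    way1.apply(getTitleBirthYear, axis=1).tolist(), index=way1.index)

# Way 2: result_type='expand' turns each returned tuple into columns 0 and 1,
# which are then assigned positionally to 'title' and 'birth_year'.
way2 = df.copy()
way2[['title', 'birth_year']] = way2.apply(getTitleBirthYear, axis=1, result_type='expand')

print(way2)
```

Both produce the same values; the second form avoids building the intermediate list.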


4 Comments

Kenan Over a year ago

You're welcome! Does that completely answer your question?

ManiK Over a year ago

To take this question further: what if my function takes two different DataFrames as arguments, for example getTitleBirthYear(df1, df2)? How can I use apply() in that case? The same statement above with two arguments gives the error: ("getTitleBirthYear() missing 1 required positional argument: 'df2'", 'occurred at index 0')

Kenan Over a year ago

This sounds like a more involved question. Can you create a new post for it? Also, will you need the output appended to df1 or df2? Which DataFrame is apply called on?

ManiK Over a year ago

Posted a follow-up question here: link
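Regarding the error in the follow-up: DataFrame.apply calls the function with only the row, so a function with a second required parameter fails with the missing-argument error. Extra positional arguments can be forwarded with the args parameter of apply. A minimal sketch, assuming a hypothetical lookup DataFrame titles_df and a hypothetical two-argument function get_title_birth_year(row, titles_df); neither is from the original thread:

```python
import pandas as pd

df1 = pd.DataFrame([['Jon', 'Smith', 21], ['Maria', 'Lee', 42]],
                   columns=['First_Name', 'Last_Name', 'Age'])
# Hypothetical second DataFrame: a lookup table of titles by first name.
titles_df = pd.DataFrame({'First_Name': ['Maria'], 'title': ['Ms']})


def get_title_birth_year(row, titles_df):
    """Hypothetical two-argument variant: row comes from df1, titles_df is passed in."""
    match = titles_df.loc[titles_df['First_Name'] == row['First_Name'], 'title']
    title = match.iloc[0] if not match.empty else 'Mr'  # default when no match
    return title, 2020 - row['Age']  # assumed reference year


# apply() calls get_title_birth_year(row, *args) for each row,
# so args=(titles_df,) supplies the second parameter.
df1[['title', 'birth_year']] = df1.apply(
    get_title_birth_year, axis=1, args=(titles_df,), result_type='expand')
print(df1)
```

A lambda such as lambda r: get_title_birth_year(r, titles_df) would work the same way; args simply avoids the extra wrapper.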
