
I would like to use a function that produces multiple outputs to create multiple new columns in an existing pandas dataframe.

For example, say I have this test function, which returns two values:

def testfunc(TranspoId, LogId):
    thing1 = TranspoId + LogId
    thing2 = LogId - TranspoId
    return thing1, thing2

I can unpack the returned values into two separate variables like so:

Thing1,Thing2 = testfunc(4,28)
print(Thing1)
print(Thing2)

I tried to do this with a dataframe in the following way:

import pandas as pd

data = {'Name':['Picard','Data','Guinan'],'TranspoId':[1,2,3],'LogId':[12,14,23]}

df = pd.DataFrame(data, columns = ['Name','TranspoId','LogId'])
print(df)

# Attempt: this does not give me two separate columns
df['thing1','thing2'] = df.apply(lambda row: testfunc(row.TranspoId, row.LogId), axis=1)
print(df)

What I want is something that looks like this:

data = {'Name':['Picard','Data','Guinan'],'TranspoId':[1,2,3],'LogId':[12,14,23], 'Thing1':[13,16,26], 'Thing2':[11,12,20]}
df = pd.DataFrame(data, columns=['Name','TranspoId','LogId','Thing1','Thing2'])
print(df)

In the real world, that function does a lot of heavy lifting, and I can't afford to run it twice (once for each new column added to the df).

I've been banging my head against this for a few hours. Any insights would be greatly appreciated.

  • Why can't you simply define the columns without the need of apply, lambda and a custom function? Commented Jun 17, 2020 at 20:37

2 Answers

Answer 1

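I believe the best way is to rewrite the function so it works on whole Series (columns) instead of individual rows. Pandas arithmetic is vectorized, so a single call computes both new columns for every row at once.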

import pandas as pd

# Vectorized function: takes whole columns (Series) and computes both results in one pass
def testfunc(Series1, Series2):
    Thing1 = Series1 + Series2
    Thing2 = Series2 - Series1  # matches the question: LogId - TranspoId
    return Thing1, Thing2

# Create df
data = {'Name': ['Picard', 'Data', 'Guinan'], 'TranspoId': [1, 2, 3], 'LogId': [12, 14, 23]}
df = pd.DataFrame(data, columns=['Name', 'TranspoId', 'LogId'])

# Call the function once on the full columns
Thing1, Thing2 = testfunc(df['TranspoId'], df['LogId'])
print(Thing1)
print(Thing2)

# Assign both new columns in a single call
df = df.assign(Thing1=Thing1, Thing2=Thing2)

# print df
print(df)
Answer 2

Your function should return a Series containing both new values, so each row is computed in a single pass. DataFrame.apply with axis=1 then expands the returned Series into columns, which you can assign to the new fields.

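import pandas as pd
df = pd.DataFrame({'TranspoId': [1, 2, 3], 'LogId': [4, 5, 6]})

def testfunc(row):
    # Build both new values from one row and return them together as a Series
    new_cols = pd.Series([
        row['TranspoId'] + row['LogId'],
        row['LogId'] - row['TranspoId']])
    return new_cols

# apply returns a two-column DataFrame, which is assigned to the two new columns
df[['thing1', 'thing2']] = df.apply(testfunc, axis=1)

print(df)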

Output:

   TranspoId  LogId  thing1  thing2
0          1      4       5       3
1          2      5       7       3
2          3      6       9       3
