Adding multiple columns to pandas df based on row values

Question

I would like to use a function that produces multiple outputs to create multiple new columns in an existing pandas dataframe.

For example, say I have this test function which outputs 2 things:

def testfunc(TranspoId, LogId):
    thing1 = TranspoId + LogId
    thing2 = LogId - TranspoId
    return thing1, thing2

I can assign those returned outputs to two different variables like so:

Thing1,Thing2 = testfunc(4,28)
print(Thing1)
print(Thing2)

I tried to do this with a dataframe in the following way:

import pandas as pd

data = {'Name':['Picard','Data','Guinan'],'TranspoId':[1,2,3],'LogId':[12,14,23]}

df = pd.DataFrame(data, columns = ['Name','TranspoId','LogId'])
print(df)

df['thing1','thing2'] = df.apply(lambda row: testfunc(row.TranspoId, row.LogId), axis=1)
print(df)

What I want is something that looks like this:

data = {'Name':['Picard','Data','Guinan'],'TranspoId':[1,2,3],'LogId':[12,14,23], 'Thing1':[13,16,26], 'Thing2':[11,12,20]}
df = pd.DataFrame(data, columns=['Name','TranspoId','LogId','Thing1','Thing2'])
print(df)

In the real world that function is doing a lot of heavy lifting, and I can't afford to run it twice, once for each new variable being added to the df.

I've been hitting myself in the head with this for a few hours. Any insights would be greatly appreciated.

Why can't you simply define the columns without the need of apply, lambda and a custom function? – Celius Stingher, Commented Jun 17, 2020 at 20:37

Murilo Malek · Accepted Answer · 2020-06-17 20:53:26Z

1

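I believe the best way is to change the approach: instead of applying the function row by row, make a function that works with whole Series (columns). Arithmetic on Series is vectorized, so the function is called only once for the entire DataFrame and both results come back together.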

import pandas as pd

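# Create function that deals with Series
def testfunc(Series1, Series2):
    # Both operations are vectorized: each runs once over the whole column
    Thing1 = Series1 + Series2  # TranspoId + LogId
    Thing2 = Series2 - Series1  # LogId - TranspoId, as in the question
    return Thing1, Thing2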

# Create df
data = {'Name':['Picard','Data','Guinan'],'TranspoId':[1,2,3],'LogId':[12,14,23]}    
df = pd.DataFrame(data, columns = ['Name','TranspoId','LogId'])

# Apply function
Thing1,Thing2 = testfunc(df['TranspoId'],df['LogId'])
print(Thing1)
print(Thing2)

# Assign both new columns in one step
df = df.assign(Thing1=Thing1, Thing2=Thing2)

# print df
print(df)

answered Jun 17, 2020 at 20:53

Murilo Malek

LevB · Accepted Answer · 2020-06-17 21:11:34Z

1

Your function should return a Series that calculates both new columns in one pass. Then you can use DataFrame.apply() with axis=1 to add the new fields.

import pandas as pd
df = pd.DataFrame( {'TranspoId':[1,2,3], 'LogId':[4,5,6]})

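def testfunc(row):
    # Compute both new values from the same row and return them together
    new_cols = pd.Series([
        row['TranspoId'] + row['LogId'],
        row['LogId'] - row['TranspoId']])
    return new_cols

# apply() expands the returned Series into two columns, assigned in order
df[['thing1','thing2']] = df.apply(testfunc, axis=1)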

print(df)

Output:

   TranspoId  LogId  thing1  thing2
0          1      4       5       3
1          2      5       7       3
2          3      6       9       3

answered Jun 17, 2020 at 21:11

LevB