How to create multiple columns at once with apply?

Question

Consider this example

import pandas as pd
import numpy as np

df = pd.DataFrame({'var1' : [1,2,3,4],
                   'var2' : ['a','b','c','d']})
df
Out[100]: 
   var1 var2
0     1    a
1     2    b
2     3    c
3     4    d

I have a function that takes var1 as input and returns three values that I want to store into three different variables. The following seems to work correctly

    def myfunc(var):
        return [['small list'], var + 2, ['another list']]
    
    df.var1.apply(lambda x: myfunc(x))
    Out[101]: 
    0    [[small list], 3, [another list]]
    1    [[small list], 4, [another list]]
    2    [[small list], 5, [another list]]
    3    [[small list], 6, [another list]]
    Name: var1, dtype: object

However, when I try to create the corresponding variables I get an error

df[['my small list', 'my numeric', 'other list']]  = df.var1.apply(lambda x: myfunc(x))
ValueError: Must have equal len keys and value when setting with an iterable

What do you think?

I used to use the great zip solution from "Return multiple columns from pandas apply()", but with the current pandas 1.2 that solution no longer works.

Thanks!

You could try applying the answer from this: stackoverflow.com/questions/35491274/…
– dm2, Commented Feb 18, 2021 at 20:37
By the way, you don't need to create a lambda function if you already have that function; you could do df.var1.apply(myfunc)
– dm2, Commented Feb 18, 2021 at 20:38
The issue is that the dataframe contains other columns as well that I need to keep. This solution does not seem optimal here.
– ℕʘʘḆḽḘ, Commented Feb 18, 2021 at 20:39
I'll write out the answer (you can keep the other columns as well)
– dm2, Commented Feb 18, 2021 at 20:41

zwithouta · Accepted Answer · 2021-02-19 18:40:13Z

5

Returning a Series is possibly the most readable solution.

def myfunc(var):
    return pd.Series([['small list'], var + 2, ['another list']])

df[['my small list', 'my numeric', 'other list']]  = df.var1.apply(lambda x: myfunc(x))

However, for larger dataframes you should prefer either the zip or the dataframe approach.

import pandas as pd # 1.2.2
import perfplot

def setup(n):
    return pd.DataFrame(dict(
        var1=list(range(n))
    ))
 
def with_series(df):
    def myfunc(var):
        return pd.Series([['small list'], var + 2, ['other list']])
    out = pd.DataFrame()
    out[['small list', 'numeric', 'other list']] = df.var1.apply(lambda x: myfunc(x))

def with_zip(df):
    def myfunc(var):
        return [['small list'], var + 2, ['other list']]
    out = pd.DataFrame()
    out['small list'], out['numeric'], out['other list'] = list(zip(*df.var1.apply(lambda x: myfunc(x))))

def with_dataframe(df):
    def myfunc(var):
        return [['small list'], var + 2, ['other list']]
    out = pd.DataFrame()
    out[['small list', 'numeric', 'other list']] = pd.DataFrame(df.var1.apply(myfunc).to_list())


perfplot.show(
    setup=setup,
    kernels=[
        with_series,
        with_zip,
        with_dataframe,
    ],
    labels=["series", "zip", "df"],
    n_range=[2 ** k for k in range(20)],
    xlabel="len(df)",
    equality_check=None,
)
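If perfplot is not installed, a rough comparison can be sketched with the standard-library timeit module. This is only an illustration: the frame size of 1000 rows, the repeat count, and the helper names myfunc_list and myfunc_series are my own choices, and the absolute numbers will vary by machine and pandas version.

```python
import timeit

import pandas as pd


def myfunc_list(var):
    # Returns a plain Python list (used by the DataFrame approach).
    return [['small list'], var + 2, ['other list']]


def myfunc_series(var):
    # Returns a pandas Series (used by the Series approach).
    return pd.Series([['small list'], var + 2, ['other list']])


# Hypothetical test frame; 1000 rows is an arbitrary size for illustration.
df = pd.DataFrame({'var1': range(1000)})


def with_series():
    # apply() with a Series-returning function yields a DataFrame directly.
    return df['var1'].apply(myfunc_series)


def with_dataframe():
    # apply() yields a Series of lists, which is then expanded in one step.
    return pd.DataFrame(df['var1'].apply(myfunc_list).to_list(), index=df.index)


# Time each approach a few times; number=3 keeps the run short.
timings = {
    name: timeit.timeit(fn, number=3)
    for name, fn in [('series', with_series), ('df', with_dataframe)]
}
print(timings)
```

Both functions produce a frame with one row per input row and three columns, so the timings compare like with like.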

edited Feb 19, 2021 at 18:40

answered Feb 18, 2021 at 20:46

zwithouta


2 Comments

ℕʘʘḆḽḘ Over a year ago

this is a good one I think. I wonder about the performance though

zwithouta Over a year ago

@ℕʘʘḆḽḘ added a performance comparison of the different approaches

dm2 · Accepted Answer · 2021-02-18 20:47:09Z

3

Using the method from this Stack Overflow question, you just need to split the pandas Series object returned by df.var1.apply(myfunc) into columns.

What I did was:

df[['out1','out2','out3']] = pd.DataFrame(df['var1'].apply(myfunc).to_list())

As you can see, this doesn't overwrite your DataFrame, just assigns the resulting columns to new columns in your DataFrame.

DataFrame after the apply method:

   var1 var2          out1  out2            out3
0     1    a  [small_list]     3  [another_list]
1     2    b  [small_list]     4  [another_list]
2     3    c  [small_list]     5  [another_list]
3     4    d  [small_list]     6  [another_list]
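One thing to watch, as far as I can tell: pd.DataFrame(...) built from a list gets a fresh 0..n-1 index, so if your DataFrame has a different index the new columns may not line up with the existing rows. Here is a minimal sketch that passes the original index explicitly. The index values 10, 20, 30, 40 and the use of join are my own choices for illustration, not part of the question.

```python
import pandas as pd


def myfunc(var):
    return [['small list'], var + 2, ['another list']]


# Same data as in the question, but with a non-default index (an assumption
# made here to illustrate the alignment issue).
df = pd.DataFrame(
    {'var1': [1, 2, 3, 4], 'var2': ['a', 'b', 'c', 'd']},
    index=[10, 20, 30, 40],
)

# Expand the Series of lists into columns, reusing the original index so the
# new rows line up with the existing ones.
expanded = pd.DataFrame(
    df['var1'].apply(myfunc).to_list(),
    index=df.index,
    columns=['out1', 'out2', 'out3'],
)

# join() aligns on the index and keeps the existing columns untouched.
result = df.join(expanded)
print(result)
```

With the default RangeIndex from the question this makes no difference, but it keeps the approach safe after filtering or sorting.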

answered Feb 18, 2021 at 20:47

dm2


André C. Andersen · Accepted Answer · 2021-02-19 06:21:20Z

3

The zip method seems to work fine still:

import pandas as pd
import numpy as np

df = pd.DataFrame({'var1' : [1,2,3,4],
                   'var2' : ['a','b','c','d']})

def myfunc(var):
    return [['small list'], var + 2, ['another list']]

df['my small list'], df['my numeric'], df['other list'] = zip(*df.var1.apply(lambda x: myfunc(x)))

See: Return multiple columns from pandas apply()

The really odd thing is how the inner lists are being coerced into tuples. From experimenting, it seems to matter that the outer collection is a list.

To force the inner lists to stay lists I had to do the following:

df['my small list'], df['my numeric'], df['other list'] = (list(row) for row in zip(*df.var1.apply(lambda x: myfunc(x))))

edited Feb 19, 2021 at 6:21

answered Feb 18, 2021 at 20:48

André C. Andersen


4 Comments

ℕʘʘḆḽḘ Over a year ago

strange. I have pandas 1.2.0

André C. Andersen Over a year ago

@ℕʘʘḆḽḘ Did you remember the asterisk in zip? What is the error message? I'm using 1.2.1, I'd expect it to fail like yours.

ℕʘʘḆḽḘ Over a year ago

yes, that used to work with the *.... very strange. just a note, it seems the lists are actually tuples in your print?

André C. Andersen Over a year ago

Yeah. That is really odd? I updated the answer with more details. Turns out it depends on the type of the outer column collections.

Ynjxsjmh · Accepted Answer · 2021-04-08 15:13:18Z

0

You can use the result_type argument of pandas.DataFrame.apply():

df[['my small list', 'my numeric', 'other list']]  = df.apply(lambda x: myfunc(x.var1), axis=1, result_type='expand')

# print(df)

   var1 var2 my small list  my numeric      other list
0     1    a  [small list]           3  [another list]
1     2    b  [small list]           4  [another list]
2     3    c  [small list]           5  [another list]
3     4    d  [small list]           6  [another list]
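To see what result_type='expand' returns on its own, here is a small sketch that keeps the expanded frame separate before attaching it. The variable names expanded and default_labels and the use of pd.concat are my own choices for illustration.

```python
import pandas as pd


def myfunc(var):
    return [['small list'], var + 2, ['another list']]


df = pd.DataFrame({'var1': [1, 2, 3, 4], 'var2': ['a', 'b', 'c', 'd']})

# With axis=1 the function receives one row at a time; 'expand' turns each
# returned list into separate columns of a new DataFrame.
expanded = df.apply(lambda row: myfunc(row['var1']), axis=1, result_type='expand')

# The expanded columns get integer labels by default, so name them explicitly.
default_labels = list(expanded.columns)
expanded.columns = ['my small list', 'my numeric', 'other list']

# Attach the new columns next to the existing ones.
result = pd.concat([df, expanded], axis=1)
print(result)
```

Note that result_type only exists on DataFrame.apply, not on Series.apply, which is why the row-wise form is needed here.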

answered Apr 8, 2021 at 15:13

Ynjxsjmh
