1

Consider this example:

import pandas as pd
import numpy as np

df = pd.DataFrame({'var1' : [1,2,3,4],
                   'var2' : ['a','b','c','d']})
df
Out[100]: 
   var1 var2
0     1    a
1     2    b
2     3    c
3     4    d

I have a function that takes var1 as input and returns three values that I want to store in three different columns. The following seems to work correctly:

    def myfunc(var):
        return [['small list'], var + 2, ['another list']]
    
    df.var1.apply(lambda x: myfunc(x))
    Out[101]: 
    0    [[small list], 3, [another list]]
    1    [[small list], 4, [another list]]
    2    [[small list], 5, [another list]]
    3    [[small list], 6, [another list]]
    Name: var1, dtype: object

However, when I try to create the corresponding columns, I get an error:

df[['my small list', 'my numeric', 'other list']]  = df.var1.apply(lambda x: myfunc(x))
ValueError: Must have equal len keys and value when setting with an iterable

What do you think?

I used to use the great zip solution from "Return multiple columns from pandas apply()", but with the current pandas 1.2 that solution no longer works.

Thanks!

4
  • You could try and apply the answer from this: stackoverflow.com/questions/35491274/… Commented Feb 18, 2021 at 20:37
  • By the way, you don't need to create a lambda function if you already have that function, you could do df.var1.apply(myfunc) Commented Feb 18, 2021 at 20:38
  • the issue is that the dataframe contains other columns as well that I need to keep. this solution does not seem optimal here Commented Feb 18, 2021 at 20:39
  • I'll write out the answer (you can keep the other columns as well) Commented Feb 18, 2021 at 20:41

4 Answers

5

Returning a Series is possibly the most readable solution.

def myfunc(var):
    return pd.Series([['small list'], var + 2, ['another list']])

df[['my small list', 'my numeric', 'other list']]  = df.var1.apply(lambda x: myfunc(x))
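
One way to see why returning a Series helps, as a minimal sketch: Series.apply builds a DataFrame when the function returns a Series, but keeps a single object-dtype Series when the function returns a plain list. The names myfunc_list, myfunc_series, as_lists and as_frame below are only illustrative and do not appear in the question.

```python
import pandas as pd

df = pd.DataFrame({'var1': [1, 2, 3, 4],
                   'var2': ['a', 'b', 'c', 'd']})

def myfunc_list(var):
    # Plain list: apply stores each result as a single cell
    return [['small list'], var + 2, ['another list']]

def myfunc_series(var):
    # Series: apply spreads each result over one column per element
    return pd.Series([['small list'], var + 2, ['another list']])

as_lists = df.var1.apply(myfunc_list)    # one column holding lists
as_frame = df.var1.apply(myfunc_series)  # three columns labelled 0, 1, 2

print(type(as_lists).__name__, as_lists.shape)
print(type(as_frame).__name__, as_frame.shape)
```

The single-column result in the first case is consistent with the length mismatch reported in the question when three column names are assigned at once.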

However, for larger dataframes you should prefer either the zip or the dataframe approach.

import pandas as pd # 1.2.2
import perfplot

def setup(n):
    return pd.DataFrame(dict(
        var1=list(range(n))
    ))
 
def with_series(df):
    def myfunc(var):
        return pd.Series([['small list'], var + 2, ['other list']])
    out = pd.DataFrame()
    out[['small list', 'numeric', 'other list']] = df.var1.apply(lambda x: myfunc(x))

def with_zip(df):
    def myfunc(var):
        return [['small list'], var + 2, ['other list']]
    out = pd.DataFrame()
    out['small list'], out['numeric'], out['other list'] = list(zip(*df.var1.apply(lambda x: myfunc(x))))

def with_dataframe(df):
    def myfunc(var):
        return [['small list'], var + 2, ['other list']]
    out = pd.DataFrame()
    out[['small list', 'numeric', 'other list']] = pd.DataFrame(df.var1.apply(myfunc).to_list())


perfplot.show(
    setup=setup,
    kernels=[
        with_series,
        with_zip,
        with_dataframe,
    ],
    labels=["series", "zip", "df"],
    n_range=[2 ** k for k in range(20)],
    xlabel="len(df)",
    equality_check=None,
)

[Figure: perfplot runtime comparison of the series, zip, and df approaches, plotted against len(df)]

2 Comments

this is a good one I think. I wonder about the performance though
@ℕʘʘḆḽḘ added a performance comparison of the different approaches
3

Using the method from this Stack Overflow question, you just need to split the pandas Series returned by df.var1.apply(myfunc) into columns.

What I did was:

df[['out1','out2','out3']] = pd.DataFrame(df['var1'].apply(myfunc).to_list())

As you can see, this doesn't overwrite your DataFrame, just assigns the resulting columns to new columns in your DataFrame.

DataFrame after the apply method:

   var1 var2          out1  out2            out3
0     1    a  [small list]     3  [another list]
1     2    b  [small list]     4  [another list]
2     3    c  [small list]     5  [another list]
3     4    d  [small list]     6  [another list]
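
One caveat, offered as a sketch rather than a guarantee: a DataFrame built from a plain list gets a fresh 0..n-1 index, so if df has a different index (for example after filtering rows), the new columns may not line up with the original rows. Passing index=df.index and joining avoids relying on positional luck. The index values 10 to 40 and the names expanded and result are made up for illustration.

```python
import pandas as pd

def myfunc(var):
    return [['small list'], var + 2, ['another list']]

# Hypothetical frame whose index is not the default 0..n-1
df = pd.DataFrame({'var1': [1, 2, 3, 4],
                   'var2': ['a', 'b', 'c', 'd']},
                  index=[10, 20, 30, 40])

# Build the expanded columns on the same index as df
expanded = pd.DataFrame(df['var1'].apply(myfunc).to_list(),
                        index=df.index,
                        columns=['out1', 'out2', 'out3'])

# join aligns on the index, so every row keeps its own results
result = df.join(expanded)
print(result)
```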

3

The zip method still seems to work fine:

import pandas as pd
import numpy as np

df = pd.DataFrame({'var1' : [1,2,3,4],
                   'var2' : ['a','b','c','d']})

def myfunc(var):
    return [['small list'], var + 2, ['another list']]

df['my small list'], df['my numeric'], df['other list'] = zip(*df.var1.apply(lambda x: myfunc(x)))

See also: Return multiple columns from pandas apply()

The really odd thing is how the inner lists are being coerced into tuples. From experimenting, it seems to matter that the outer collection is a list.

To force the inner lists to stay lists I had to do the following:

df['my small list'], df['my numeric'], df['other list'] = (list(row) for row in zip(*df.var1.apply(lambda x: myfunc(x))))
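
To make the "outer type" point concrete, here is a small pandas-free sketch of what zip(*...) hands back. It only shows the Python side: each transposed column arrives as a tuple unless it is converted explicitly. How pandas then stores those values in cells may depend on the version, and this sketch makes no claim about that. The names rows, columns_as_tuples and columns_as_lists are illustrative.

```python
def myfunc(var):
    return [['small list'], var + 2, ['another list']]

# One result per input value, as apply would produce row by row
rows = [myfunc(x) for x in [1, 2, 3, 4]]

# zip(*rows) transposes rows into columns; each column is a tuple
columns_as_tuples = list(zip(*rows))

# Converting each column to a list changes the outer type only;
# the inner ['small list'] objects are untouched
columns_as_lists = [list(col) for col in zip(*rows)]

print(type(columns_as_tuples[0]).__name__)
print(type(columns_as_lists[0]).__name__)
```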

4 Comments

strange. I have pandas 1.2.0
@ℕʘʘḆḽḘ Did you remember the asterisk in zip? What is the error message? I'm using 1.2.1, I'd expect it to fail like yours.
yes, that used to work with the *.... very strange. just a note, it seems the lists are actually tuples in your print?
Yeah. That is really odd? I updated the answer with more details. Turns out it depends on the type of the outer column collections.
0

You can use the result_type argument of pandas.DataFrame.apply():

df[['my small list', 'my numeric', 'other list']]  = df.apply(lambda x: myfunc(x.var1), axis=1, result_type='expand')
# print(df)

   var1 var2 my small list  my numeric      other list
0     1    a  [small list]           3  [another list]
1     2    b  [small list]           4  [another list]
2     3    c  [small list]           5  [another list]
3     4    d  [small list]           6  [another list]
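
If you would rather inspect the expanded result before touching df, the same idea can be written in two steps. This is a self-contained sketch under the question's definitions; the names expanded and result are illustrative, and pd.concat is just one way to attach the new columns.

```python
import pandas as pd

def myfunc(var):
    return [['small list'], var + 2, ['another list']]

df = pd.DataFrame({'var1': [1, 2, 3, 4],
                   'var2': ['a', 'b', 'c', 'd']})

# Row-wise apply; result_type='expand' turns each returned list into columns
expanded = df.apply(lambda row: myfunc(row['var1']), axis=1, result_type='expand')
expanded.columns = ['my small list', 'my numeric', 'other list']

# Attach the new columns next to the existing ones
result = pd.concat([df, expanded], axis=1)
print(result)
```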
