
Consider the following simple function:

def Powers(x):
    return [x, x**2, x**3, x**4, x**5]

and input dataframe:

import pandas as pd

df = pd.DataFrame({'x': (1, 2, 3, 4, 5)})

I would like to generate new variables: ['Exp_1', 'Exp_2', 'Exp_3', 'Exp_4', 'Exp_5']

When I apply the function to the dataframe as follows:

df[['Exp_1', 'Exp_2', 'Exp_3', 'Exp_4', 'Exp_5']] = df.apply(lambda x: Powers(x.x), axis=1)

I get:

[screenshot of the resulting dataframe; image not available]

In other words, the values are transposed: Exp_5 for x = 1 should be 1 but shows 5, and Exp_1 for x = 5 should be 5 but shows 1.

I have tried axis=0 in the call above, and that does not work either. I also know something is wrong because I get errors if the input dataframe has a different length.

How do I fix this?

  • Not sure whether Powers is only an example, but if you care about speed you can use np.vander(df.x, len(df) + 1, increasing=True), which generates a Vandermonde matrix (note that its first column is x**0): numpy.org/doc/stable/reference/generated/numpy.vander.html. Commented Aug 13, 2022 at 15:43
  • Thanks. Powers is just a trivial example for an MWE. Commented Aug 13, 2022 at 15:44
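
As a rough sketch of the Vandermonde suggestion above: np.vander with increasing=True returns the columns x**0, x**1, ..., x**(N-1), so the first column is dropped here to keep only powers 1 through 5. The names n_powers, cols, and result are illustrative and do not come from the original post.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": (1, 2, 3, 4, 5)})

# Assumption: the highest exponent wanted is fixed at 5, independent of len(df).
n_powers = 5

# Columns are x**0 .. x**5; slice off the leading column of ones (x**0).
vander = np.vander(df["x"].to_numpy(), n_powers + 1, increasing=True)
powers = vander[:, 1:]

cols = [f"Exp_{k}" for k in range(1, n_powers + 1)]
result = df.join(pd.DataFrame(powers, columns=cols, index=df.index))
print(result)
```

The number of columns depends on the highest exponent, not on the number of rows; len(df) + 1 matches here only because the example frame happens to have five rows.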

1 Answer

You can return a pd.Series from the Powers function, so that apply with axis=1 expands each result into columns:

def Powers(x):
    return pd.Series([x, x**2, x**3, x**4, x**5])

df[['Exp_1', 'Exp_2', 'Exp_3', 'Exp_4', 'Exp_5']] = df.apply(lambda x: Powers(x.x), axis=1)
print(df)

   x  Exp_1  Exp_2  Exp_3  Exp_4  Exp_5
0  1      1      1      1      1      1
1  2      2      4      8     16     32
2  3      3      9     27     81    243
3  4      4     16     64    256   1024
4  5      5     25    125    625   3125
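
A small variation on this approach, offered as a sketch rather than as part of the original answer: giving the returned Series an index lets apply name the new columns itself, so the column list is defined in one place. The names COLS, powers_labeled, expanded, and result are hypothetical.

```python
import pandas as pd

COLS = ["Exp_1", "Exp_2", "Exp_3", "Exp_4", "Exp_5"]

def powers_labeled(x):
    # The Series index becomes the column labels when apply(axis=1) expands the result.
    return pd.Series([x, x**2, x**3, x**4, x**5], index=COLS)

df = pd.DataFrame({"x": (1, 2, 3, 4, 5)})

# Each row yields one labeled Series, so the result is a DataFrame with COLS as columns.
expanded = df.apply(lambda row: powers_labeled(row.x), axis=1)
result = df.join(expanded)
print(result)
```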

Or keep returning a list and pass result_type='expand' to DataFrame.apply:

def Powers(x):
    return [x, x**2, x**3, x**4, x**5]

df[['Exp_1', 'Exp_2', 'Exp_3', 'Exp_4', 'Exp_5']] = df.apply(lambda x: Powers(x.x), axis=1, result_type='expand')
# or
df[['Exp_1', 'Exp_2', 'Exp_3', 'Exp_4', 'Exp_5']] = df.apply(lambda x: Powers(x.x), axis=1).tolist()

5 Comments

Thanks, that's great. I will experiment with both, but let's say the function will sometimes be used with a scalar and sometimes applied to a dataframe, is option 2 preferred?
@brb I'm not sure what you mean by "applied to a dataframe". If you mean what you do in this example, it is effectively the same as df['x'].apply(Powers). Both ways are fine; use whichever you prefer.
Sorry, I mean that sometimes I might want to do results = Powers(3), and previously results would be a list. I am not sure I want a pd.Series returned in that case. So your second way, using result_type='expand', may be better, since the function can then be used with a scalar input or applied to a dataframe?
@brb For your results = Powers(3) case, I think the second way is better.
Thank you. Appreciate your help. Helping me learn.
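
To illustrate the point settled in the comments above, here is a minimal sketch in which the list-returning Powers is called directly with a scalar and also applied row by row with result_type='expand'. The variable names cols and expanded are illustrative.

```python
import pandas as pd

def Powers(x):
    return [x, x**2, x**3, x**4, x**5]

# Scalar use: the result stays a plain Python list.
results = Powers(3)
print(results)

# Dataframe use: result_type='expand' turns each returned list into columns 0..4.
df = pd.DataFrame({"x": (1, 2, 3, 4, 5)})
cols = ["Exp_1", "Exp_2", "Exp_3", "Exp_4", "Exp_5"]
expanded = df.apply(lambda row: Powers(row.x), axis=1, result_type="expand")

# Rename the positional columns before joining them onto the original frame.
expanded.columns = cols
df = df.join(expanded)
print(df)
```

Because Powers itself never touches pandas, the same function serves both call sites; only the apply call decides how the list is unpacked.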
