I have a pandas dataframe df of the following shape: (763, 65)

I use the following code to create 4 new columns:

def myFunc(row):
    #code to get some result from another dataframe
    return result1, result2, result3, result4

df[['col1', 'col2', 'col3', 'col4']] = df.apply(myFunc, axis=1)

The shape of the dataframe which is returned in myFunc is (1, 4). The code runs into the following error:

ValueError: Shape of passed values is (763, 4), indices imply (763, 65)

I know that df has 65 columns and that the returned data from myFunc only has 4 columns. However, I only want to create the 4 new columns (that is, col1, col2, etc.), so in my opinion the code is correct when it only returns 4 columns in myFunc. What am I doing wrong?

  • Can you provide a minimal reproducible example that fits your situation? Obviously, you do not need to provide 65 columns. 5 columns should be fine. Commented Oct 11, 2017 at 20:21

1 Answer

Demo:

In [40]: df = pd.DataFrame({'a':[1,2,3]})

In [41]: df
Out[41]:
   a
0  1
1  2
2  3

In [42]: def myFunc(row):
    ...:     #code to get some result from another dataframe
    ...:     # NOTE: trick is to return pd.Series()
    ...:     return pd.Series([1,2,3,4]) * row['a']
    ...:

In [44]: df[['col1', 'col2', 'col3','col4']] = df.apply(myFunc, axis=1)

In [45]: df
Out[45]:
   a  col1  col2  col3  col4
0  1     1     2     3     4
1  2     2     4     6     8
2  3     3     6     9    12

Disclaimer: try to avoid .apply(..., axis=1). It is a for loop under the hood, i.e. it is not vectorized, and it will run much slower than vectorized Pandas/NumPy ufuncs.

PS: if you provide details of what you are trying to calculate in the myFunc function, we could try to find a vectorized solution...
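
For the toy demo above, the same four columns can be built without .apply. This is only a sketch, assuming each new column is a constant multiple of a, as in the demo. The real myFunc is not shown, so this may not carry over to it. The names factors and new_cols are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3]})

# Illustrative assumption: the multipliers used in the demo's myFunc.
factors = [1, 2, 3, 4]

# The outer product computes every (row value * factor) pair in one call,
# so no Python-level loop over the rows is needed.
new_cols = pd.DataFrame(
    np.outer(df['a'].to_numpy(), factors),
    index=df.index,
    columns=['col1', 'col2', 'col3', 'col4'],
)

# Attach the new columns, aligned on the shared index.
df = df.join(new_cols)
print(df)
```

Building new_cols with the same index as df keeps the rows aligned when they are joined.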

1 Comment

My problem (of course) was in the hidden code... myFunc returned a dataframe with column headings, and this somehow led to the error. I now return as follows and it works: return result[['col1', 'col2', 'col3', 'col4']].iloc[0]. Of course this means that only the first row of the result is taken, which is what I want in my code. As for the vectorization, I will create a new thread later. thx!
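
A minimal sketch of the fix described in this comment. The lookup dataframe, its key column, and its values are made up to stand in for the "other dataframe", since the original code was not posted.

```python
import pandas as pd

# Hypothetical lookup table standing in for "another dataframe".
lookup = pd.DataFrame({
    'key': [1, 2, 3],
    'col1': [10, 20, 30],
    'col2': [11, 21, 31],
    'col3': [12, 22, 32],
    'col4': [13, 23, 33],
})

df = pd.DataFrame({'a': [1, 2, 3]})

def myFunc(row):
    # Boolean filtering returns a DataFrame (here with exactly one row).
    result = lookup[lookup['key'] == row['a']]
    # .iloc[0] turns that one-row DataFrame into a Series,
    # which .apply(axis=1) can expand into columns.
    return result[['col1', 'col2', 'col3', 'col4']].iloc[0]

df[['col1', 'col2', 'col3', 'col4']] = df.apply(myFunc, axis=1)
print(df)
```

Returning the one-row DataFrame itself, rather than the Series from .iloc[0], is what the comment identifies as the cause of the error.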
