
I am applying a transformation to a variable from a pandas DataFrame, and then I would like to replace the column with the new values. The problem seems to be that, after the transformation, the length of the array is not the same as the length of my DataFrame's index. I don't think that is true, though.

>>> df['variable'] = stats.boxcox(df.variable)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\eMachine\WinPython-64bit-2.7.5.3\python-2.7.5.amd64\lib\site-packages\pandas\core\frame.py", line 2119, in __setitem__
    self._set_item(key, value)
  File "C:\Users\eMachine\WinPython-64bit-2.7.5.3\python-2.7.5.amd64\lib\site-packages\pandas\core\frame.py", line 2165, in _set_item
    value = self._sanitize_column(key, value)
  File "C:\Users\eMachine\WinPython-64bit-2.7.5.3\python-2.7.5.amd64\lib\site-packages\pandas\core\frame.py", line 2205, in _sanitize_column
    raise AssertionError('Length of values does not match '
AssertionError: Length of values does not match length of index

When I check the lengths, they do disagree: len(df) is 50000, but len(stats.boxcox(df.variable)) is 2. Yet when I print the result of stats.boxcox, it reports Length: 50000. What is going on here?

>>> len(df)
50000
>>> len(stats.boxcox(df.variable))
2
>>> stats.boxcox(df.variable)
(0    -0.079496
1    -0.117982
2    -0.104637

...
49985    -0.041300
49986     0.651771
49987    -0.115660
49988    -0.118034
49998    -0.118014
49999    -0.034076
Name: feat9, Length: 50000, dtype: float64, 8.4721358117221772)
>>> 
  • Did you check? Print out len(df) and len(stats.boxcox(df.variable)). Commented Apr 6, 2014 at 2:50
  • Just updated the question. Thanks, BrenBarn. Commented Apr 6, 2014 at 2:58

1 Answer


You can see in your example that the result of boxcox is a tuple. This is consistent with the scipy.stats.boxcox documentation, which says that, when lmbda is not given, boxcox returns a tuple of the transformed data and the lambda value that maximizes the log-likelihood. Notice in the example on that page that it does:

xt, _ = stats.boxcox(x)

This again shows that boxcox returns a 2-tuple, which is why len() reports 2.
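To see where the 2 comes from, here is a minimal sketch (assuming a recent NumPy and SciPy; the sample data x is hypothetical) that inspects what stats.boxcox returns when no lambda is supplied:

```python
import numpy as np
from scipy import stats

# Hypothetical sample data; Box-Cox requires strictly positive input.
rng = np.random.default_rng(0)
x = rng.lognormal(size=1000)

result = stats.boxcox(x)        # lmbda not given, so a tuple comes back
print(type(result).__name__)    # tuple
print(len(result))              # 2: (transformed data, fitted lambda)

xt, lam = result                # unpack the 2-tuple
print(len(xt) == len(x))        # True: one transformed value per input value
```

So len() was measuring the tuple, not the transformed data inside it.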

You should be doing df['variable'] = stats.boxcox(df.variable)[0].
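A minimal sketch of the fix on a small DataFrame, assuming a recent pandas and SciPy. The DataFrame and the column names variable_bc, variable_bc2 and variable_bc3 are hypothetical. It shows indexing the tuple, unpacking it to keep the fitted lambda, and passing lmbda explicitly, in which case boxcox returns only the transformed array:

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical DataFrame standing in for the real data; values must be > 0.
rng = np.random.default_rng(42)
df = pd.DataFrame({"variable": rng.lognormal(size=500)})

# Option 1: take element 0 of the returned tuple, as in the answer.
df["variable_bc"] = stats.boxcox(df["variable"])[0]

# Option 2: unpack the tuple and keep the fitted lambda for later use.
transformed, fitted_lambda = stats.boxcox(df["variable"])
df["variable_bc2"] = transformed

# Option 3: with lmbda passed explicitly, boxcox returns just the array,
# so no indexing or unpacking is needed.
df["variable_bc3"] = stats.boxcox(df["variable"], lmbda=fitted_lambda)
```

The sketch writes to new columns instead of overwriting df['variable'] because the later calls still need the original, strictly positive values; once you settle on one approach, assigning back to df['variable'] works the same way.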
