I was wondering how I would be able to convert my binned dataframe to a binned numpy array that I can use in sklearn's PCA.

Here's my code so far (x is my original unbinned dataframe):

bins=(2,6,10,14,20,26,32,38,44,50,56,62,68,74,80,86,92,98)
binned_data = x.groupby(pd.cut(x.Weight, bins))

I want to convert binned_data to a numpy array. Thanks in advance.

EDIT:

When I try binned_data.values, I receive this error:

AttributeError: Cannot access attribute 'values' of 'DataFrameGroupBy' objects, try using the 'apply' method
  • would it be binned_data.values? Commented Jun 12, 2014 at 18:57
  • No, I have tried that and received this: AttributeError: Cannot access attribute 'values' of 'DataFrameGroupBy' objects, try using the 'apply' method Commented Jun 12, 2014 at 19:06
  • Please add a short description of x or some code to generate it. Commented Jun 13, 2014 at 6:58

1 Answer

You need to apply some kind of aggregation to the GroupBy object to get a DataFrame (or Series) back. Once you have that, you can use .values to extract the numpy array.

For example, if you wanted the sum or count of the data in each bin you could do:

binned_data.sum().values
binned_data.size().values
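
As a minimal sketch of the two calls above: the question never shows x, so the small DataFrame here (with a made-up Height column) and the shortened bins are assumptions for illustration only. observed=False is passed explicitly because recent pandas versions warn when it is left out for categorical groupers.

```python
import pandas as pd

# Hypothetical stand-in for the asker's x; only the Weight column is known from the question.
x = pd.DataFrame({"Weight": [3, 5, 7, 12], "Height": [10, 20, 30, 40]})
bins = (2, 6, 10, 14)  # shortened version of the bins in the question

# Grouping alone returns a DataFrameGroupBy, which has no .values attribute.
binned_data = x.groupby(pd.cut(x["Weight"], bins), observed=False)

# Aggregating turns the groups back into a DataFrame / Series.
sums = binned_data.sum().values     # 2-D array: one row per bin, one column per original column
counts = binned_data.size().values  # 1-D array: number of rows that fell into each bin
```

Here sums has one row per bin, so it is the kind of 2-D array that could be handed to a further step, while counts only tells you how populated each bin is.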

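Edit: My code wasn't exactly right. After aggregating, the index (built from the binned Weight values) and the summed Weight column share the same name, so reset_index() cannot turn the index into a column. It can be fixed by renaming the index, as below: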

binned_data = x.groupby(pd.cut(x.Weight, bins)).sum()
binned_data.index.name = 'Weight_Bin'
binned_data.reset_index().values
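
A self-contained sketch of the name collision and the rename fix follows. The contents of x (including the Height column) are assumed, since the question does not show them; the bins are the ones from the question.

```python
import pandas as pd

# Hypothetical stand-in for the asker's x.
x = pd.DataFrame({
    "Weight": [3.0, 5.0, 7.0, 12.0, 15.0, 25.0],
    "Height": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
})
bins = (2, 6, 10, 14, 20, 26, 32, 38, 44, 50, 56, 62, 68, 74, 80, 86, 92, 98)

binned = x.groupby(pd.cut(x["Weight"], bins), observed=False).sum()

# The index is named "Weight" and so is one of the summed columns,
# so reset_index() has nowhere to put the index.
try:
    binned.reset_index()
    collision = False
except ValueError:
    collision = True

# Renaming the index removes the clash.
binned.index.name = "Weight_Bin"
with_labels = binned.reset_index().to_numpy()  # first column holds the bin intervals
numeric = binned.to_numpy()                    # only the aggregated numeric columns
```

Note that the array from reset_index() has object dtype, because its first column contains the bin intervals. If the goal is a purely numeric array for something like PCA, the array taken without reset_index() (numeric above) is probably the more useful one, though that depends on what you want PCA to see.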

1 Comment

I ran data = binned_data.sum().reset_index().values and got ValueError: cannot insert Weight, already exists. What do you think is causing this?
