
I had a pandas DataFrame whose column names were the strings '0' through '9':

import numpy as np
import pandas as pd

working_df = pd.DataFrame(np.random.rand(5, 10), index=range(0, 5), columns=[str(x) for x in range(10)])
working_df.loc[:, 'outcome'] = [0, 1, 1, 0, 1]

I then wanted to collect each row's ten values into a single array and store those arrays in one column, so I did:

array_list = [Y for Y in working_df[[str(num) for num in range(10)]].values]

which gave me:

[array([ 0.0793451 ,  0.3288617 ,  0.75887129,  0.01128641,  0.64105905,
         0.78789297,  0.69673768,  0.20354558,  0.48976411,  0.72848541]),
 array([ 0.53511388,  0.08896322,  0.10302786,  0.08008444,  0.18218731,
         0.2342337 ,  0.52622153,  0.65607384,  0.86069294,  0.8864577 ]),
 array([ 0.82878026,  0.33986175,  0.25707122,  0.96525733,  0.5897311 ,
         0.3884232 ,  0.10943644,  0.26944414,  0.85491211,  0.15801284]),
 array([ 0.31818888,  0.0525836 ,  0.49150727,  0.53682492,  0.78692193,
         0.97945708,  0.53181293,  0.74330327,  0.91364064,  0.49085287]),
 array([ 0.14909577,  0.33959452,  0.20607263,  0.78789116,  0.41780657,
         0.0437907 ,  0.67697385,  0.98579928,  0.1487507 ,  0.41682309])]

I then attached it to my dataframe using:

working_df.loc[:,'array_list'] = pd.Series(array_list)

I then set up rf_clf = RandomForestClassifier() and tried rf_clf.fit(working_df['array_list'][1:].values, working_df['outcome'][1:].values), which raises ValueError: setting an array element with a sequence.

Is it a problem with the array of arrays in the fitting? Thanks for any insight.
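The error message itself comes from NumPy rather than from the random forest. A minimal sketch, using NumPy only and made-up values, reproduces it: an object array whose elements are themselves arrays cannot be cast to a float array. That this is roughly the conversion scikit-learn attempts on `X` during input validation is an assumption here, not something taken from the traceback (which the question does not include).

```python
import numpy as np

# A 1-D object array holding two row arrays, similar to working_df['array_list'].values.
col = np.empty(2, dtype=object)
col[0] = np.array([0.1, 0.2, 0.3])
col[1] = np.array([0.4, 0.5, 0.6])
print(col.shape, col.dtype)  # (2,) object

error_message = None
try:
    # Casting to float fails because each element is a sequence, not a scalar.
    col.astype(float)
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```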

Comment: Please could you show the full error traceback in your question so that we can see where exactly the exception is being raised? (Oct 21, 2015 at 20:50)

1 Answer


The problem is that scikit-learn expects a two-dimensional numeric array of shape (n_samples, n_features) as input. You're passing a one-dimensional array of objects, with each object itself being a one-dimensional array.
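To see the mismatch, compare the shape and dtype of the column's `.values` with what you get after stacking its rows. A small sketch on made-up data (the frame `df` is hypothetical); `np.stack` is used here as one way to build the 2-D array and, for rows of equal length, gives the same result as the `np.array(list(...))` quick fix:

```python
import numpy as np
import pandas as pd

# Made-up frame with a column of 1-D arrays, mimicking 'array_list' in the question.
rows = [np.random.rand(10) for _ in range(5)]
df = pd.DataFrame({"array_list": pd.Series(rows), "outcome": [0, 1, 1, 0, 1]})

# One dimension of Python objects: this is what was passed to fit().
col_values = df["array_list"].values
print(col_values.shape, col_values.dtype)  # (5,) object

# Stacking the row arrays produces the (n_samples, n_features) float array scikit-learn expects.
X = np.stack(df["array_list"].to_list())
print(X.shape, X.dtype)  # (5, 10) float64
```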

A quick fix would be to do this:

X = np.array(list(working_df['array_list'][1:]))
y = working_df['outcome'][1:].values
rf_clf.fit(X, y)

A better fix would be to not store your two-dimensional feature array within a one-dimensional pandas column.
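A sketch of that better fix, assuming the question's layout (ten float feature columns named '0' to '9' plus 'outcome'): select the feature columns by name and convert them directly, so no intermediate column of arrays is ever created. The name `feature_cols` is illustrative, and `iloc[1:]` mirrors the `[1:]` slicing in the question.

```python
import numpy as np
import pandas as pd

# Same layout as the question: ten feature columns named '0'..'9' plus an 'outcome' column.
working_df = pd.DataFrame(np.random.rand(5, 10), index=range(0, 5),
                          columns=[str(x) for x in range(10)])
working_df["outcome"] = [0, 1, 1, 0, 1]

# Illustrative list of the feature column names.
feature_cols = [str(num) for num in range(10)]

# Select the features directly; skip the first row as the question does.
X = working_df[feature_cols].iloc[1:].to_numpy()
y = working_df["outcome"].iloc[1:].to_numpy()

print(X.shape, X.dtype)  # (4, 10) float64
print(y.shape)           # (4,)
```

`X` and `y` can then be passed to `rf_clf.fit(X, y)` as in the question. Because every feature stays in its own numeric column, the selected block is already a plain float array and no conversion step is needed.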



Thanks! Your videos are what got me started on scikit-learn. Thanks for the tip.
