I'm just getting started in ML and need some help getting sklearn to work with pandas.
I was reading this and decided to try it out with a DataFrame I had. Below is what I did and the error it produced. I'm pretty new to all of this, so excuse me if I'm overlooking something obvious, but I thought it would be better to ask here than to hack away at it without really understanding the answer.
Thanks guys!
In [518]: cols = ['A','B','C','D','E','F','G','H','I','J','K']
In [519]: x = df['Miss'].values
In [520]: y = df[list(cols)].values
In [532]: y.shape
Out[532]: (11345, 11)
In [533]: x.shape
Out[533]: (11345,)
clf = Pipeline([
    ('feature_selection', LinearSVC(penalty="l1", dual=False)),
    ('classification', RandomForestClassifier())])
In [536]: clf.fit(x,y)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/home/cschwalbach/as_research_repo/logs/<ipython-input-536-5c1831092d7a> in <module>()
----> 1 clf.fit(x,y)
/usr/lib64/python2.7/site-packages/sklearn/pipeline.pyc in fit(self, X, y, **fit_params)
124 data, then fit the transformed data using the final estimator.
125 """
--> 126 Xt, fit_params = self._pre_transform(X, y, **fit_params)
127 self.steps[-1][-1].fit(Xt, y, **fit_params)
128 return self
/usr/lib64/python2.7/site-packages/sklearn/pipeline.pyc in _pre_transform(self, X, y, **fit_params)
114 for name, transform in self.steps[:-1]:
115 if hasattr(transform, "fit_transform"):
--> 116 Xt = transform.fit_transform(Xt, y, **fit_params_steps[name])
117 else:
118 Xt = transform.fit(Xt, y, **fit_params_steps[name]) \
/usr/lib64/python2.7/site-packages/sklearn/base.pyc in fit_transform(self, X, y, **fit_params)
362 else:
363 # fit method of arity 2 (supervised transformation)
--> 364 return self.fit(X, y, **fit_params).transform(X)
365
366
/usr/lib64/python2.7/site-packages/sklearn/svm/base.pyc in fit(self, X, y)
684 raise ValueError("X and y have incompatible shapes.\n"
685 "X has %s samples, but y has %s." %
--> 686 (X.shape[0], y.shape[0]))
687
688 liblinear.set_verbosity_wrap(self.verbose)
ValueError: X and y have incompatible shapes.
X has 1 samples, but y has 124795.
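scikit-learn estimators follow the convention `*.fit(X, y)`, where `X` is the `N x d` array with `N` observations and `d` features, and `y` holds the `N` targets. So you want to swap your `x` and `y`. You should redefine them to be consistent with everyone else instead of calling `clf.fit(y, x)`.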
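To make that concrete, here is a minimal sketch of the corrected setup. Your real DataFrame isn't shown, so the `df` below is a synthetic stand-in (random features and a made-up binary `Miss` column are my assumptions). Note also that in recent scikit-learn releases `LinearSVC` can no longer be used directly as a transformer inside a `Pipeline`, so I've wrapped it in `SelectFromModel`; on the older version in your traceback the bare `LinearSVC` step should work as you wrote it. As a side note, the `124795` in your error message equals `11345 * 11`, which is consistent with your 2-D array having been flattened and treated as the targets.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Synthetic stand-in for the DataFrame in the question (assumption).
rng = np.random.RandomState(0)
cols = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K']
df = pd.DataFrame(rng.rand(200, len(cols)), columns=cols)
df['Miss'] = (df['A'] + df['B'] > 1.0).astype(int)  # made-up binary target

# Features go in X (N x d), the target goes in y (length N).
X = df[cols].values
y = df['Miss'].values
print(X.shape, y.shape)

clf = Pipeline([
    # SelectFromModel is needed on recent scikit-learn versions (assumption
    # about your setup); it keeps features with non-zero L1 coefficients.
    ('feature_selection', SelectFromModel(LinearSVC(penalty="l1", dual=False))),
    ('classification', RandomForestClassifier(random_state=0)),
])

clf.fit(X, y)               # features first, target second
predictions = clf.predict(X)
```

With your own data, the only change you should need is `X = df[cols].values` and `y = df['Miss'].values`, followed by `clf.fit(X, y)`.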