10

I am trying to fit a LinearRegression from sklearn and I get the error 'could not convert string to float'. All columns of the DataFrame are numeric, and the output y is also float. Other posts suggest converting to float, which I have already done.

<class 'pandas.core.frame.DataFrame'>
Int64Index: 789 entries, 158 to 684
Data columns (total 8 columns):
f1     789 non-null float64
f2     789 non-null float64
f3     789 non-null float64
f4     789 non-null float64
f5     789 non-null float64
f6     789 non-null float64
OFF    789 non-null uint8
ON     789 non-null uint8
dtypes: float64(6), uint8(2)
memory usage: 44.7 KB

type(y_train)
pandas.core.series.Series
type(y_train[0])
float

from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test=train_test_split(X,Y,random_state=0)
X_train.head()
from sklearn.linear_model import LinearRegression
linreg = LinearRegression().fit(X_train, y_train)

The error I get is:

ValueError                                Traceback (most recent call last)
<ipython-input-282-c019320f8214> in <module>()
      6 X_train.head()
      7 from sklearn.linear_model import LinearRegression
----> 8 linreg = LinearRegression().fit(X_train, y_train)
    510         n_jobs_ = self.n_jobs
    511         X, y = check_X_y(X, y, accept_sparse=['csr', 'csc', 'coo'],
--> 512                          y_numeric=True, multi_output=True)
    513 
    514         if sample_weight is not None and np.atleast_1d(sample_weight).ndim > 1:

    527         _assert_all_finite(y)
    528     if y_numeric and y.dtype.kind == 'O':
--> 529         y = y.astype(np.float64)
    530 
    531     check_consistent_length(X, y)

ValueError: could not convert string to float: '--'

Please help.

1
  • what are X and Y? Commented Sep 7, 2017 at 9:40

3 Answers

12

A quick solution would involve using pd.to_numeric to convert whatever strings your data might contain to numeric values. If they're incompatible with conversion, they'll be reduced to NaNs.

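import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Coerce entries that cannot be parsed as numbers (such as '--') to NaN
X = X.apply(pd.to_numeric, errors='coerce')
Y = pd.to_numeric(Y, errors='coerce')  # Y is a Series here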

Furthermore, you can choose to fill those values with some default:

X.fillna(0, inplace=True)
Y.fillna(0, inplace=True)

Replace the fill value with whatever's relevant to your problem. I don't recommend dropping these rows, because you might end up dropping different rows from X and Y causing a data-label mismatch.
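If dropping really is the better choice for your problem, the mismatch can be avoided by building one mask from both X and Y and applying it to each. A minimal sketch on made-up data (the toy frame and the names X_num, Y_num, keep, X_clean and Y_clean are only for illustration, and it assumes X and Y share the same index):

```python
import pandas as pd

# Hypothetical toy data; '--' stands in for the bad entry from the traceback.
X = pd.DataFrame({"f1": [1.0, 2.0, 3.0, 4.0], "f2": ["5", "--", "7", "8"]})
Y = pd.Series([10.0, 20.0, "--", 40.0])

# Coerce unparseable entries to NaN in both features and target.
X_num = X.apply(pd.to_numeric, errors="coerce")
Y_num = pd.to_numeric(Y, errors="coerce")

# Keep only rows where every feature AND the target are present,
# so the same rows are removed from both objects.
keep = X_num.notna().all(axis=1) & Y_num.notna()
X_clean, Y_clean = X_num[keep], Y_num[keep]

print(X_clean.index.equals(Y_clean.index))
```

Because a single boolean mask filters both objects, the surviving rows of X_clean and Y_clean stay aligned.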

Finally, split the data and fit your regressor:

X_train, X_test, y_train, y_test = train_test_split(X, Y, random_state=0)
clf = LinearRegression().fit(X_train, y_train)
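If it is unclear where the strings come from, it may help to list the entries that fail conversion before coercing anything. Note that type(y_train[0]) being float only describes the first element; a Series holding floats plus a single string has dtype object. A rough sketch on a hypothetical Series (not the OP's data):

```python
import pandas as pd

# Hypothetical target column: mostly floats, plus one stray string.
y = pd.Series([1.5, 2.0, "--", 3.25], name="y")

# The dtype is object even though the first element is a Python float.
print(y.dtype, type(y[0]))

# Entries that were present originally but could not be parsed as numbers.
converted = pd.to_numeric(y, errors="coerce")
bad = y[converted.isna() & y.notna()]
print(bad)
```

Inspecting bad shows both the offending values and their index labels, which should make it easier to decide whether to fill, drop, or fix them at the source.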

4 Comments

But if they become NaNs, LinearRegression.fit() will still throw an error.
@VivekKumar I don't know what OP wants to do with those NaNs... maybe drop them? Fill them? I'll edit on further clarification.
Ah, OK. So this will verify whether the data the OP has is actually good. Thanks
@ColdSpeed Thanks! That helped!
3

I think it's better to convert all the string columns to binary (0/1) columns using label encoding or one-hot encoding; after that, the linear regression will behave much better.
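As a rough illustration of the one-hot route, here is a sketch using pd.get_dummies on a made-up frame (the column names f1 and state are hypothetical). This only makes sense for genuinely categorical strings; a placeholder such as '--' in a numeric column is better treated as a missing value.

```python
import pandas as pd

# Hypothetical frame with one categorical string column.
df = pd.DataFrame({"f1": [0.5, 1.5, 2.5], "state": ["ON", "OFF", "ON"]})

# One-hot encode the string column; numeric columns pass through unchanged.
encoded = pd.get_dummies(df, columns=["state"], dtype=int)

print(encoded.columns.tolist())
```

Each distinct string becomes its own 0/1 column, so the resulting frame contains only numeric data.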


1

This happens because one of your columns contains string values. I ran into the same problem: I had been asked to drop a column, but I skipped that step because I believed the column had already been deleted.

However, after running this code:

from sklearn.linear_model import LogisticRegressionCV

model = LogisticRegressionCV(solver='lbfgs', cv=5, max_iter=1000, random_state=42)
model.fit(X_train, y_train)

I got this error:

could not convert string to float: 'product_mng'

The reason is that X_train still had the string column, which I thought was deleted. In conclusion, check AGAIN that none of your columns contain strings. If one does, delete it with DataFrame.drop, or label encode (or one-hot encode) it.
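One way to do that check is to ask pandas for the non-numeric columns directly. A minimal sketch, assuming a made-up X_train (the column name role is hypothetical):

```python
import pandas as pd

# Hypothetical training frame with a leftover string column.
X_train = pd.DataFrame({
    "f1": [0.1, 0.2, 0.3],
    "role": ["product_mng", "sales", "product_mng"],
})

# Columns whose dtype is not numeric (string/object columns end up here).
non_numeric = X_train.select_dtypes(exclude="number").columns.tolist()
print(non_numeric)

# One option: drop them before fitting.
X_numeric = X_train.drop(columns=non_numeric)
```

If non_numeric is empty, every column is already numeric and the error is more likely coming from the target.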

