from sklearn.model_selection import train_test_split  
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import RandomizedSearchCV

train,test,train_label,test_label=train_test_split(feature_data,target_data,test_size=0.20,random_state=3)

sc=StandardScaler()
train_std=sc.fit_transform(train)
test_Std=sc.transform(test)

pipe=SGDRegressor()
parameters = {'sgd__loss':['squared_loss','huber'],
         'sgd__n_iter':np.ceil(10**6/len(train_label)),
         'sgd__alpha':10.0**np.arange(1,7),             
    }
g_search=RandomizedSearchCV(pipe,param_distributions=parameters,random_state=2)
g_fit=g_search.fit(train_std,train_label)

Training Data:

train_std 
Out[46]:

array([[ 1.99470848,  2.39114909,  0.96705   , ...,  0.23698853,
     0.89215521, -0.74111955],
   [-0.50742363, -0.54567689, -0.29516734, ...,  0.00491999,
    -0.73959331,  0.42680023],
   [-0.46965669, -0.10483307,  0.90566027, ..., -0.34272278,
     0.69705485,  0.56151837],
   ...,
   [-0.05849323,  0.11803686,  0.45737245, ...,  0.24026818,
     0.75026404, -0.3829142 ],
   [ 0.83045625,  0.66257208, -0.01582026, ...,  0.32870492,
    -0.27844698, -0.83648146],
   [-0.0886727 ,  0.46158079,  1.36521081, ..., -0.10050365,
    -0.68638412, -0.04006983]])

Training Label

 train_label
Out[47]: 
24429     1.863
32179    18.296
42715     1.417
6486      6.562
39407    18.669
         ...
42602     6.002
6557      2.921
30305    11.835
4718      1.212

Error : object of type 'numpy.float64' has no len()

I am trying to use SGDRegressor with RandomizedSearchCV, but the g_search.fit call on the training data raises this error.

  • Thanks for posting a block of code with no explanation of: 1) What you're trying to do, 2) What you've tried in order to solve the problem. Please read How to ask a question. Commented Jun 24, 2018 at 9:26
  • I am trying to implement scikit-learn's SGDRegressor, but I am getting an error while training the model with RandomizedSearchCV. Commented Jun 24, 2018 at 9:41

2 Answers


So the error is caused by the value corresponding to key 'sgd__n_iter' : np.ceil(10**6/len(train_label)).

So you have two options to fix this:

  1. Turn it into a list: [np.ceil(10**6/len(train_label))]
  2. Add it right away to the constructor of SGDRegressor and do not put it into the param_distributions dictionary.
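Option 2 can be sketched as below, on synthetic data. Note this is an illustration, not the asker's exact setup: in scikit-learn 0.19 the parameter was called `n_iter`, while current releases use `max_iter` instead, so the sketch uses the modern name. Fixed hyperparameters go into the constructor; only genuine search distributions stay in `param_distributions`, and each of those values must be list-like.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for the asker's feature_data / target_data.
X, y = make_regression(n_samples=1000, n_features=10, random_state=0)

# Option 2: fix the iteration count in the constructor instead of the
# search space (n_iter in 0.19, max_iter in current scikit-learn).
sgd = SGDRegressor(max_iter=int(np.ceil(10**6 / len(y))))

# Only distributions to sample from belong here, each one list-like.
parameters = {'alpha': 10.0 ** np.arange(1, 7)}

g_search = RandomizedSearchCV(sgd, param_distributions=parameters,
                              n_iter=5, random_state=2)
g_search.fit(X, y)
```

Because `alpha` is now the only key, the scalar-vs-list error cannot occur for the iteration count at all.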

I also noticed some inconsistencies in your code, so below is a minimal and slightly cleaner version of it:

import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import make_regression

n_samples = 1000
n_features = 50
X, y = make_regression(n_samples=n_samples, n_features=n_features)
X_train, X_test, y_train, y_test = train_test_split(X, y)

pipe = Pipeline([('scaler', StandardScaler()),
                 ('sgd', SGDRegressor())])

parameters = {'sgd__loss': ['squared_loss','huber'],
              'sgd__n_iter': [np.ceil(10**6 / n_samples)],
              'sgd__alpha': 10.0**np.arange(1,7)} 

g_search = RandomizedSearchCV(pipe, param_distributions=parameters, random_state=2)

g_search.fit(X_train, y_train)   

2 Comments

I implemented your suggestion and it is working, but it also raises a warning: DeprecationWarning: n_iter parameter is deprecated in 0.19 and will be removed in 0.21. Use max_iter and tol instead. Can you help me avoid this warning?
Well, it's just a warning and you can ignore it. In a future release the parameter n_iter will be removed, but as of right now it is still part of your scikit-learn version.
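If you do want to silence the warning rather than ignore it, the replacement the message suggests can be sketched as follows. This is a hedged adaptation of the answer's pipeline, assuming a scikit-learn version where `max_iter` and `tol` exist (0.19+); the loss grid is omitted here because the `'squared_loss'` name was itself later renamed.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=1000, n_features=50, random_state=0)

# tol is set on the estimator; max_iter replaces the deprecated n_iter.
pipe = Pipeline([('scaler', StandardScaler()),
                 ('sgd', SGDRegressor(tol=1e-3))])

# Every value in param_distributions must be list-like, not a scalar.
parameters = {'sgd__max_iter': [int(np.ceil(10**6 / len(y)))],
              'sgd__alpha': 10.0 ** np.arange(1, 7)}

g_search = RandomizedSearchCV(pipe, param_distributions=parameters,
                              n_iter=6, random_state=2)
g_search.fit(X, y)
```

With 1000 samples the single `sgd__max_iter` candidate is 1000, and the search only samples over `alpha`.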

I guess the following line causes the mentioned error:

parameters = {...,
         'sgd__n_iter':np.ceil(10**6/len(train_label)), # <--- should be a list-like object, not a scalar!
         ...,             
    }

Try the following:

parameters = {'sgd__loss':['squared_loss','huber'],
         'sgd__n_iter': [np.ceil(10**6/len(train_label))],
         # NOTE:        ^                               ^
         'sgd__alpha':10.0**np.arange(1,7),
    }

