
I am using a DBN (deep belief network) from nolearn, which is based on scikit-learn.

I have already built a network which classifies my data very well. Now I am interested in exporting the model for deployment, but I don't know how (currently I retrain the DBN every time I want to predict something). In MATLAB I would just export the weight matrix and import it on another machine.

Does anyone know how to export the model (or its weight matrix) so it can be imported without having to train the whole model again?

2 Comments

  • Have you tried to simply serialize the model with the pickle module? Commented Jul 7, 2013 at 12:06
  • @ffriend - no, but I'm gonna try. Thx! Commented Jul 7, 2013 at 12:13

3 Answers


First, install joblib.

You can use:

>>> import joblib
>>> joblib.dump(clf, 'my_model.pkl', compress=9)

And then later, on the prediction server:

>>> import joblib
>>> model_clone = joblib.load('my_model.pkl')

This is basically a Python pickle with optimized handling for large numpy arrays. It has the same limitations as regular pickle with respect to code changes: if the class structure of the pickled object changes, you might no longer be able to unpickle it with newer versions of nolearn or scikit-learn.

If you want a long-term, robust way of storing your model parameters, you may need to write your own IO layer (e.g. using binary serialization tools such as Protocol Buffers or Avro, or a less efficient but portable text/JSON/XML representation such as PMML).
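For example, here is a minimal sketch of such a hand-rolled IO layer that stores only the learned arrays as JSON. It uses a plain LogisticRegression for brevity; a nolearn DBN has a weight matrix and bias vector per layer, which you would dump the same way. The file name and structure are just illustrative:

import json
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000).fit(X, y)

# Export: keep only the learned parameters in a portable text format.
with open('model_params.json', 'w') as f:
    json.dump({'coef': clf.coef_.tolist(),
               'intercept': clf.intercept_.tolist(),
               'classes': clf.classes_.tolist()}, f)

# Import (e.g. on the prediction server): rebuild the estimator by hand.
with open('model_params.json') as f:
    params = json.load(f)
clf2 = LogisticRegression()
clf2.coef_ = np.array(params['coef'])
clf2.intercept_ = np.array(params['intercept'])
clf2.classes_ = np.array(params['classes'])
print(clf2.predict(X[:5]))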


2 Comments

I get RuntimeError: maximum recursion depth exceeded with joblib.dump(clf, 'my_model.pkl', compress=9).
Note: sklearn.externals.joblib is deprecated in 0.21 and will be removed in 0.23. Now you can install it with pip install joblib and import directly.

Pickling/unpickling has the disadvantage that it only works with matching Python versions (major, and possibly also minor, versions) and matching versions of the sklearn and joblib libraries.
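If you do stay with pickling, one small mitigation (just a sketch; clf stands for your fitted estimator and the file names are arbitrary) is to record the interpreter and library versions next to the pickle, so a mismatch can at least be detected on the target machine before loading:

import sys
import json
import pickle
import sklearn
import joblib

with open('model.pkl', 'wb') as f:
    pickle.dump(clf, f)          # clf: your fitted estimator

# Store the versions the pickle was written with.
with open('model_meta.json', 'w') as f:
    json.dump({'python': sys.version,
               'sklearn': sklearn.__version__,
               'joblib': joblib.__version__}, f)

# On the target machine, compare these values against the installed versions
# before calling pickle.load(), and retrain or convert the model if they differ.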

There are alternative, descriptive output formats for machine learning models, such as those developed by the Data Mining Group: the Predictive Model Markup Language (PMML) and the Portable Format for Analytics (PFA). Of the two, PMML is much better supported.

So you have the option of saving a scikit-learn model to PMML (for example using sklearn2pmml), and then deploying and running it in Java, Spark, or Hive using jpmml (and of course you have more choices).
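A rough sketch of the sklearn2pmml route, assuming the sklearn2pmml package and a Java runtime are installed (it delegates the conversion to JPMML); the estimator and file name are only examples:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn2pmml import sklearn2pmml
from sklearn2pmml.pipeline import PMMLPipeline

X, y = load_iris(return_X_y=True)

# Wrap the estimator in a PMMLPipeline and fit as usual.
pipeline = PMMLPipeline([('classifier', DecisionTreeClassifier())])
pipeline.fit(X, y)

# Write a PMML file that jpmml can evaluate from Java, Spark, or Hive.
sklearn2pmml(pipeline, 'model.pmml')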

2 Comments

That looks good, but what if the deployment is also Python based? Is there a pmml2sklearn?
Is it just me, or does anyone else think it is absurd that nowhere is it described/recommended to store the trained parameters and hyperparameters in whatever way one likes and to initialize the classifier instance with them wherever it is used? Of course these parameters may depend on the type of classifier, but for a minimal understanding of what one has actually learned during training, would that not be recommended anyway?

Section 3.4, Model persistence, in the scikit-learn documentation covers pretty much everything.

In addition to sklearn.externals.joblib, which ogrisel pointed to, it shows how to use the regular pickle module:

>>> from sklearn import svm
>>> from sklearn import datasets
>>> clf = svm.SVC()
>>> iris = datasets.load_iris()
>>> X, y = iris.data, iris.target
>>> clf.fit(X, y)  
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, degree=3, gamma=0.0,
  kernel='rbf', max_iter=-1, probability=False, random_state=None,
  shrinking=True, tol=0.001, verbose=False)

>>> import pickle
>>> s = pickle.dumps(clf)
>>> clf2 = pickle.loads(s)
>>> clf2.predict(X[0:1])
array([0])
>>> y[0]
0

and gives a few warnings, such as that models saved in one version of scikit-learn might not load in another version.
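Since the goal here is to move the model to another machine, you would normally pickle to a file rather than to an in-memory string. A minimal variant of the same example (file name arbitrary, and the library versions still have to match on both sides):

import pickle

# Export the fitted classifier to disk.
with open('svm_iris.pkl', 'wb') as f:
    pickle.dump(clf, f)

# Later, on the prediction machine:
with open('svm_iris.pkl', 'rb') as f:
    clf_loaded = pickle.load(f)
print(clf_loaded.predict(X[0:1]))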
