
I have trained a RandomForestClassifier from Python's scikit-learn module on a very large dataset. How can I save this model so that other people can apply it on their end? Thank you!


2 Answers


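The recommended method is to use joblib, which handles objects holding large NumPy arrays (such as a fitted random forest) more efficiently than plain pickle and can produce a smaller file, especially with compression enabled: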

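# sklearn.externals.joblib was removed in scikit-learn 0.23; import joblib directly
import joblib

# clf is your fitted RandomForestClassifier
joblib.dump(clf, 'filename.pkl')

# Your colleagues can then load it:
clf = joblib.load('filename.pkl')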

See the online docs
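
For reference, here is a minimal end-to-end sketch of the joblib approach. The synthetic dataset, the temporary file path, and the `compress=3` level are assumptions chosen for illustration only; substitute your own training data and file location.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Small synthetic dataset standing in for the real (much larger) one.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
clf = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)

# Hypothetical output location; any writable path works.
model_path = os.path.join(tempfile.mkdtemp(), "forest.joblib")

# compress trades CPU time for a smaller file (0 = none, 9 = maximum).
joblib.dump(clf, model_path, compress=3)

# On the receiving side: load the model and predict as usual.
restored = joblib.load(model_path)
same_predictions = np.array_equal(clf.predict(X), restored.predict(X))
print(same_predictions)
```

Two caveats apply to both joblib and pickle files: loading one can execute arbitrary code, so only load files from sources you trust, and the person loading the model should use the same scikit-learn version it was saved with.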


Have you tried pickling the RandomForestClassifier with the pickle module and saving it to disk?

Here’s an example based on the pickle docs:

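import pickle
from sklearn.ensemble import RandomForestClassifier

# Pass your own hyperparameters here, then fit the model before saving
classifier = RandomForestClassifier()

# The with block closes the file even if dump() raises
with open('classifier.pkl', 'wb') as output:
    pickle.dump(classifier, output)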

The “other people” could then reload the pickled object as follows:

import pickle

# scikit-learn must be installed for the classifier to unpickle
with open('classifier.pkl', 'rb') as f:
    classifier = pickle.load(f)
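
Because a pickled model is only reliably loadable with the scikit-learn version that produced it, one option is to store the version next to the model. The sketch below is an illustration, not an established convention: the `bundle` dictionary and its keys are hypothetical names, the dataset is synthetic, and the round trip goes through in-memory bytes rather than a file to keep the example self-contained.

```python
import pickle

import sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Small synthetic dataset standing in for the real training data.
X, y = make_classification(n_samples=100, n_features=5, random_state=0)
classifier = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

# Hypothetical layout: keep the model and the library version together.
bundle = {"model": classifier, "sklearn_version": sklearn.__version__}
payload = pickle.dumps(bundle)

# Receiving side: compare versions before trusting the model.
loaded = pickle.loads(payload)
version_matches = loaded["sklearn_version"] == sklearn.__version__
if not version_matches:
    print("Warning: model was saved with scikit-learn",
          loaded["sklearn_version"])

predictions = loaded["model"].predict(X)
```

To share the model as a file, write `payload` to disk in binary mode, or call `pickle.dump(bundle, f)` on an open file as in the example above.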

1 Comment

joblib is preferred and typically produces a smaller file: scikit-learn.org/stable/tutorial/basic/…
