Save python random forest model to file

Question

In R, after running "random forest" model, I can use save.image("***.RData") to store the model. Afterwards, I can just load the model to do predictions directly.

Can you do a similar thing in python? I separate the Model and Prediction into two files. And in Model file:

# compute_importances=True (in the original 2013 code) was later removed from scikit-learn
rf = RandomForestRegressor(n_estimators=250, max_features=9)
fit = rf.fit(Predx, Predy)

I tried to return rf or fit, but still can't load the model in the prediction file.

Can you separate the model and prediction using the sklearn random forest package?

Note that R's save.image saves everything in your workspace, including datasets, working variables, etc. If you only want the fitted model, use save.
– Hong Ooi, Commented Dec 18, 2013 at 16:15
Wow! Thanks for this useful answer! Every time I use save.image, the file ends up very large. Thanks!
– user3013706, Commented Dec 20, 2013 at 15:26

Jake Burkhead · Accepted Answer · 2013-12-18 16:09:18Z

45

...
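try:
    import cPickle as pickle  # Python 2: the explicit C implementation
except ImportError:
    import pickle  # Python 3: the C implementation is used automatically

from sklearn.ensemble import RandomForestRegressor

rf = RandomForestRegressor()
rf.fit(X, y)

# in your model file: write the fitted model to disk
with open('path/to/file', 'wb') as f:
    pickle.dump(rf, f)


# in your prediction file: read the fitted model back

with open('path/to/file', 'rb') as f:
    rf = pickle.load(f)


preds = rf.predict(new_X)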
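
To make the model/prediction split from the question concrete, here is a minimal end-to-end sketch. It is only an illustration: the function names, the synthetic data from make_regression, and the temporary file location are assumptions, not part of the original answer. The two functions stand in for what the "model" file and the "prediction" file would each do.

```python
import os
import pickle
import tempfile

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor


def train_and_save(path):
    """Stand-in for the 'model' file: fit the forest and write it to disk."""
    # Synthetic data, used here only so the sketch runs on its own
    X, y = make_regression(n_samples=200, n_features=9, random_state=0)
    rf = RandomForestRegressor(n_estimators=25, random_state=0)
    rf.fit(X, y)
    with open(path, "wb") as f:
        pickle.dump(rf, f)
    return rf, X


def load_and_predict(path, new_X):
    """Stand-in for the 'prediction' file: read the model back and predict."""
    with open(path, "rb") as f:
        rf = pickle.load(f)
    return rf.predict(new_X)


model_path = os.path.join(tempfile.mkdtemp(), "rf.pkl")  # hypothetical location
original_rf, X = train_and_save(model_path)
preds = load_and_predict(model_path, X[:5])
```

The prediction side only needs the file path and scikit-learn installed; it never sees the training data.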


6 Comments

user3013706 Over a year ago

Further question: 'path/to/file', what format should i use to save the file? thanks

Jake Burkhead Over a year ago

@user3013706 do you mean what file extension? It shouldn't matter. I think the convention is to use .cpickle

MaxNoe Over a year ago

The scikit-learn docs recommend joblib.dump, which older scikit-learn versions also bundled as sklearn.externals.joblib

lamecicle Over a year ago

Is this answer still relevant for python3? I see cPickle is now _pickle.

Dr_Zaszuś Over a year ago

@lamecicle from what I know, it's just pickle, the default implementation should be in C.


pplonski · Accepted Answer · 2020-10-13 06:57:09Z

22

You can use joblib to save and load a Random Forest from scikit-learn (in fact, any model from scikit-learn).

The example:

import joblib
from sklearn.ensemble import RandomForestClassifier
# create RF
rf = RandomForestClassifier()
# fit on some data
rf.fit(X, y)

# save
joblib.dump(rf, "my_random_forest.joblib")

# load
loaded_rf = joblib.load("my_random_forest.joblib")

What is more, joblib.dump has a compress argument, so the model can be compressed. I ran a very simple test on the iris dataset, and compress=3 reduced the file size by a factor of about 5.6.
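
A rough way to repeat that size test is sketched below. The forest settings, file names, and temporary directory are assumptions made for illustration, and the ratio you get will depend on your scikit-learn and joblib versions, so it may not match the 5.6 figure.

```python
import os
import tempfile

import joblib
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

out_dir = tempfile.mkdtemp()  # hypothetical scratch directory
raw_path = os.path.join(out_dir, "rf_raw.joblib")
small_path = os.path.join(out_dir, "rf_compressed.joblib")

joblib.dump(rf, raw_path)                 # default: no compression
joblib.dump(rf, small_path, compress=3)   # zlib compression, level 3

raw_size = os.path.getsize(raw_path)
small_size = os.path.getsize(small_path)
print("uncompressed: %d bytes, compress=3: %d bytes, ratio: %.1fx"
      % (raw_size, small_size, raw_size / small_size))

# joblib.load detects the compression by itself; no extra argument is needed
loaded_rf = joblib.load(small_path)
same_predictions = (loaded_rf.predict(X) == rf.predict(X)).all()
```

Higher compress levels trade slower saving for smaller files; loading works the same way either way.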

edited Oct 13, 2020 at 6:57

answered Jun 24, 2020 at 14:42


1 Comment

st0ne Over a year ago

joblib.save is joblib.dump

O.rka · Accepted Answer · 2016-03-20 16:16:30Z

I use dill. It stores all of the object's data, and I think possibly module information as well, though I'm not sure. I remember trying to use pickle to store some really complicated objects and it didn't work for me. cPickle probably does the same job as dill, but I've never tried it; it looks like it works in exactly the same way. I use an "obj" extension, but that's by no means conventional. It just made sense to me since I was storing an object.

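import dill
from sklearn.ensemble import RandomForestRegressor

wd = "/whatever/you/want/your/working/directory/to/be/"
# compute_importances=True from the question is omitted: newer scikit-learn releases no longer accept it
rf = RandomForestRegressor(n_estimators=250, max_features=9)
rf.fit(Predx, Predy)
dill.dump(rf, open(wd + "filename.obj", "wb"))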

btw, not sure if you use IPython, but sometimes writing a file that way doesn't work, so you have to do this instead:

with open(wd + "filename.obj", "wb") as f:
    dill.dump(rf, f)

To load the object again:

with open(wd + "filename.obj", "rb") as f:
    model = dill.load(f)

Ch HaXam · Accepted Answer · 2017-02-07 15:26:34Z

0

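To store the model you can also save it to a file with a .sav extension. The extension is only a naming convention; the file is still written by a serializer such as pickle or joblib, which stores the complete fitted model.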
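
As a small illustration of the .sav idea, the sketch below writes the same model with pickle under two different extensions and checks that both files restore an equivalent model. Using pickle as the writer, the iris data, and the file names are all assumptions made for the example; the answer does not say which library it has in mind.

```python
import os
import pickle
import tempfile

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

out_dir = tempfile.mkdtemp()  # hypothetical scratch directory
loaded = {}
for name in ("model.sav", "model.pkl"):
    path = os.path.join(out_dir, name)
    with open(path, "wb") as f:
        pickle.dump(clf, f)          # same serializer, different file name
    with open(path, "rb") as f:
        loaded[name] = pickle.load(f)

# Both files restore a model that predicts the same labels
same = (loaded["model.sav"].predict(X) == loaded["model.pkl"].predict(X)).all()
```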


Leszek · Accepted Answer · 2020-11-19 08:56:43Z

0

I'd reiterate that joblib does the job well and it provides really good compression options (e.g. lzma).

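import pickle
import dill
import joblib

# clf is an already fitted classifier
with open("clf.pkl", "wb") as out: pickle.dump(clf, out)
with open("clf.dill", "wb") as out: dill.dump(clf, out)
joblib.dump(clf, "clf.jbl")        # uncompressed
joblib.dump(clf, "clf.jbl.lzma")   # lzma, chosen from the file extension
joblib.dump(clf, "clf.jbl.gz")     # gzip, chosen from the file extension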

!du clf.*
24576   clf.dill
24576   clf.jbl
5120    clf.jbl.gz
3072    clf.jbl.lzma
24576   clf.pkl
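
The !du line only works from IPython or a notebook. A plain-Python version of the same comparison might look like the sketch below. The classifier, the iris data, and the temporary directory are assumptions for illustration, dill is left out so the example depends only on scikit-learn and joblib, and the byte counts you see will differ from the figures above.

```python
import os
import pickle
import tempfile

import joblib
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

out_dir = tempfile.mkdtemp()  # hypothetical scratch directory
paths = {ext: os.path.join(out_dir, "clf" + ext)
         for ext in (".pkl", ".jbl", ".jbl.gz", ".jbl.lzma")}

with open(paths[".pkl"], "wb") as out:
    pickle.dump(clf, out)
for ext in (".jbl", ".jbl.gz", ".jbl.lzma"):
    joblib.dump(clf, paths[ext])  # compression is picked from the extension

# Report sizes in bytes, smallest first
sizes = {ext: os.path.getsize(path) for ext, path in paths.items()}
for ext, size in sorted(sizes.items(), key=lambda item: item[1]):
    print("%-14s %8d bytes" % ("clf" + ext, size))
```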

