
I am trying to store my model to HDFS using Python.

This code uses the pydoop library:

    import pydoop.hdfs as hdfs

    from_path = prediction_model.fit(orginal_telecom_80p_train[features], orginal_telecom_80p_train["Churn"])
    to_path = 'hdfs://192.168.1.101:8020/user/volumata/python_models/churn_model.sav'
    hdfs.put(from_path, to_path)

But while using this, I get this error:

AttributeError: 'LogisticRegression' object has no attribute 'startswith'

Then I tried the pickle option:

    import pickle

    with open('hdfs://192.168.1.101:8020/user/volumata/python_models/') as hdfs_loc:
        pickle.dump(prediction_model, hdfs_loc)

The pickle option works fine locally, but when I try to store the model in HDFS this option does not work either. Can anyone suggest how to proceed with storing models to HDFS from a Python script?

  • Please edit your question to include the full traceback. Also, not sure where prediction_model comes from, but 1) I don't think prediction_model.fit returns a file path; 2) PySpark is commonly used for machine learning with Hadoop. Commented Apr 20, 2018 at 22:55
  • You removed the answer? Commented Apr 23, 2018 at 5:22
  • Because it doesn't work, yes. I don't use Pickle on Hadoop, so I can't give you a proper solution here Commented Apr 23, 2018 at 5:24
  • Then what else did you use for Hadoop? Can you give me any other suggestion instead of pickle? Commented Apr 23, 2018 at 5:27
  • As mentioned, Spark... Which uses Pickle internally Commented Apr 23, 2018 at 5:28

1 Answer
You have to use hdfs.open instead of open, and open the file for writing:

import pickle
import pydoop.hdfs as hdfs

with hdfs.open(to_path, 'w') as f:
    pickle.dump(prediction_model, f)

4 Comments

    Traceback (most recent call last):
      File "<ipython-input-5-09aac99ede3d>", line 1, in <module>
        with hdfs.open(to_path, 'w') as f:
      File "/home/volumata/anaconda3/lib/python3.5/site-packages/pydoop/hdfs/__init__.py", line 121, in open
        fs = hdfs(host, port, user)
      File "/home/volumata/anaconda3/lib/python3.5/site-packages/pydoop/hdfs/fs.py", line 146, in __init__
        host = common.encode_host(host)
      File "/home/volumata/anaconda3/lib/python3.5/site-packages/pydoop/hdfs/common.py", line 53, in encode_host
        if isinstance(host, unicode):
@SARANYA make sure you are using the latest pydoop version, currently 2.0a3: pip install --upgrade --pre pydoop. Pydoop 1.x does not support Python 3
Same issue after upgrading also.
Maybe the upgrade wasn't successful. The if isinstance(host, unicode) check is not on line 53 in the latest version. Moreover, it's only triggered if you are using Python 2. Check the value of pydoop.__version__
