 spark-submit --master yarn-cluster --deploy-mode cluster test.py

This ends up with the error:

 import pandas as pd
 ImportError: No module named pandas

This is the only error I see.

I am using the Anaconda Python 2.7 distribution, and pandas is present in [PYSPARK_VENV]/lib/python2.7/site-packages/.

  • Can you elaborate on your error in detail, please? An error or warning snippet and the versions of Python and Spark you are using would be helpful. Commented Jan 25, 2019 at 18:19

2 Answers


Setting the PYSPARK_PYTHON path should solve this.

Check the pyspark path using: which pyspark

Then point PYSPARK_PYTHON at the matching interpreter:

export PYSPARK_PYTHON=/pyspark/path/from/above
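A minimal sketch of the full fix, assuming the Anaconda interpreter lives at /opt/anaconda2/bin/python (a placeholder path; substitute the output of `which python` from the environment that has pandas). In yarn cluster mode the environment variable also has to reach the application master, which Spark supports via the `spark.yarn.appMasterEnv.*` configuration keys:

```shell
# Point both the driver and the executors at the interpreter that has pandas.
# /opt/anaconda2/bin/python is an example path -- adjust it for your cluster.
export PYSPARK_PYTHON=/opt/anaconda2/bin/python
export PYSPARK_DRIVER_PYTHON=/opt/anaconda2/bin/python

# In cluster deploy mode, pass the same interpreter to the YARN application
# master explicitly, since exported shell variables do not propagate there:
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=/opt/anaconda2/bin/python \
  test.py
```

Note that `--master yarn --deploy-mode cluster` is the current form; the older `--master yarn-cluster` spelling from the question is deprecated.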


You can check whether pandas is installed in the [PYSPARK_VENV]/lib/python2.7/site-packages/ folder. It looks like you are executing your PySpark application with a different Python interpreter. Please ensure that you have installed the pandas package for that interpreter.

You can use Anaconda for managing Python packages in these kinds of situations.
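A quick way to confirm which interpreter is actually running your code is to print `sys.executable` and probe the pandas import. The sketch below is plain Python so it runs anywhere; to test the executor side, you could run the same function inside a Spark task (e.g. via `rdd.mapPartitions`) and compare the paths:

```python
import sys

def interpreter_report():
    """Return (interpreter path, whether pandas is importable) for this process."""
    try:
        import pandas  # noqa: F401
        has_pandas = True
    except ImportError:
        has_pandas = False
    return sys.executable, has_pandas

# Run locally first; if pandas is missing here too, PYSPARK_PYTHON is not the issue.
print(interpreter_report())
```

If the path printed on the executors differs from the one where pandas lives, that mismatched interpreter is the cause of the ImportError.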

1 Comment

pandas is installed, and I have no problem using pyspark itself; my issue is with the Python interpreter.
