9

What's the "correct" way to set the sys.path for Python worker nodes?

Is it a good idea for worker nodes to "inherit" sys.path from the master?

Is it a good idea to set the path on the worker nodes through .bashrc? Or is there some standard Spark way of setting it?

4 Answers

8

A standard way of setting environment variables, including PYSPARK_PYTHON, is to use the conf/spark-env.sh file. Spark ships with a template file (conf/spark-env.sh.template) which explains the most common options.

It is a normal bash script, so you can use it the same way you would use .bashrc.

You'll find more details in the Spark Configuration Guide.
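
For illustration, a minimal conf/spark-env.sh might look like the following (the interpreter and library paths are placeholders, not values from the question):

    # conf/spark-env.sh -- sourced by Spark's launch scripts on each node
    # Python interpreter used by the executors (example path)
    export PYSPARK_PYTHON=/usr/local/anaconda2/bin/python
    # Python interpreter used by the driver; defaults to PYSPARK_PYTHON
    export PYSPARK_DRIVER_PYTHON=/usr/local/anaconda2/bin/python
    # extra entries for the Python workers' module search path (example path)
    export PYTHONPATH="/opt/my_libs:$PYTHONPATH"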


2 Comments

Most people are looking to do something like this in spark-env.sh: DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"; PYTHONPATH=$PYTHONPATH:$DIR, and it does not work. It probably works if you push that out to all worker nodes in some separate step. What is the run-time way of doing this through pyspark or spark-submit?
This conf setting does the trick in spark standalone: spark.executorEnv.[EnvironmentVariableName] (see the sketch below).
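
For example, passed at submit time, that setting might look like the following (the master URL, PYTHONPATH value, and application name are made up for illustration):

    spark-submit \
        --master spark://master:7077 \
        --conf spark.executorEnv.PYTHONPATH=/opt/my_libs \
        my_app.py
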
4

With the following command you can change the Python path for the current job only, which also allows different Python paths for the driver and the executors:

    PYSPARK_DRIVER_PYTHON=/home/user1/anaconda2/bin/python PYSPARK_PYTHON=/usr/local/anaconda2/bin/python pyspark --master ..
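
The same per-job approach works with spark-submit as well; a sketch, with a made-up master URL and application script name:

    PYSPARK_DRIVER_PYTHON=/home/user1/anaconda2/bin/python \
    PYSPARK_PYTHON=/usr/local/anaconda2/bin/python \
    spark-submit --master spark://master:7077 my_app.py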

Comments

2

You may do either of the below:

In config,

Update SPARK_HOME/conf/spark-env.sh and add the lines below:

    # for pyspark
    export PYSPARK_PYTHON="path/to/python"
    # for driver, defaults to PYSPARK_PYTHON
    export PYSPARK_DRIVER_PYTHON="path/to/python"

OR

In the code, add:

    import os
    # Set Spark environment variables
    os.environ['PYSPARK_PYTHON'] = 'path/to/python'
    os.environ['PYSPARK_DRIVER_PYTHON'] = 'path/to/python'
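
Note that these variables must be set before the SparkSession (or SparkContext) is created; a minimal sketch, with placeholder interpreter paths:

    import os
    from pyspark.sql import SparkSession

    # Set the interpreters before the session is created; a session that is
    # already running will not pick up the new values.
    os.environ['PYSPARK_PYTHON'] = '/usr/local/anaconda2/bin/python'
    os.environ['PYSPARK_DRIVER_PYTHON'] = '/usr/local/anaconda2/bin/python'

    spark = SparkSession.builder.appName("python-path-example").getOrCreate()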

Comments

-3

The error in my case was:

Exception: Python in worker has different version 2.6 than that in driver 2.7, PySpark cannot run with different minor versions

The solution that helped:

    export PYSPARK_PYTHON=python2.7
    export PYSPARK_DRIVER_PYTHON=python2.7
    jupyter notebook

Of course, I had python2.7 installed locally on the workers.
I suppose it is also important that I set the PATH as well.
I did not rely on the local workers' settings; the path was inherited from the edge node where the Jupyter notebook runs.
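
A quick way to check which interpreter the executors actually use is to run a small job; a sketch, assuming an existing SparkContext named sc:

    import sys

    # Python version on the driver
    print("driver: %s" % sys.version)

    # distinct Python versions reported by the executors
    versions = (sc.parallelize(range(sc.defaultParallelism))
                  .map(lambda _: sys.version)
                  .distinct()
                  .collect())
    print("executors: %s" % versions)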

Comments
