9

What's the "correct" way to set the sys.path for Python worker nodes?

Is it a good idea for worker nodes to "inherit" sys.path from the master?

Is it a good idea to set the path on the worker nodes through .bashrc? Or is there some standard Spark way of setting it?

4 Answers

8

A standard way of setting environment variables, including PYSPARK_PYTHON, is to use the conf/spark-env.sh file. Spark ships with a template file (conf/spark-env.sh.template) which explains the most common options.

It is a normal bash script, so you can use it the same way you would use .bashrc.

You'll find more details in the Spark Configuration Guide.
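
For illustration, a minimal conf/spark-env.sh might look like the following (the interpreter and library paths are placeholders, not values from the question):

    # conf/spark-env.sh -- sourced by Spark's launch scripts on each node
    # Python interpreter used by the executors (example path)
    export PYSPARK_PYTHON=/usr/local/anaconda2/bin/python
    # Python interpreter used by the driver; defaults to PYSPARK_PYTHON
    export PYSPARK_DRIVER_PYTHON=/usr/local/anaconda2/bin/python
    # extra entries for the Python workers' module search path (example path)
    export PYTHONPATH="/opt/my_libs:$PYTHONPATH"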


2 Comments

Most people are looking to do something like this in spark-env.sh: DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"; PYTHONPATH=$PYTHONPATH:$DIR, and it does not work. It probably works if you push that out to all worker nodes in some separate step. What is the run-time way of doing this through pyspark or spark-submit?
This conf setting does the trick in spark standalone: spark.executorEnv.[EnvironmentVariableName] (see the sketch below).
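
For example, passed at submit time, that setting might look like the following (the master URL, PYTHONPATH value, and application name are made up for illustration):

    spark-submit \
        --master spark://master:7077 \
        --conf spark.executorEnv.PYTHONPATH=/opt/my_libs \
        my_app.py
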
4

With the following command you can change the Python path for the current job only, which also allows different Python paths for the driver and the executors:

    PYSPARK_DRIVER_PYTHON=/home/user1/anaconda2/bin/python PYSPARK_PYTHON=/usr/local/anaconda2/bin/python pyspark --master ..
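
The same per-job approach works with spark-submit as well; a sketch, with a made-up master URL and application script name:

    PYSPARK_DRIVER_PYTHON=/home/user1/anaconda2/bin/python \
    PYSPARK_PYTHON=/usr/local/anaconda2/bin/python \
    spark-submit --master spark://master:7077 my_app.py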

Comments

2

You may do either of the below:

In config,

Update SPARK_HOME/conf/spark-env.sh and add the lines below:

    # for pyspark
    export PYSPARK_PYTHON="path/to/python"
    # for driver, defaults to PYSPARK_PYTHON
    export PYSPARK_DRIVER_PYTHON="path/to/python"

OR

In the code, add:

    import os
    # Set Spark environment variables
    os.environ['PYSPARK_PYTHON'] = 'path/to/python'
    os.environ['PYSPARK_DRIVER_PYTHON'] = 'path/to/python'
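
Note that these variables must be set before the SparkSession (or SparkContext) is created; a minimal sketch, with placeholder interpreter paths:

    import os
    from pyspark.sql import SparkSession

    # Set the interpreters before the session is created; a session that is
    # already running will not pick up the new values.
    os.environ['PYSPARK_PYTHON'] = '/usr/local/anaconda2/bin/python'
    os.environ['PYSPARK_DRIVER_PYTHON'] = '/usr/local/anaconda2/bin/python'

    spark = SparkSession.builder.appName("python-path-example").getOrCreate()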

Comments

-3

The error in my case was:

Exception: Python in worker has different version 2.6 than that in driver 2.7, PySpark cannot run with different minor versions

The solution that helped:

    export PYSPARK_PYTHON=python2.7
    export PYSPARK_DRIVER_PYTHON=python2.7
    jupyter notebook

Of course, I had python2.7 installed locally on the workers.
I suppose it is also important that I set the PATH as well.
I did not rely on the local workers' settings; the path was inherited from the edge node where the Jupyter notebook runs.
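
A quick way to check which interpreter the executors actually use is to run a small job; a sketch, assuming an existing SparkContext named sc:

    import sys

    # Python version on the driver
    print("driver: %s" % sys.version)

    # distinct Python versions reported by the executors
    versions = (sc.parallelize(range(sc.defaultParallelism))
                  .map(lambda _: sys.version)
                  .distinct()
                  .collect())
    print("executors: %s" % versions)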

Comments
