
I am trying to follow the examples on the Apache Spark documentation site: https://spark.apache.org/docs/2.0.0-preview/submitting-applications.html

I started a Spark standalone cluster and want to run the example Python application. From my spark-2.0.0-bin-hadoop2.7 directory, I ran the following command:

./bin/spark-submit \
--master spark://207.184.161.138:7077 \
examples/src/main/python/pi.py \
1000

However, I get the following error:

jupyter: '/Users/MyName/spark-2.0.0-bin-hadoop2.7/examples/src/main/python/pi.py' is not a Jupyter command

This is what my .bash_profile looks like:

#setting path for Spark
export SPARK_PATH=~/spark-2.0.0-bin-hadoop2.7
export PYSPARK_DRIVER_PYTHON="jupyter"
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
alias snotebook='$SPARK_PATH/bin/pyspark --master local[2]'

What am I doing wrong?

  • Unset PYSPARK_DRIVER_PYTHON and PYSPARK_DRIVER_PYTHON_OPTS before submitting. Commented Sep 3, 2016 at 11:37

2 Answers


The PYSPARK_DRIVER_PYTHON and PYSPARK_DRIVER_PYTHON_OPTS variables are meant for launching the IPython/Jupyter shell when you open the pyspark shell (more info at How to load IPython shell with PySpark).

You can set it up like this instead:

alias snotebook='PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS=notebook $SPARK_PATH/bin/pyspark --master local[2]'

That way they only apply when you launch pyspark through the alias and don't interfere with spark-submit when you submit applications.




Add PYSPARK_DRIVER_PYTHON=ipython before the spark-submit command.

Example:

PYSPARK_DRIVER_PYTHON=ipython ./bin/spark-submit \
  /home/SimpleApp.py

1 Comment

Nice. The only problem is if I want to pass arguments to the Python script. For some reason IPython is interfering, thinking they are meant for it.
