
I am creating a Spark application on AWS EMR, but spark-submit runs my script with Python 3 instead of Python 2. When I run pyspark instead, it uses Python 2.

How can I force spark-submit to use Python 2?

I tried to do

export PYSPARK_PYTHON=/usr/bin/python2 

but it didn't work.

Thanks

2 Answers


Have you tried adding

export PYSPARK_PYTHON=/usr/bin/python2

to the spark-env.sh file?
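
For reference, the relevant lines in $SPARK_HOME/conf/spark-env.sh would look something like this minimal sketch (the /usr/bin/python2 path comes from the question; the PYSPARK_DRIVER_PYTHON line is an assumption, included in case the driver should run Python 2 as well):

# In $SPARK_HOME/conf/spark-env.sh
export PYSPARK_PYTHON=/usr/bin/python2          # Python used by executors
export PYSPARK_DRIVER_PYTHON=/usr/bin/python2   # Python used by the driver (assumption)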


4 Comments

Do you mean I should do export PYSPARK_PYTHON=/usr/bin/python2 before running the script? I tried SSHing to the cluster and running spark-submit code.py manually, and it seems to run with Python 2. But when I do it with --steps spark-submit ... it runs Python 3.
Hi, I meant: have you added the PYSPARK_PYTHON environment variable to the $SPARK_HOME/conf/spark-env.sh file on your cluster nodes? $SPARK_HOME is the directory where you installed Spark.
I just tried that and it still doesn't work. So basically: when I call spark-submit over SSH it runs with Python 2, but when I add a spark-submit step with the AWS console (or CLI) it runs Python 3.
Actually, when I run print(sys.version_info) via spark-submit (adding a step with the AWS console) it says it is Python 2.6.9, but I get "SyntaxError: invalid syntax" if I try to run 'print "hello world"'.
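
For the EMR-step case discussed in these comments, one commonly used mechanism (sketched here as an assumption; this thread never confirms it) is to set the variable through EMR's configuration classifications when creating the cluster, for example with aws emr create-cluster --configurations file://python2.json, where python2.json is a hypothetical file name containing:

[
  {
    "Classification": "spark-env",
    "Configurations": [
      {
        "Classification": "export",
        "Properties": { "PYSPARK_PYTHON": "/usr/bin/python2" }
      }
    ]
  }
]

Because this exports the variable on every node, spark-submit steps added through the console or CLI pick it up as well.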

Actually, I had this in my code:

from __future__ import print_function

and when I ran print 'hello world' it crashed because, with that import in effect, print is no longer a statement. I had thought it was crashing because the job was running Python 3 instead of Python 2.
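
A minimal Python 2 sketch of the effect described above: once the __future__ import is active, print is a function, so statement-style print is a syntax error even though the interpreter really is Python 2.

from __future__ import print_function
import sys

print(sys.version_info)   # e.g. (2, 6, 9, 'final', 0), still Python 2
print("hello world")      # fine: print is a function now
# print "hello world"     # SyntaxError: invalid syntax, even on Python 2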

