
I'm trying to run a simple Python script on Oozie using Hue. I have the Anaconda parcel installed, so I also added the following in Cloudera Manager, under the Spark configuration (Spark Service Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-env.sh):

if [ -z "${PYSPARK_PYTHON}" ]; then
export PYSPARK_PYTHON=/opt/cloudera/parcels/Anaconda/bin/python
fi

When I run the job, I get the Python error ImportError: No module named pandas.io.json, which suggests that PYSPARK_PYTHON is not picking up the Anaconda interpreter.

I've tried adding an argument with

PYSPARK_PYTHON=/opt/cloudera/parcels/Anaconda/bin/python

to the Spark action via Hue, but it doesn't seem to work.

If I run the script from the CLI with spark-submit, it works. Other Python scripts that I run on Oozie via Hue (without packages from Anaconda) also work.

What am I missing? :/

1 Answer

When running Spark via Oozie, you need to specify which environment variables should be set on the launcher container (the one that starts the Spark session).

Try adding a new property to the Spark action with key oozie.launcher.mapreduce.map.env and value PYSPARK_PYTHON=/opt/cloudera/parcels/Anaconda/bin/python, and it should work as expected.
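
To check whether the setting took effect, it can help to log which interpreter the job actually runs under. The following is only a minimal sketch: `describe_python_env` is a hypothetical helper (not part of Oozie, Hue, or Spark), and it assumes you can put a few lines at the top of your own script and read the action's stdout log. It reports the interpreter of the process that executes the script, so it does not by itself say anything about the executors.

```python
import os
import sys


def describe_python_env(module_name="pandas"):
    """Collect facts about the interpreter running this script.

    Hypothetical helper for debugging; module_name defaults to the
    package that failed to import in the question.
    """
    try:
        __import__(module_name)
        importable = True
    except ImportError:
        importable = False
    return {
        "executable": sys.executable,  # path of the running interpreter
        "pyspark_python": os.environ.get("PYSPARK_PYTHON"),  # None if unset
        "module": module_name,
        "importable": importable,
    }


if __name__ == "__main__":
    # Print one "key: value" line per fact so it is easy to find in the logs.
    for key, value in sorted(describe_python_env().items()):
        print("{}: {}".format(key, value))
```

If `executable` is not the Anaconda path, or `pyspark_python` is empty, the environment variable most likely did not reach the launcher container.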

4 Comments

Hi, is there a property for the Oozie Spark action that makes the Spark job get submitted as the "user" rather than as "YARN"?
This feature is called "impersonation" and, as far as I know, it cannot be configured per action, only in the Oozie server's configuration as a whole.
You saved my day!
Nit: "mapred" properties are deprecated since Hadoop V2, and may be ignored in V3 => oozie.launcher.mapreduce.map.env
