3

I have two python files - my_python_A.py and my_python_B.py. The first file references the second (from my_python_B import *).

I'm executing the first python file from a shell action in Oozie (i.e. the script is simply python my_python_A.py), and am receiving the following error:

Traceback (most recent call last):
  File "my_python_A.py", line 2, in <module>
    from my_python_B import *
ImportError: No module named my_python_B
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]

Both python files are located under the same directory in HDFS. How can I get this import statement to work?

7
  • Usually the interpreter looks first in the directory of the script being run, then in the directories listed in PYTHONPATH, and then in the installation defaults. Which environment are you using? And are these files part of the same project? If they are, perhaps you are better off using an __init__ file. Commented Apr 25, 2016 at 18:20
  • @jmugz3 - not sure this is as relevant when running on a cluster, since all of the files are stored on a distributed system (i.e. HDFS). Commented Apr 25, 2016 at 18:23
  • Gotcha. I'm not familiar with Oozie, but it sounds like your interpreter is not recognizing your module, so you could try to add your working directory to your shell path. Commented Apr 25, 2016 at 18:29
  • Check this answer Commented Apr 25, 2016 at 18:30
  • You could try something like: import sys; sys.path.append("/Users/path/to/file") Commented Apr 25, 2016 at 19:09

2 Answers 2

7

I faced the same issue and worked around it by setting the PYTHONPATH environment variable to the current working directory inside the shell script, before running the Python code:

# Make modules in the current working directory importable
export PYTHONPATH="$(pwd)"
python m_python_A.py

Make sure that your shell action lists every required Python module in <file></file> tags. Assuming you have a shell script called sample_script.sh (containing the commands above), your workflow.xml file should look something like this:

<workflow-app name="shellTest" xmlns="uri:oozie:workflow:0.4">
    <start to="shell-action"/>
    <action name="shell-action">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>                
                <property>
                    <name>oozie.launcher.mapred.job.queue.name</name>
                    <value>${launcherqueue}</value>
                </property>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${mapredqueue}</value>
                </property>
            </configuration>
            <exec>sample_script.sh</exec>
            <file>${appPath}/sample_script.sh#sample_script.sh</file>
            <file>${appPath}/m_python_A.py#m_python_A.py</file>
            <file>${appPath}/m_python_B.py#m_python_B.py</file>
            <capture-output/>
        </shell>
        <ok to="end"/>
        <error to="shell-action-failed"/>
    </action>

    <kill name="shell-action-failed">
        <message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>

    <end name="end" />

</workflow-app>
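
The PYTHONPATH workaround can be checked locally, outside Oozie. Below is a minimal sketch, not part of the original setup: the helper name import_works_with_pythonpath is hypothetical, and it assumes a Python 3 interpreter. It starts a child interpreter from a different directory with PYTHONPATH pointing at the directory that holds the module, so the import can only succeed because of PYTHONPATH.

```python
import os
import subprocess
import sys
import tempfile


def import_works_with_pythonpath(module_dir, module_name):
    """Return True if `module_name` imports in a child interpreter
    whose PYTHONPATH is set to `module_dir`.

    Hypothetical helper for illustration only.
    """
    # Copy the current environment and override PYTHONPATH for the child.
    env = dict(os.environ, PYTHONPATH=module_dir)
    # Run from the system temp directory rather than module_dir, so the
    # child's working directory does not make the module importable.
    result = subprocess.run(
        [sys.executable, "-c", "import " + module_name],
        env=env,
        cwd=tempfile.gettempdir(),
        stderr=subprocess.DEVNULL,  # hide the traceback on failure
    )
    return result.returncode == 0
```

Calling it with the directory that contains the second module and the module name without the .py extension should return True, and False for a directory that does not contain the module.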

2 Comments

What is ${appPath} in your example?
${appPath} is the path to the folder on HDFS that contains your scripts.
1

What about adding

import os, sys
sys.path.append(os.path.join(os.path.dirname(__file__), "lib"))

to your m_python_A.py, so it can import anything stored in (for example) lib/?
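
The same idea can be wrapped in a small helper that avoids adding duplicate entries. This is only a sketch: add_to_sys_path is a hypothetical name, and it assumes the directory you pass really contains the modules you want to import (in the shell action, that would be wherever the <file> entries end up).

```python
import os
import sys


def add_to_sys_path(directory):
    """Prepend `directory` to sys.path unless it is already listed.

    Hypothetical helper for illustration; returns the absolute path used.
    """
    directory = os.path.abspath(directory)
    if directory not in sys.path:
        # Put it first so these modules win over same-named ones elsewhere.
        sys.path.insert(0, directory)
    return directory
```

Call it at the top of the first script, before the import of the second module, for example add_to_sys_path(os.getcwd()) if the modules sit in the working directory.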
