
My goal is to import a custom .py file into my Spark application and call some of the functions defined in that file.

Here is what I tried:

I have a test file called Test.py which looks as follows:

def func():
    print("Import is working")

Inside my Spark application I do the following (as described in the docs):

sc = SparkContext(conf=conf, pyFiles=['/[AbsolutePathTo]/Test.py'])

I also tried this instead (after the Spark context is created):

sc.addFile("/[AbsolutePathTo]/Test.py")

I even tried the following when submitting my Spark application:

./bin/spark-submit --packages com.datastax.spark:spark-cassandra-connector_2.10:1.5.0-M2 --py-files /[AbsolutePath]/Test.py ../Main/Code/app.py

However, I always get a name error:

NameError: name 'func' is not defined

when I call func() inside my app.py (the same error occurs for 'Test' if I try to call Test.func() instead).
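The NameError does not appear to be specific to Spark; it can be reproduced in plain Python. The sketch below is a minimal illustration under stated assumptions: it writes a stand-in Test.py to a temporary directory and puts that directory on sys.path as a rough analogue of shipping the file. The module is then findable, yet calling func() or Test.func() still raises NameError because no import statement has run.

```python
import sys
import tempfile
from pathlib import Path

# Stand-in for the question's Test.py, written to a fresh temporary directory.
module_dir = Path(tempfile.mkdtemp())
(module_dir / "Test.py").write_text('def func():\n    print("Import is working")\n')

# Rough analogue of shipping the file: the module becomes findable on sys.path,
# but nothing has been imported into this script's namespace.
sys.path.insert(0, str(module_dir))

errors = []
try:
    func()  # noqa: F821 - deliberately undefined, nothing was imported
except NameError as exc:
    errors.append(str(exc))

try:
    Test.func()  # noqa: F821 - the module name is not bound either
except NameError as exc:
    errors.append(str(exc))

for message in errors:
    print(message)
```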

Finally, I also tried importing the file inside the pyspark shell with the same command as above:

sc.addFile("/[AbsolutePathTo]/Test.py")

Strangely, I do not get an error on the import, but I still cannot call func() without getting the error. Also, not sure if it matters, but I'm running Spark locally on one machine.

I really tried everything I could think of, but still cannot get it to work. Probably I am missing something very simple. Any help would be appreciated.

  • does the absolute path contain any space? Are you importing in the app.py file? Commented Dec 21, 2015 at 15:10
  • nope, no spaces in the path. Yes, app.py is my Spark application where I'm trying to do the import. But as I said, I have the same issue if I'm trying to do an import inside a pyspark shell. Commented Dec 21, 2015 at 15:13
  • How are you importing it? Commented Dec 21, 2015 at 15:16
  • I'm not sure what you mean by "how", other than the 3 different approaches I tried and explained in the question? Commented Dec 21, 2015 at 15:18
  • I mean, in the file app.py, how do you import the file Test.py? Commented Dec 21, 2015 at 15:27

1 Answer


Alright, actually my question is rather stupid. After doing:

sc.addFile("/[AbsolutePathTo]/Test.py")

I still have to import the Test.py file like I would import a regular Python file with:

import Test

then I can call

Test.func()

and it works. I thought that "import Test" was not necessary since I had added the file to the Spark context, but adding the file does not import it: it only makes Test.py available, and the import statement is still what binds the name Test in my script. Thanks mark91 for pointing me in the right direction.
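A minimal plain-Python sketch of the fix, assuming Test.py is already findable on sys.path (a temporary directory stands in here for wherever Spark places the file). The two import forms bind different names: "import Test" supports the qualified call used in this answer, while "from Test import func" also makes the bare func() call from the question work.

```python
import importlib
import io
import sys
import tempfile
from contextlib import redirect_stdout
from pathlib import Path

# Stand-in for the question's Test.py, written to a temporary directory.
module_dir = Path(tempfile.mkdtemp())
(module_dir / "Test.py").write_text('def func():\n    print("Import is working")\n')

# Make the file findable (the part that shipping the file takes care of).
sys.path.insert(0, str(module_dir))
importlib.invalidate_caches()  # harmless here; matters if files appear after finders cached a directory

# Making the file available is not enough: an explicit import binds the names.
import Test            # binds the module object to the name "Test"
from Test import func  # binds the function itself to the bare name "func"

buffer = io.StringIO()
with redirect_stdout(buffer):
    Test.func()  # qualified call, as in this answer
    func()       # bare call, as originally attempted in the question

output = buffer.getvalue().splitlines()
print(output)
```

Either form works; "import Test" keeps the module's functions clearly namespaced, which helps once the file grows beyond one function.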

UPDATE 28.10.2017:

As requested in the comments, here are more details on app.py:

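from pyspark import SparkContext
from pyspark.conf import SparkConf

conf = SparkConf()
conf.setMaster("local[4]")       # run locally with 4 worker threads
conf.setAppName("Spark Stream")
sc = SparkContext(conf=conf)
sc.addFile("Test.py")            # distribute Test.py with the job; this alone does not import it

import Test                      # the explicit import is what makes the name Test available

Test.func()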
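If you only need the functions on the driver and would rather load the file from an absolute path than rely on sys.path, the standard library's importlib.util can load a module straight from a file. This is a plain-Python alternative, not something the Spark docs prescribe. Assumptions: load_module_from_path is a hypothetical helper name, the stand-in Test.py below returns its message instead of printing it, and the loaded module is neither registered in sys.modules nor shipped to executors.

```python
import importlib.util
import tempfile
from pathlib import Path
from types import ModuleType


def load_module_from_path(name: str, path: Path) -> ModuleType:
    """Hypothetical helper: load a source file as a module without touching sys.path."""
    spec = importlib.util.spec_from_file_location(name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {name!r} from {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code
    return module


# Stand-in for /[AbsolutePathTo]/Test.py; func returns instead of printing (assumption).
module_dir = Path(tempfile.mkdtemp())
module_path = module_dir / "Test.py"
module_path.write_text('def func():\n    return "Import is working"\n')

Test = load_module_from_path("Test", module_path)
print(Test.func())
```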

1 Comment

I am looking for something similar. Can you please post the full code of app.py, showing how you import Test.py and call Test.func()?
