
I have a metrics.py script that computes metrics for a graph.

I can run it from the terminal (python ./metrics.py -i [input] [output]).

I want to write a function in Spark that calls the metrics.py script on a provided file path and collects the values that metrics.py prints out.

How can I do that?

1 Answer


To run metrics.py, you essentially need to ship it to all the executor nodes that run your Spark job.

To do this, you can either pass it when creating the SparkContext:

sc = SparkContext(conf=conf, pyFiles=['path_to_metrics.py'])

or add it later using the SparkContext's addPyFile method:

sc.addPyFile('path_to_metrics.py')

In either case, remember to import the metrics module afterwards and call whichever function produces the output you need:

import metrics                 # the shipped file is importable on every node
metrics.relevant_function()    # stands in for whatever function yields the metrics
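
Putting it together, a minimal sketch of a driver program might look like the following. Note the assumptions: relevant_function is a placeholder for a function inside metrics.py that returns its values rather than printing them, and the input paths are hypothetical.

from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName('graph-metrics')
sc = SparkContext(conf=conf, pyFiles=['path_to_metrics.py'])

# Hypothetical input paths; replace with your real file locations.
paths = ['hdfs:///data/graph1.txt', 'hdfs:///data/graph2.txt']

def run_metrics(path):
    # Imported inside the task so the shipped module is resolved on the executor.
    import metrics
    return path, metrics.relevant_function(path)

# Distribute the paths, run the metrics function on each, and collect the results.
results = sc.parallelize(paths).map(run_metrics).collect()
for path, values in results:
    print(path, values)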

Also make sure that any Python libraries imported inside metrics.py are installed on all executor nodes. If they aren't, ship pure-Python dependencies with the --py-files option (and any JVM dependencies with --jars) when spark-submitting your job.
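
For instance, a submit command might look like this, where deps.zip is a hypothetical archive of the pure-Python dependencies metrics.py needs and your_job.py is your driver script:

spark-submit --py-files path_to_metrics.py,deps.zip your_job.py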
