
Is it possible to submit/configure a Spark Python script (.py) file as a Databricks job?

I do my development in the PyCharm IDE, then commit and push the code to our GitLab repository. My requirement is to create a new job on the Databricks cluster whenever a Python script is merged into the GitLab master branch.

I would like some suggestions on whether it's possible to create a Databricks job from a Python script using a GitLab CI pipeline (.gitlab-ci.yml).

In the Databricks Jobs UI, I can see that a Spark JAR or a notebook can be used, but I'm wondering if we can provide a Python file.

Thanks,

Yuva

1 Answer


This functionality is not currently available in the Databricks UI, but it is accessible via the REST API. You'll want to use the SparkPythonTask data structure.

You'll find this example in the official documentation:

curl -n -H "Content-Type: application/json" -X POST -d @- https://<databricks-instance>/api/2.0/jobs/create <<JSON
{
  "name": "SparkPi Python job",
  "new_cluster": {
    "spark_version": "5.2.x-scala2.11",
    "node_type_id": "i3.xlarge",
    "num_workers": 2
  },
  "spark_python_task": {
    "python_file": "dbfs:/docs/pi.py",
    "parameters": [
      "10"
    ]
  }
}
JSON
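Note that spark_python_task expects the script to already exist on DBFS (dbfs:/docs/pi.py in the example above). One way to get it there is the Databricks CLI; a minimal sketch, assuming the CLI is installed and configured with a token, and with my_job.py and the dbfs:/docs/ path as illustrative names:

# Upload the local script to DBFS so the job's python_file can reference it
# (my_job.py and dbfs:/docs/my_job.py are example names, not from the original post)
databricks fs cp my_job.py dbfs:/docs/my_job.py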

If you need help getting started with the REST API, see the Databricks REST API documentation.
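As for the GitLab part of the question: since job creation is just an HTTP request, it can run from a GitLab CI pipeline. Below is a minimal sketch of a .gitlab-ci.yml stage that fires on pushes to master. It assumes DATABRICKS_HOST and DATABRICKS_TOKEN are defined as CI/CD variables and that create-job.json contains a payload like the one shown above; the job name, stage, and file names are all illustrative, not a prescribed setup.

# .gitlab-ci.yml -- create the Databricks job when code lands on master.
# DATABRICKS_HOST and DATABRICKS_TOKEN are assumed to be CI/CD variables;
# create-job.json is an example file holding the JSON payload shown above.
create_databricks_job:
  stage: deploy
  image: curlimages/curl:latest
  only:
    - master
  script:
    - >
      curl -X POST
      -H "Authorization: Bearer $DATABRICKS_TOKEN"
      -H "Content-Type: application/json"
      -d @create-job.json
      "$DATABRICKS_HOST/api/2.0/jobs/create"

Uploading the .py file to DBFS (as sketched earlier) would typically be an earlier step in the same pipeline, so the path referenced in create-job.json exists before the job is created.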


1 Comment

Thanks Raphael, can it be done from GitLab by any chance? Just curious.
