
I am trying to set up an AWS Lambda function that calls a Databricks notebook (in the event of an S3 trigger). I understand I have to use the Databricks Jobs API in my Lambda function (Python) code to make a POST request with the JSON payload of the runs-submit endpoint.

Although the documentation is not very clear, I was able to call a test script. On checking the response text, I see the HTML of the Databricks login page, which means the request is not being authenticated.

I have read about user tokens, but I am not sure how to incorporate them for authentication.

Any help making this work another way, or using the user tokens to authenticate so that the flow reaches the execution of the notebook rather than getting stopped at the login page, would be appreciated.

Thanks in advance.

Code Sample:

import requests
import json

job_payload = {
  "run_name": 'just_a_run',
  "existing_cluster_id": '****',
  "notebook_task": 
    {
      "notebook_path": 'https://databricks.cloud.company.com/****'
    }
}

resp = requests.post('https://databricks.cloud.company.com/2.0/jobs/runs/submit', json=job_payload)
print(resp.status_code)
print(resp.text)

200


<!DOCTYPE html>

<html>
<head>
    <meta charset="utf-8"/>
    <meta http-equiv="Content-Language" content="en"/>
    <title>Databricks - Sign In</title>
    <meta name="viewport" content="width=960">
    <link rel="stylesheet" href="/login/bootstrap.min.css">
    <link rel="icon" type="image/png" href="login/favicon.ico" />

    <meta http-equiv="content-type" content="text/html; charset=UTF8">
<link rel="shortcut icon" href="favicon.ico"><link href="login/login.e555bb48.css" rel="stylesheet"></head>
<body>
<div id="login-page"></div>
<script type="text/javascript" src="login/login.dabd48fd.js"></script></body>
</html>

1 Answer

SOLVED:

1) You will need to create a user token (personal access token) in Databricks for authorization and send it in the 'headers' parameter of the REST request.

2) headers={'Authorization': 'Bearer token'} — in place of token put the actual token you generate in Databricks.

3) The API path must start with /api.

4) The path to the Databricks notebook must be an absolute workspace path, i.e. "/Users/$USER_NAME/notebook_name", not a URL.

Final Working Code:

import requests

job_payload = {
    "run_name": "just_a_run",
    "existing_cluster_id": "id_of_cluster",
    "notebook_task": {
        # Absolute workspace path, not a URL
        "notebook_path": "/Users/username/notebook_name"
    }
}

# The URL must include /api, and the personal access token goes in the headers
resp = requests.post(
    'https://databricks.cloud.company.com/api/2.0/jobs/runs/submit',
    json=job_payload,
    headers={'Authorization': 'Bearer token'}
)
print(resp.status_code)
print(resp.text)
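Since the question is about running this from Lambda, a minimal handler might look like the sketch below. This is an assumption-laden illustration, not part of the accepted answer: the workspace URL, cluster ID, and notebook path are placeholders, and the token is read from a Lambda environment variable (`DATABRICKS_TOKEN`) rather than hard-coded.

```python
import os

import requests

# Placeholder workspace URL -- replace with your Databricks host
DATABRICKS_HOST = "https://databricks.cloud.company.com"


def build_submit_payload(run_name, cluster_id, notebook_path):
    """Build the runs/submit payload for a one-time notebook run."""
    return {
        "run_name": run_name,
        "existing_cluster_id": cluster_id,
        "notebook_task": {"notebook_path": notebook_path},
    }


def lambda_handler(event, context):
    # Keep the token out of the code, e.g. in a Lambda environment variable
    token = os.environ["DATABRICKS_TOKEN"]
    payload = build_submit_payload(
        "s3_triggered_run",           # hypothetical run name
        "id_of_cluster",              # placeholder cluster ID
        "/Users/username/notebook_name",  # placeholder notebook path
    )
    resp = requests.post(
        f"{DATABRICKS_HOST}/api/2.0/jobs/runs/submit",
        json=payload,
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    resp.raise_for_status()
    return {"statusCode": 200, "body": resp.text}
```

Reading the token from the environment also makes rotation easier than redeploying code with an embedded secret.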

3 Comments

I'm triggering the job using the run_now API and I'm getting a 200 response, but the job is not triggering. I tried it with my other account and it worked there, using username and password for authentication.
This solution works partially. It will create a new job, but won't run it. A subsequent run-job call with the appropriate job ID is required.
Needed some adjustments to your code, like the token part, and changed databricks.cloud.company.com to our Databricks server. Then it works! Thanks!
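Regarding the first two comments: a 200 response only means the API accepted the request, not that the notebook ran to completion. One way to confirm the run actually executed is to poll the runs/get endpoint with the run_id returned by runs/submit until the run reaches a terminal state. A sketch, assuming the same placeholder workspace URL and token as above:

```python
import time

import requests

HOST = "https://databricks.cloud.company.com"      # placeholder workspace URL
HEADERS = {"Authorization": "Bearer token"}         # placeholder token


def is_terminal(state):
    """A run is finished when its life_cycle_state is one of these values."""
    return state.get("life_cycle_state") in (
        "TERMINATED", "SKIPPED", "INTERNAL_ERROR",
    )


def wait_for_run(run_id, poll_seconds=15):
    """Poll jobs/runs/get until the run reaches a terminal state."""
    while True:
        resp = requests.get(
            f"{HOST}/api/2.0/jobs/runs/get",
            params={"run_id": run_id},
            headers=HEADERS,
            timeout=30,
        )
        resp.raise_for_status()
        state = resp.json()["state"]
        if is_terminal(state):
            # state also carries result_state, e.g. SUCCESS or FAILED
            return state
        time.sleep(poll_seconds)
```

Note that runs/submit creates and starts a one-time run directly; it is the jobs/create endpoint that only defines a job and needs a separate run-now call.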
