
I am trying to execute a local PySpark script on a Databricks cluster via the dbx utility, to test how passing arguments to Python works in Databricks when developing locally. However, the test arguments I am passing are not being read for some reason. Could someone help? I am following this guide, but it is a bit unclear and lacks good examples: https://dbx.readthedocs.io/en/latest/quickstart.html
I also found the question "How can I pass and than get the passed arguments in databricks job", but it is not clear either.

The Databricks documentation is not very clear in this area.

My PySpark script:

import sys

# Total length of argv, including the script name at index 0.
n = len(sys.argv)
print("Total arguments passed:", n)

print("Script name", sys.argv[0])

# Echo every argument after the script name.
print("\nArguments passed:", end=" ")
for i in range(1, n):
    print(sys.argv[i], end=" ")

My dbx deployment.json:

{
  "default": {
    "jobs": [
      {
        "name": "parameter-test",
        "spark_python_task": {
            "python_file": "parameter-test.py"
        },
        "parameters": [
          "test-argument-1",
          "test-argument-2"
        ]
      }
    ]
  }
}

dbx execute command:

dbx execute \
  --cluster-id=<redacted> \
  --job=parameter-test \
  --deployment-file=conf/deployment.json \
  --no-rebuild \
  --no-package

Output:

(parameter-test) user@735 parameter-test % /bin/zsh /Users/user/g-drive/git/parameter-test/parameter-test.sh
[dbx][2022-07-26 10:34:33.864] Using profile provided from the project file
[dbx][2022-07-26 10:34:33.866] Found auth config from provider ProfileEnvConfigProvider, verifying it
[dbx][2022-07-26 10:34:33.866] Found auth config from provider ProfileEnvConfigProvider, verification successful
[dbx][2022-07-26 10:34:33.866] Profile DEFAULT will be used for deployment
[dbx][2022-07-26 10:34:35.897] Executing job: parameter-test in environment default on cluster None (id: 0513-204842-7b2r325u)
[dbx][2022-07-26 10:34:35.897] No rebuild will be done, please ensure that the package distribution is in dist folder
[dbx][2022-07-26 10:34:35.897] Using the provided deployment file conf/deployment.json
[dbx][2022-07-26 10:34:35.899] Preparing interactive cluster to accept jobs
[dbx][2022-07-26 10:34:35.997] Cluster is ready
[dbx][2022-07-26 10:34:35.998] Preparing execution context
[dbx][2022-07-26 10:34:36.534] Existing context is active, using it
[dbx][2022-07-26 10:34:36.992] Requirements file requirements.txt is not provided, following the execution without any additional packages
[dbx][2022-07-26 10:34:36.992] Package was disabled via --no-package, only the code from entrypoint will be used
[dbx][2022-07-26 10:34:37.161] Processing parameters
[dbx][2022-07-26 10:34:37.449] Processing parameters - done
[dbx][2022-07-26 10:34:37.449] Starting entrypoint file execution
[dbx][2022-07-26 10:34:37.767] Command successfully executed
Total arguments passed: 1
Script name python

Arguments passed:
[dbx][2022-07-26 10:34:37.768] Command execution finished
(parameter-test) user@735 parameter-test % 

Please help :)

2 Answers

It turns out the parameters section of my deployment.json was in the wrong place: parameters must be nested inside spark_python_task, not defined at the job level. Here is the corrected example:

{
  "default": {
    "jobs": [
      {
        "name": "parameter-test",
        "spark_python_task": {
          "python_file": "parameter-test.py",
          "parameters": [
            "test1",
            "test2"
          ]
        }
      }
    ]
  }
}
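
Once the parameters are nested correctly, they show up in sys.argv exactly as if they had been passed on the command line. If you later want named arguments instead of positional ones, here is a minimal sketch using the standard argparse module (the flag names are hypothetical; each flag and each value must be its own string in the parameters array):

import argparse

# Hypothetical flags; the deployment file would pass them as
# "parameters": ["--input-path", "/tmp/in", "--output-path", "/tmp/out"]
parser = argparse.ArgumentParser(description="parameter-test")
parser.add_argument("--input-path", required=True)
parser.add_argument("--output-path", required=True)
args = parser.parse_args()

print("input: ", args.input_path)
print("output:", args.output_path)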

I've also posted my original question on the Databricks forum: https://community.databricks.com/s/feed/0D58Y00008znXBxSAM?t=1659032862560
I hope it helps someone else.


Also note that the top-level "jobs" section in the deployment file is deprecated; newer versions of dbx use "workflows" instead.

Source: https://dbx.readthedocs.io/en/latest/migration/
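
As a rough sketch based on the migration guide (check it for the exact schema of your dbx version; the "environments" wrapper and the file:// prefix are my assumptions about the newer layout), the corrected deployment file might look like this:

{
  "environments": {
    "default": {
      "workflows": [
        {
          "name": "parameter-test",
          "spark_python_task": {
            "python_file": "file://parameter-test.py",
            "parameters": [
              "test1",
              "test2"
            ]
          }
        }
      ]
    }
  }
}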
