
I am working on a project in Azure Data Factory, and I have a pipeline that runs a Databricks Python script. This script, which is located in the Databricks file system and is run by the ADF pipeline, imports a module from another Python script located in the same folder (both scripts are in dbfs:/FileStore/code).

The code below can import the Python module into a Databricks notebook, but the import fails when the same code runs as a Python script.

import sys

sys.path.insert(0, 'dbfs:/FileStore/code/')
import conn_config as Connect

In the cluster logs, I get: ImportError: No module named conn_config

I guess the problem is related to the Python script's inability to recognize the Databricks environment. Any help?


4 Answers


I finally got it done with Spark. Once the Spark session is created (if your cluster has the Spark session integrated, there is no need to initiate a session):

# Distribute the module file to the cluster, then import it as usual
spark.sparkContext.addPyFile("dbfs:/FileStore/code/conn_config.py")
import conn_config as C

This approach can import a Python module into a Python script that is run from Azure Data Factory.
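
For a fuller picture, here is a minimal sketch of the whole flow, assuming the script runs as a Databricks Python task triggered by ADF and that dbfs:/FileStore/code/conn_config.py exists; getOrCreate() simply reuses the session on clusters that already provide one:

from pyspark.sql import SparkSession

# On an interactive cluster a session already exists; getOrCreate()
# attaches to it instead of starting a new one.
spark = SparkSession.builder.getOrCreate()

# Ship the module file to the driver and executors.
spark.sparkContext.addPyFile("dbfs:/FileStore/code/conn_config.py")

# A plain import now works, even in a script run from Azure Data Factory.
import conn_config as C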


1 Comment

This seems to work for me, but when I change the module, my host python code doesn't pick up the module changes.

You can't use a path with dbfs: in it - Python doesn't know anything about this file system. You have two choices (both sketched below):

  1. Replace dbfs:/ with /dbfs/ (won't work on Community Edition)
  2. Copy the file(s) from DBFS to the local file system with dbutils.fs.cp("dbfs:/FileStore/code", "file:/tmp/code", True), and refer to the local path: /tmp/code
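
A minimal sketch of both options, assuming the module lives at dbfs:/FileStore/code/conn_config.py; the /tmp/code target is an arbitrary choice, and dbutils is only available on Databricks:

import sys

# Option 1: the DBFS FUSE mount exposes dbfs:/ as /dbfs/ on the driver
# (not available on Community Edition).
sys.path.insert(0, '/dbfs/FileStore/code/')

# Option 2: copy the folder to the driver's local disk, then use the
# local path instead:
#   dbutils.fs.cp("dbfs:/FileStore/code", "file:/tmp/code", True)
#   sys.path.insert(0, '/tmp/code')

import conn_config as Connect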

3 Comments

Is there any other option? I was thinking there might be a way to refer to the Python file in my specific DBFS before giving the FileStore path.
You can upload files to any location on DBFS, or even have a separate ADLS container mounted to DBFS.
@IspanCristi So do any of the given solutions work? If not, please change your question so that one can really see what you want to achieve and what requirements you have to comply with! Thanks.

Use %run relative_path/file_name; then you can use the module right away, without an import.
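
A minimal sketch, assuming conn_config is a notebook in the same folder as the caller; note that %run is a notebook magic, must sit alone in its cell, and is not available in a plain Python script:

%run ./conn_config

# In a later cell, everything conn_config defines is in scope directly,
# e.g. a hypothetical variable it sets:
print(connection_string)  # hypothetical name defined inside conn_config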

2 Comments

The issue with this is that it breaks namespacing, which quickly becomes problematic if you need to import multiple files. In my view it's a hacky workaround that you can use very carefully if you have to, but it's not what the question is asking for.
Don't judge so subjectively. It really depends on your working environment: this workaround helps if you are on an industry project where DBFS has been blocked for data-privacy purposes (e.g. using Immuta). In my case, the two methods above didn't work, but this solution helped me in my work, so I shared it.

You can just use references to file stores:

import sys
sys.path.insert(0, 'dbfs:/FileStore/code')
