
Is it possible to declare a variable that a module needs before that module is imported?

For example, to get this code to run as expected:

# a.py

# do magic here to make b.foo = "bar"
import b

b.printlocal()

# b.py
# imagine this file cannot be changed (e.g., for dumb political reasons)

local_var = foo

def printlocal():
  print(local_var)

Running a.py should print "bar". How can this be accomplished without changing b.py?

What I've Tried So Far

A. Patching

from unittest.mock import patch

with patch("b.foo"):
  import b

  b.printlocal()

Result: NameError: name 'foo' is not defined

Thoughts: I think this fails because patch has to import b in order to find the attribute foo, and that import itself raises the NameError at `local_var = foo` before anything can be patched.
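(Added for completeness, and hedged: I did not actually try this variant, but patching the builtins module instead of b should work in principle, because name lookup inside b falls back to builtins while its body runs. patch.object and create=True are standard unittest.mock features; the sketch below assumes the same a.py/b.py as above.)

# a.py
import builtins
from unittest.mock import patch

# Make `foo` visible through the builtins fallback only while b is imported;
# create=True is needed because builtins normally has no such attribute.
with patch.object(builtins, "foo", "bar", create=True):
    import b          # b's `local_var = foo` resolves foo via builtins

    b.printlocal()    # prints "bar"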

B. Namespace manipulation

# a.py
import sys

module_name = "b"
module_globals = sys.modules[module_name].__dict__
module_globals["foo"] = "bar"

import b

b.printlocal()

Result: KeyError: 'b'

Thoughts: This fails because b isn't in sys.modules until after it has been imported, so there is nothing to modify yet. I haven't tried it directly, but I suspect that manually creating the module with the right names first would run into the problem that the later import b won't execute b.py a second time.
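(That said, the import machinery can be driven by hand so that the name exists before the module body runs. I haven't used this in the Databricks setup; the sketch below assumes b.py is importable from the current path and uses the standard importlib.util APIs. Because the module is placed in sys.modules before executing, a later `import b` elsewhere returns the same object instead of re-running b.py.)

# a.py
import importlib.util
import sys

spec = importlib.util.find_spec("b")            # locate b.py on the import path
module = importlib.util.module_from_spec(spec)
module.foo = "bar"                              # seed the name before the body runs
sys.modules["b"] = module                       # a later `import b` returns this object
spec.loader.exec_module(module)                 # executes b.py; `local_var = foo` now succeeds

module.printlocal()                             # prints "bar"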

C. Using locals import

# a.py

import importlib
b = importlib.__import__(
    "b",
    locals={"__builtins__": __builtins__, "foo": "bar"},
    # I also tried using globals here with same error
)
b.printlocal()

Result: NameError: name 'foo' is not defined

Thoughts: I think this fails because the globals and locals arguments are only used to determine the package context for resolving relative imports, not to seed the imported module's namespace. Documentation on this function is sparse.
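(For reference: that package-context use is what importlib.import_module exposes directly via its package argument. The mypkg/sibling names below are purely hypothetical.)

import importlib

# Equivalent of writing "from . import sibling" inside mypkg.somemodule:
sibling = importlib.import_module(".sibling", package="mypkg")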

Motivation: Why attempt this "unpythonic" monstrosity?

Databricks notebooks have spark and dbutils available as top-level names, so there are hundreds of Python files (just in my repos) that cannot be imported directly, which makes unit testing difficult. I would love to be able to initialize these names as MagicMocks while importing, both to make testing possible and to avoid side effects.

In the example above, foo is a stand-in for spark and dbutils. b.py is a stand-in for a notebook which contains some Spark code with side effects as well as functions which should be tested; printlocal() is a stand-in for a function to be tested. Here's a quick example:

# Databricks notebook source
from pyspark.sql.functions import to_timestamp, concat, col

curr_date = dbutils.widgets.get("CurrentDate") # will fail here if imported directly "dbutils not defined"
env = dbutils.widgets.get("Environment")

def dateStringToTimestampExpression(date_str): # want to unittest this in a.py
  return to_timestamp(concat(col(date_str), "yyyy-MM-dd"))

df = spark.read.table(f"{env}.table_a")
df = df.withColumn("updated_on", dateStringToTimestampExpression(curr_date))
df.write.saveAsTable("table_a_enhanced") # undesirable side-effect
  • I think it would be better to re-focus this question around your last section regarding databricks, possibly with some example code you would write there, and how you would like the interface for testing to look. The question as you have it is very open ended. It's not likely that all solutions to this generic problem will be applicable to the actual use-case you have. Commented Dec 4, 2024 at 21:50
  • @flakes good idea, I added a sample databricks notebook with a function to be tested. I wanted to keep the simple example to keep the focus on the abstract problem which may come up for non-databricks situations Commented Dec 4, 2024 at 22:13
  • Perhaps for testing these methods, it might be easier to copy and slightly modify the source before you test it. You could add in an import at the start of each file to fetch the dependencies into the local scope... something like this as the first line to each copied file "import mocking; spark = mocking.spark(); dbutils = mocking.dbutils()". Now when you go to mock these dependencies, you only need to mock the return values for these functions, allowing you to share the same mocking setup for every test, without needing to do patches for each unit individually. Commented Dec 4, 2024 at 23:40

1 Answer

# a.py

__builtins__.foo = "bar"

import b

b.printlocal()

Result: It prints "bar".

Note: In 99% of cases it's a bad idea to clutter the namespace by adding things to builtins. For testing it's probably fine; I would have to be tortured before approving a PR that used this pattern to clutter builtins in production code.
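(A side note that goes beyond the original answer: __builtins__ is a CPython implementation detail; it is the builtins module in the __main__ script but may be that module's dict in imported modules. Importing the builtins module explicitly does the same thing without relying on that detail.)

# a.py
import builtins

builtins.foo = "bar"  # documented equivalent of __builtins__.foo = "bar" in a script

import b

b.printlocal()  # prints "bar"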

Usage in Databricks

from unittest.mock import MagicMock

__builtins__.spark = MagicMock()
__builtins__.dbutils = MagicMock()

import notebook_file

# then run tests against a column expression, variable, or function from the notebook
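(If the tests run under pytest, which is an assumption on my part and not part of the answer above, the seeding can live in a conftest.py; pytest imports conftest.py before it collects the test modules that do `import notebook_file`, so the names exist in time.)

# conftest.py -- sketch; executed by pytest before test modules are imported
import builtins
from unittest.mock import MagicMock

# Provide the names Databricks injects at notebook runtime, so importing a
# notebook module at the top of a test file doesn't raise NameError.
builtins.spark = MagicMock()
builtins.dbutils = MagicMock()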