There is a class attribute spark in our AnalyticsWriter class:

class AnalyticsWriter:

    spark = SparkSession.getActiveSession()  # this is not getting executed

I noticed that this code is not being executed before a certain class method is run. Note: I have verified that an active SparkSession already exists in the process, so the initialization code is simply not being executed.

    @classmethod
    def measure_upsert(
        cls
    ) -> DeltaTable:

        # A backslash continuation inside the string would embed the indentation in the message
        assert AnalyticsWriter.spark, "AnalyticsWriter requires an active SparkSession"

I come from JVM-land (Java/Scala), where class-level initialization code runs before any method invocation. What is the equivalent in Python?

  • Is it possible that the "certain class method" is static..? Commented Dec 6, 2022 at 16:57
  • "I noticed that this code is not being executed before a certain class method is run" - you've misdiagnosed the problem. In Python, SparkSession.getActiveSession() would execute so early the AnalyticsWriter class doesn't even exist yet. No class methods could possibly be executed first. Commented Dec 6, 2022 at 16:57
  • If the initialization really hadn't happened, you wouldn't be getting an assertion failure. You'd be getting an AttributeError: type object 'AnalyticsWriter' has no attribute 'spark'. Commented Dec 6, 2022 at 17:00
  • @user2357112 Please clarify your comment in terms of the sequencing. (1) The SparkSession is being initialized in a different class before AnalyticsWriter is ever referenced. (2) Then AnalyticsWriter.measure_upsert() is invoked by a client. At that point in JVM-land the class-level initializations will occur, including the line shown. Please elaborate on your comment in terms of that sequence. Commented Dec 6, 2022 at 17:00
  • @WestCoastProjects: spark = SparkSession.getActiveSession() happens before (1), assuming the file containing the class definition is imported before (1). Python doesn't do Java-style class-loading. Classes aren't loaded on use. A class statement is imperative, like an assignment statement, or a function call. If you had a print(1) before the class definition and a print(2) after it, the class body, including spark = SparkSession.getActiveSession(), would execute between the two prints. Commented Dec 6, 2022 at 17:03
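
The ordering described in the last comment can be checked without Spark. This is only a minimal sketch: `get_active_session` and the `events` list are hypothetical stand-ins, not part of PySpark, and the stand-in returns None to mimic a lookup made before any session exists.

```python
# Hypothetical stand-in for SparkSession.getActiveSession(); no Spark needed.
events = []

def get_active_session():
    events.append("class body ran")
    return None  # mimics "no active session yet"

events.append("before class")

class AnalyticsWriter:
    # Executed right here, while the class statement itself is running.
    spark = get_active_session()

    @classmethod
    def measure_upsert(cls):
        events.append("method called")
        return cls.spark

events.append("after class")
AnalyticsWriter.measure_upsert()
print(events)
```

The class body entry lands between the two surrounding entries, and the method call comes last, so the attribute is always assigned before any class method can run.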

1 Answer

Class attributes are initialized as soon as the interpreter reaches them while executing the class body, so the line containing the getActiveSession() call runs before the class is even fully defined.

class AnalyticsWriter:
    spark = SparkSession.getActiveSession()
    # The code has been run here
    
    # ... other definitions that occur after spark exists ...
# class is complete here

I suspect the code is doing something, just not what you expect. You can confirm that it is in fact run with a cheesy hack like:

class AnalyticsWriter:
    spark = (SparkSession.getActiveSession(), print("getActiveSession called", flush=True))[0]

which just makes a tuple of the result of your call and an eager print, then discards the meaningless result from the print; you should see the output from the print immediately, before you can get around to calling class methods.
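
If the call turns out to run but return None (for example because the module was imported before the session was created, which is an assumption and not something confirmed in the question), one option is to defer the lookup until it is needed. A rough sketch, with `FakeSession`, `registry`, `get_active_session`, `EagerWriter` and `LazyWriter` all being hypothetical names used in place of the real Spark objects:

```python
class FakeSession:
    """Hypothetical stand-in for a SparkSession."""

# Illustrative slot holding the "active session"; not how Spark tracks it.
registry = {"active": None}

def get_active_session():
    return registry["active"]

class EagerWriter:
    # Evaluated once, at class definition time, when no session exists yet.
    spark = get_active_session()

class LazyWriter:
    @classmethod
    def spark(cls):
        # Looked up on every call, so a session created later is still found.
        session = get_active_session()
        if session is None:
            raise RuntimeError("LazyWriter requires an active SparkSession")
        return session

# The session only comes into existence after both classes are defined.
registry["active"] = FakeSession()

print(EagerWriter.spark)                            # the stale None captured at definition
print(LazyWriter.spark() is registry["active"])     # the lazy lookup sees the new session
```

The trade-off is that the lazy version has to be called as `LazyWriter.spark()` instead of being read as a plain attribute.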

2 Comments

I thought that I had put a breakpoint on that line already, but will double-check (and use the printing). I do use tuples for printing things at times, since few Python constructs support chaining and/or side effects.
Oh, it does stop there. My bad. I'll award when I'm allowed (2 mins, tick tock).
