There is a class attribute spark in our AnalyticsWriter class:
    class AnalyticsWriter:
        spark = SparkSession.getActiveSession()  # this is not getting executed
I noticed that this code is not being executed before a certain class method is run. Note: I have verified that an active SparkSession is already available in the process, so the initialization code is simply not being executed.
        @classmethod
        def measure_upsert(cls) -> DeltaTable:
            assert AnalyticsWriter.spark, "AnalyticsWriter requires an active SparkSession"
I come from JVM-land (Java/Scala), where class-level initialization code runs before any method invocation. What is the equivalent in Python?
SparkSession.getActiveSession() would execute so early the AnalyticsWriter class doesn't even exist yet. No class methods could possibly be executed first.
AttributeError: type object 'AnalyticsWriter' has no attribute 'spark'.
AnalyticsWriter is ever referenced. (2) Then AnalyticsWriter.measure_upsert() is invoked by a client. At that point in JVM-land the class-level initializations will occur, including the line shown. Please elaborate on your comment in terms of that sequence.
spark = SparkSession.getActiveSession() happens before (1), assuming the file containing the class definition is imported before (1). Python doesn't do Java-style class loading. Classes aren't loaded on use. A class statement is imperative, like an assignment statement or a function call. If you had a print(1) before the class definition and a print(2) after it, the class body, including spark = SparkSession.getActiveSession(), would execute between the two prints.
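The ordering described in the last comment can be checked with a small sketch that needs no Spark installation. Everything here is an assumption made for illustration: get_active_session and _active_session are hypothetical stand-ins for SparkSession.getActiveSession() and Spark's session state, and current_spark is a hypothetical helper showing one way to defer the lookup until call time. It is not the asker's code or a PySpark API.

```python
events = []
_active_session = None  # hypothetical stand-in for Spark's active-session state


def get_active_session():
    """Hypothetical stand-in for SparkSession.getActiveSession()."""
    events.append("lookup")
    return _active_session


events.append("before class")


class AnalyticsWriter:
    # Executed once, while the class statement itself runs.
    spark = get_active_session()

    @classmethod
    def current_spark(cls):
        # Lazy alternative: look the session up when it is actually needed.
        if cls.spark is None:
            cls.spark = get_active_session()
        return cls.spark


events.append("after class")

# The session only becomes "active" after the class has been defined.
_active_session = "session"

print(events)                           # ['before class', 'lookup', 'after class']
print(AnalyticsWriter.spark)            # None
print(AnalyticsWriter.current_spark())  # session
```

If the module defining the class is imported before a session exists, the class attribute is evaluated at import time and stays None, which would be consistent with the failing assert. Deferring the lookup into a method is one possible way to get closer to on-first-use behavior.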