I have a dictionary with information like:
dict_segs = {'key1': {'a': {'col1': 'value1', 'col2': 'value2', 'col3': 'value3'},
                      'b': {'col2': 'value2', 'col3': 'value3'},
                      'c': {'col1': 'value1'}},
             'key2': {'d': {'col3': 'value3', 'col2': 'value2'},
                      'f': {'col1': 'value1', 'col4': 'value4'}}}
TO DO:
The top-level keys are 'segments', and the dictionaries underneath them (a, b, c for key1; d, f for key2) are 'subsegments'. Each subsegment's own dictionary holds its filter condition, and the keys of that dictionary (col1, col2, ...) are also column names of the PySpark dataframe.
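To make the mapping concrete, the filter condition implied by subsegment 'a' of 'key1' would be an AND of equality checks on those columns, something like the following (just an illustration, assuming col1, col2 and col3 exist in the dataframe):

from pyspark.sql import functions as F

# condition implied by subsegment 'a' of 'key1'
cond_a = (F.col('col1') == 'value1') & (F.col('col2') == 'value2') & (F.col('col3') == 'value3')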
I want to create a column in the PySpark dataframe for every subsegment of a segment in one go; the value of each subsegment column should be 1 for rows that meet that subsegment's filter condition, else 0. Something like:
for item in dict_segs:
    pyspark_dataframe.withColumn(*dict_segs[item].keys(), when(<filter criteria for each subsegment>, 1).otherwise(0))
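As a rough sketch of what I am imagining (not a working solution; it assumes the dataframe already has the referenced columns and that every filter is a plain equality check), the conditions could perhaps be built dynamically and added per segment like this:

from functools import reduce
from pyspark.sql import functions as F

for seg, subsegs in dict_segs.items():
    new_cols = []
    for subseg, filters in subsegs.items():
        # AND together the equality checks that define this subsegment
        cond = reduce(lambda a, b: a & b,
                      [F.col(c) == v for c, v in filters.items()])
        # 1 when the row meets the filter condition, else 0
        new_cols.append(F.when(cond, 1).otherwise(0).alias(subseg))
    # add all subsegment columns for this segment in one go
    pyspark_dataframe = pyspark_dataframe.select('*', *new_cols)

I am not sure this is the right or idiomatic way to add many columns at once, which is why I am asking.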
While researching I found something similar in Scala, but the column filtering condition there is static, whereas in the logic above it is dynamic. Please see the Scala question below:
Spark/Scala repeated calls to withColumn() using the same function on multiple columns
I need support deriving the above logic for each segment as per the pseudocode above.
Thanks.