I have a PySpark DataFrame with a column Categories that holds a list of values for a given ID. It looks like this; let's call it df1:
ID | Categories
122 | [16777277, 16777350, 16777483, 16777425, 16777303]
There are about 500 categories in total (the full set is available as df2.columns), but as the example above shows, ID 122 has only those 5.
I want to transform each value in the Categories list into its own column, like this (df3):
ID  | 16777277 | 16777350 | 16777483 | 16777425 | 16777303 | 44444 | 55555 | .......
122 | 1        | 1        | 1        | 1        | 1        | 0     | 0     | .......
Any idea how I can make this transformation?
I tried using explode() followed by pivot(), but the resulting columns didn't seem to map correctly to the list values.
What I tried:

from pyspark.sql import functions as F

df1 = df1.withColumn('exploded', F.explode('Categories'))
pivoted = df1.groupby('ID').pivot('exploded').count()
Thanks in advance!