I have a PySpark DataFrame with many attribute columns (there are about 160). These columns hold 1s and 0s showing whether an account has an attribute or not. I need to analyse combinations of attributes, so I want to build a string in a new column listing the names of the attributes that the account has. For example, I have these columns: account, then some other columns, then the attributes. The column I want to add is 'att_list'.
Here is what I have tried so far. First, I put the list of attributes in a variable:
```python
# create a list of all the attributes available
att_names = df1.drop('Account', 'other_col1', 'other_col2')
attlist = att_names.columns
```
Then I tried writing a function, adapting an existing one:
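```python
import pyspark.sql.functions as f

def func_att_list(df, cols=[]):
    # str.join needs strings, but f.when(...) returns Column objects, so this fails
    att_list_column = ','.join([f.when(f.col(i) > 0, i) for i in cols])
    return df.withColumn('att_list', att_list_column)

df2 = func_att_list(df1, cols=attlist)
```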
This just errors out.
I've also tried this:
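```python
from pyspark.sql.functions import when
# Fails: DataFrame has no .col() method, and ','.join() again receives Column objects
att_list_column = [when(df1.col(i) > 0, i) for i in attlist]
df1 = df1.withColumn('att_list', ','.join([i for i in att_list_column]))
```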
This also doesn't work.
I am not confident with functions and find them a bit of a 'black box'. I would greatly appreciate any help.

I have also tried `F.concat` instead of `join`, as well as `f.concat_ws`, which gives a "Column is not iterable" error.