I have a pyspark dataframe:
Example df:
number | matricule<array> | name<array>
----------------------------------------
AA     | []               | [7]
AA     | [9]              | []
AA     | [""]             | [2]
AA     | [2]              | [""]
I would like to change the arrays that contain only an empty string, i.e. turn [""] into [].
I tried:
df = df.withColumn("matricule_2", F.when(F.col("matricule") == F.lit("[""]"), F.lit("[]")).otherwise(F.col("matricule")))
But I got an error:
AnalysisException: u"cannot resolve '(`matricule` = '[]')' due to data type mismatch: differing types."
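The mismatch happens because matricule is an array<string> column, while the right-hand side of the comparison is a plain string. If you want to keep the when/otherwise pattern, one option is to compare against an array literal instead. A minimal sketch, assuming the usual from pyspark.sql import functions as F import and keeping the matricule_2 name from the attempt above:

# Compare the array column to the one-element array [""] and replace matches
# with an empty array; otherwise keep the original value.
df = df.withColumn(
    "matricule_2",
    F.when(
        F.col("matricule") == F.array(F.lit("")),
        F.array().cast("array<string>"),
    ).otherwise(F.col("matricule")),
)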
Expected result:
number | matricule<array> | name<array>
----------------------------------------
AA     | []               | [7]
AA     | [9]              | []
AA     | []               | [2]
AA     | [2]              | []
Can someone please help me? Thank you.
You can use array_remove, like this:

from pyspark.sql.functions import array_remove, col

df = df.withColumn("matricule_2", array_remove(col("matricule"), ""))
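For completeness, a minimal end-to-end sketch, assuming Spark 2.4+ (where array_remove is available) and that the same cleanup should be applied to both array columns, as the expected result suggests; the cleaned variable and the session setup are illustrative:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Rebuild the example DataFrame from the question.
df = spark.createDataFrame(
    [("AA", [], ["7"]), ("AA", ["9"], []), ("AA", [""], ["2"]), ("AA", ["2"], [""])],
    "number string, matricule array<string>, name array<string>",
)

# array_remove drops every element equal to "", so [""] becomes [].
cleaned = (df
    .withColumn("matricule", F.array_remove("matricule", ""))
    .withColumn("name", F.array_remove("name", "")))

cleaned.show()
# +------+---------+----+
# |number|matricule|name|
# +------+---------+----+
# |    AA|       []| [7]|
# |    AA|      [9]|  []|
# |    AA|       []| [2]|
# |    AA|      [2]|  []|
# +------+---------+----+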