Hi, I have a PySpark DataFrame with an array column, shown below.
I want to iterate through each element, keep only the part of the string before the hyphen, and store the result in a new column.
+------------------------------+
|col1                          |
+------------------------------+
|[hello-123, abc-111]          |
|[hello-234, def-22, xyz-33]   |
|[hiiii-111, def2-333, lmn-222]|
+------------------------------+
Desired output:
+------------------------------+--------------------+
|col1                          |new_column          |
+------------------------------+--------------------+
|[hello-123, abc-111]          |[hello, abc]        |
|[hello-234, def-22, xyz-33]   |[hello, def, xyz]   |
|[hiiii-111, def2-333, lmn-222]|[hiiii, def2, lmn]  |
+------------------------------+--------------------+
I tried something like the code below, but I could not work out how to apply a regex or substring operation to each element inside the UDF.
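from pyspark.sql.functions import col, udf
from pyspark.sql.types import ArrayType, StringType
cust_udf = udf(lambda arr: [x for x in arr], ArrayType(StringType()))
df1 = df1.withColumn('new_column', cust_udf(col("col1")))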
Can anyone please help with this? Thanks.
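One possible way to do this, as a minimal sketch rather than a definitive answer: the list comprehension in the UDF only needs a per-element `split` on the hyphen. The names `prefixes_before_hyphen` and `add_new_column` are hypothetical helpers introduced here for illustration, and the sketch assumes that null arrays and null elements should pass through as null and that only the first hyphen matters.

```python
def prefixes_before_hyphen(arr):
    """Return the part of each string that precedes its first hyphen."""
    if arr is None:  # a null array cell reaches a Python UDF as None
        return None
    # split("-", 1)[0] keeps the whole string when it contains no hyphen
    return [None if x is None else x.split("-", 1)[0] for x in arr]


def add_new_column(df, src="col1", dst="new_column"):
    """Hypothetical helper: append `dst`, built from the array column `src`."""
    # Imported inside the function so the pure-Python helper above can be
    # exercised without a Spark installation.
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import ArrayType, StringType

    prefix_udf = udf(prefixes_before_hyphen, ArrayType(StringType()))
    # withColumn returns a new DataFrame; the caller must keep the result
    return df.withColumn(dst, prefix_udf(col(src)))
```

Usage would be something like df1 = add_new_column(df1). If the Spark version is 2.4 or later, the SQL higher-order function should give the same result without a Python UDF, e.g. df1.withColumn('new_column', expr("transform(col1, x -> split(x, '-')[0])")) with expr imported from pyspark.sql.functions; that is usually faster because the rows never leave the JVM.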