I have the following DataFrame (the values inside the arrays are strings):
+--------------------+--------------------+
|                col1|                col2|
+--------------------+--------------------+
|    [value1, value2]|    [value3, value4]|
|            [value5]|            [value6]|
+--------------------+--------------------+
How can I create a new column containing a single array with all the values of both columns, like this:
+--------------------+--------------------+--------------------------------+
|                col1|                col2|                             new|
+--------------------+--------------------+--------------------------------+
|    [value1, value2]|    [value3, value4]|[value1, value2, value3, value4]|
|            [value5]|            [value6]|                [value5, value6]|
+--------------------+--------------------+--------------------------------+
I tried the following:
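from pyspark.sql.functions import udf
from pyspark.sql.types import ArrayType, StringType

def add_function(col1, col2):
    return col1 + col2  # list + list builds a new, concatenated list

udf_add = udf(add_function, ArrayType(StringType()))
dftrial.withColumn("new", udf_add("col1", "col2")).show(2)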
This works as desired, but I don't understand why, when I change add_function to:
def add_function(col1, col2):
    return col1.extend(col2)
the new column contains only null values. Why?
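A likely explanation, sketched in plain Python (no Spark needed): `list.extend` modifies the list in place and returns `None`, so the modified add_function returns `None`, which Spark shows as null. Inside a Python UDF an array column arrives as a Python list, so the same rule applies. The helper names `add_with_extend` and `add_with_copy` are hypothetical, for illustration only.

```python
def add_with_extend(col1, col2):
    # list.extend mutates col1 in place and returns None,
    # so this function returns None (which Spark would store as null).
    return col1.extend(col2)


def add_with_copy(col1, col2):
    # Hypothetical fixed variant: copy first so the input list is not
    # mutated, extend the copy, then return it explicitly.
    result = list(col1)
    result.extend(col2)
    return result


a = ["value1", "value2"]
b = ["value3", "value4"]

print(add_with_extend(list(a), b))  # None
print(add_with_copy(a, b))          # ['value1', 'value2', 'value3', 'value4']
```

The original `col1 + col2` version works because `+` on two lists builds and returns a new list.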
My main question: is there another way to do this, for example an already implemented built-in function? I found concat, but it seems to work only for strings.