I have n columns, each holding an array of strings. I would like to concatenate these n columns into one, using a loop.

I have this function to concat columns:

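from itertools import chain
from pyspark.sql.functions import col, udf
from pyspark.sql.types import ArrayType, StringType

def concat(type):
    # Build a UDF that flattens several array columns into one array
    def concat_(*args):
        return list(chain(*args))
    return udf(concat_, ArrayType(type))

concat_string_arrays = concat(StringType())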

And in the following example, I have 4 columns that I will concatenate like this:

df_aux = df.select(
    'ID_col',
    concat_string_arrays(
        col('patron_txt_1'), col('patron_txt_2'), col('patron_txt_3'), col('patron_txt_0')
    ).alias('patron_txt')
)

But if I have 200 columns, how can I apply this function dynamically, with a loop?

    Can you give us a snippet showing what your data looks like and what the result should look like in the end? Commented Jan 17, 2018 at 9:50

1 Answer

You can use the * operator to pass a list of columns to your concat UDF:

from itertools import chain
from pyspark.sql.functions import col, udf
from pyspark.sql.types import ArrayType, StringType

# Each patron_txt_* column holds an array of strings, as in the question
df = sqlContext.createDataFrame([("1", ["2"], ["3"], ["4"]),
                                 ("5", ["6"], ["7"], ["8"])],
                                ('ID_col', 'patron_txt_0', 'patron_txt_1', 'patron_txt_2'))

def concat(type):
    def concat_(*args):
        return list(chain(*args))
    return udf(concat_, ArrayType(type))


concat_string_arrays = concat(StringType())

# Select the columns you want to concatenate
cols = [c for c in df.columns if c.startswith("patron_txt")]

# Use the * operator to pass multiple columns to concat_string_arrays
df.select('ID_col', concat_string_arrays(*cols).alias('patron_txt')).show()

This results in the following output:

+------+----------+
|ID_col|patron_txt|
+------+----------+
|     1| [2, 3, 4]|
|     5| [6, 7, 8]|
+------+----------+
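
To see what the * operator does without starting Spark, here is a minimal plain-Python sketch of the same inner function. The `row` dict and its values are hypothetical stand-ins for one DataFrame row; only `concat_` and the column names come from the code above.

```python
from itertools import chain

def concat_(*args):
    # Same body as the UDF's inner function: flatten the arrays
    # that arrive as separate positional arguments.
    return list(chain(*args))

# Hypothetical row: one Python list per array column.
row = {
    "ID_col": "1",
    "patron_txt_0": ["a", "b"],
    "patron_txt_1": ["c"],
    "patron_txt_2": ["d", "e"],
}

# Pick the columns by prefix, just like the list comprehension over df.columns.
cols = [name for name in row if name.startswith("patron_txt")]
values = [row[name] for name in cols]

# The * operator unpacks the list, so this is the same call as
# concat_(values[0], values[1], values[2]).
merged = concat_(*values)
print(merged)

# Pitfall: plain strings are iterable too, so chain splits them into characters.
split_chars = concat_("ab", "c")
print(split_chars)
```

So the number of columns no longer matters: whatever ends up in the list is passed as separate arguments. Note the second call: if a column holds plain strings rather than arrays, each string is broken into single characters, so it may be worth checking the column types first.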