Concat dataframe columns by passing list

Question

    from pyspark.sql import Row, functions as F

    # assumes a SparkContext named sc, as in the pyspark shell
    row = Row("UK_1", "UK_2", "Date", "Cat")
    df = sc.parallelize([
        row(1, 1, '12/10/2016', "A"),
        row(1, 2, None, 'A'),
        row(2, 1, '14/10/2016', 'B'),
        row(3, 3, '!~2016/2/276', 'B'),
        row(None, 1, '26/09/2016', 'A'),
        row(1, 1, '12/10/2016', "A"),
        row(1, 2, None, 'A'),
        row(2, 1, '14/10/2016', 'B'),
        row(None, None, '!~2016/2/276', 'B'),
        row(None, 1, '26/09/2016', 'A'),
    ]).toDF()

    pks = ["UK_1", "UK_2"]

    df1 = (
        df
        .select(df.columns)
        # .withColumn('pk', F.concat(pks))  # what I would like to write
        .withColumn('pk', F.concat("UK_1", "UK_2"))
    )

    df1.show()

Is there a way to pass a list of columns to concat? I want to reuse this code in scenarios where the set of columns varies, so I would like to pass the column names as a list.

akuiper · Accepted Answer · 2017-09-21 18:24:03Z

5

Yes. Use Python's argument unpacking syntax, *args, which passes the items of a list as a variable number of positional arguments:

    df.withColumn("pk", F.concat(*pks)).show()

    +----+----+------------+---+----+
    |UK_1|UK_2|        Date|Cat|  pk|
    +----+----+------------+---+----+
    |   1|   1|  12/10/2016|  A|  11|
    |   1|   2|        null|  A|  12|
    |   2|   1|  14/10/2016|  B|  21|
    |   3|   3|!~2016/2/276|  B|  33|
    |null|   1|  26/09/2016|  A|null|
    |   1|   1|  12/10/2016|  A|  11|
    |   1|   2|        null|  A|  12|
    |   2|   1|  14/10/2016|  B|  21|
    |null|null|!~2016/2/276|  B|null|
    |null|   1|  26/09/2016|  A|null|
    +----+----+------------+---+----+
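
To see what the * does independently of Spark, here is a minimal plain-Python sketch. Note that concat_stub is a hypothetical stand-in written for illustration, not the Spark function. It only mimics the behaviour visible in the output above, where pk is null whenever any input column is null.

```python
def concat_stub(*cols):
    """Hypothetical stand-in for F.concat (illustration only, not Spark).

    Joins its arguments as strings and returns None if any argument is None.
    """
    if any(c is None for c in cols):
        return None
    return "".join(str(c) for c in cols)


values = [1, 2]

# The * unpacks the list, so these two calls are equivalent.
unpacked = concat_stub(*values)
explicit = concat_stub(1, 2)

# A None among the inputs makes the whole result None.
with_null = concat_stub(*[None, 1])

print(unpacked, explicit, with_null)
```

The same unpacking works for a list of any length, which is why F.concat(*pks) keeps working when the number of key columns changes.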


2 Comments

Tronald Dump Over a year ago

I am getting the following error: AnalysisException: u'cannot resolve \'"UK_1"\' given input columns: [UK_1, UK_2, Date, Cat];'

akuiper Over a year ago

It seems you have some extra quotes around column names. You can check pks, and make sure the strings don't have unnecessary quotes.
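
One way to check for this is to print the repr of each name and strip any stray quotes before unpacking the list. The sketch below is plain Python. The helper clean_column_names is hypothetical, and it assumes the only unwanted characters are surrounding whitespace and single or double quotes.

```python
def clean_column_names(names):
    """Hypothetical helper: drop surrounding whitespace and quote characters."""
    return [n.strip().strip("'\"") for n in names]


# Example of a list whose strings contain literal quote characters.
pks = ['"UK_1"', "'UK_2'"]

# repr makes embedded quotes visible when inspecting the list.
print([repr(n) for n in pks])

cleaned = clean_column_names(pks)
print(cleaned)
```

The cleaned list can then be passed as F.concat(*cleaned).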
