I have the following Scala Spark code to parse a fixed-width txt file:
val schemaDf = df.select(
  df("value").substr(0, 6).cast("integer").alias("id"),
  df("value").substr(7, 6).alias("date"),
  df("value").substr(13, 29).alias("string")
)
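For context, df has a single value column holding each raw line of the file, produced roughly like this (the path is just a placeholder):

// Read the fixed-width file line by line; each line ends up in a single "value" column.
val df = spark.read.text("/path/to/fixed_width.txt")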
I'd like to extract the following code:
df("value").substr(0, 6).cast("integer").alias("id"),
df("value").substr(7, 6).alias("date"),
df("value").substr(13, 29).alias("string")
into a dynamic loop so that the column parsing can be defined in some external configuration, something like this (where x will hold the config for each column, but for now it is just simple numbers for demo purposes):
val x = List(1, 2, 3)
val df1 = df.select(
  x.foreach {
    df("value").substr(0, 6).cast("integer").alias("id")
  }
)
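Eventually, instead of plain numbers, the per-column config I have in mind would look roughly like this (ColumnSpec and its fields are purely illustrative, not existing code):

// Hypothetical config entry for one fixed-width column: output name, start
// position, length in characters, and an optional target type to cast to.
case class ColumnSpec(name: String, start: Int, length: Int, castTo: Option[String])

val specs = List(
  ColumnSpec("id", 0, 6, Some("integer")),
  ColumnSpec("date", 7, 6, None),
  ColumnSpec("string", 13, 29, None)
)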
But right now, in the demo attempt above, the line df("value").substr(0, 6).cast("integer").alias("id") doesn't compile, failing with the following error:
type mismatch; found : org.apache.spark.sql.Column required: Int ⇒ ?
What am I doing wrong, and how do I properly build a dynamic list of Columns and pass it to the df.select method?
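For reference, the direction I suspect is needed is to map the config to a Seq[Column] and expand it into select with :_*, roughly like the sketch below (building on the illustrative specs above; untested), but I'm not sure it is the right approach:

import org.apache.spark.sql.Column

// Build one Column per spec: slice the raw line, optionally cast, then rename.
val cols: Seq[Column] = specs.map { s =>
  val raw = df("value").substr(s.start, s.length)
  val typed = s.castTo.fold(raw)(t => raw.cast(t))
  typed.alias(s.name)
}

// Expand the Seq[Column] into select's varargs parameter.
val df1 = df.select(cols: _*)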