
I have the following Scala Spark code to parse a fixed-width text file:

val schemaDf = df.select(
  df("value").substr(0, 6).cast("integer").alias("id"),
  df("value").substr(7, 6).alias("date"),
  df("value").substr(13, 29).alias("string")
)

I'd like to extract the following code:

  df("value").substr(0, 6).cast("integer").alias("id"),
  df("value").substr(7, 6).alias("date"),
  df("value").substr(13, 29).alias("string")

into a dynamic loop, so that the column parsing can be defined in some external configuration. Something like this (where x will hold the config for each column, but for now it's just plain numbers for demo purposes):

val x = List(1, 2, 3)
val df1 = df.select(
    x.foreach { 
        df("value").substr(0, 6).cast("integer").alias("id") 
    }
)

but right now the line df("value").substr(0, 6).cast("integer").alias("id") doesn't compile, failing with the following error:

type mismatch; found : org.apache.spark.sql.Column required: Int ⇒ ?

What am I doing wrong, and how do I properly build a dynamic Column list and pass it to the df.select method?

1 Answer

The problem is that foreach returns Unit, not a value, so select receives nothing usable. (The error itself arises because the braces pass a Column where foreach expects a function Int ⇒ ?.) Instead, use map to build the list of Columns you want, then expand that list as varargs input for select:

import org.apache.spark.sql.Column

val x = List(1, 2, 3)
val cols: List[Column] = x.map { i =>
  // In real code, i (or a config object) would drive the substr arguments;
  // this demo produces the same column three times.
  df("value").substr(0, 6).cast("integer").alias("id")
}
val df1 = df.select(cols: _*)
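To go one step further toward external configuration, you can model each column as a small spec object and map over the specs. This is a sketch: the ColumnSpec case class and parseFixedWidth helper are hypothetical names, and the offsets are copied from your original select:

```scala
import org.apache.spark.sql.{Column, DataFrame}

// Hypothetical config entry: start offset, length, target type, and name.
// In practice these could be loaded from an external config file.
case class ColumnSpec(start: Int, length: Int, dataType: String, name: String)

val specs = List(
  ColumnSpec(0, 6, "integer", "id"),
  ColumnSpec(7, 6, "string", "date"),
  ColumnSpec(13, 29, "string", "string")
)

def parseFixedWidth(df: DataFrame, specs: List[ColumnSpec]): DataFrame = {
  val cols: List[Column] = specs.map { s =>
    df("value").substr(s.start, s.length).cast(s.dataType).alias(s.name)
  }
  df.select(cols: _*)
}

// Usage: val schemaDf = parseFixedWidth(df, specs)
```

This keeps the parsing logic in one place while the layout lives in data. Note that Column.substr is 1-based in Spark, so you may want to double-check the start offsets in your config.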