I have the following Scala Spark code to parse a fixed-width txt file:
val schemaDf = df.select(
  df("value").substr(0, 6).cast("integer").alias("id"),
  df("value").substr(7, 6).alias("date"),
  df("value").substr(13, 29).alias("string")
)
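For context, df has a single value column holding each raw line of the file, produced roughly like this (the path is just a placeholder):

// Read the fixed-width file line by line; each line ends up in a single "value" column.
val df = spark.read.text("/path/to/fixed_width.txt")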
I'd like to extract the following code:
df("value").substr(0, 6).cast("integer").alias("id"),
df("value").substr(7, 6).alias("date"),
df("value").substr(13, 29).alias("string")
into a dynamic loop so that the column parsing can be defined in some external configuration, something like this (where x will hold the config for each column, but for now it is just simple numbers for demo purposes):
val x = List(1, 2, 3)
val df1 = df.select(
  x.foreach {
    df("value").substr(0, 6).cast("integer").alias("id")
  }
)
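Eventually, instead of plain numbers, the per-column config I have in mind would look roughly like this (ColumnSpec and its fields are purely illustrative, not existing code):

// Hypothetical config entry for one fixed-width column: output name, start
// position, length in characters, and an optional target type to cast to.
case class ColumnSpec(name: String, start: Int, length: Int, castTo: Option[String])

val specs = List(
  ColumnSpec("id", 0, 6, Some("integer")),
  ColumnSpec("date", 7, 6, None),
  ColumnSpec("string", 13, 29, None)
)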
But right now, in the demo attempt above, the line df("value").substr(0, 6).cast("integer").alias("id") doesn't compile, failing with the following error:
type mismatch; found : org.apache.spark.sql.Column required: Int ⇒ ?
What am I doing wrong, and how do I properly build a dynamic list of Columns and pass it to the df.select method?
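For reference, the direction I suspect is needed is to map the config to a Seq[Column] and expand it into select with :_*, roughly like the sketch below (building on the illustrative specs above; untested), but I'm not sure it is the right approach:

import org.apache.spark.sql.Column

// Build one Column per spec: slice the raw line, optionally cast, then rename.
val cols: Seq[Column] = specs.map { s =>
  val raw = df("value").substr(s.start, s.length)
  val typed = s.castTo.fold(raw)(t => raw.cast(t))
  typed.alias(s.name)
}

// Expand the Seq[Column] into select's varargs parameter.
val df1 = df.select(cols: _*)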