What I'm trying to achieve is passing conditionally generated columns to Spark SQL's map function, depending on whether their value is null, 0, or any other value I choose to filter on.
Take this initial DataFrame, for example.
// field3 is wrapped in Option so the nullable Int column can be encoded by toDF
val initialDF = Seq(
  ("a", "b", Some(1)),
  ("a", "b", None),
  ("a", null, Some(0))
).toDF("field1", "field2", "field3")
From that initial DataFrame I want to generate yet another column which will be a map, like this.
initialDF.withColumn("thisMap", MY_FUNCTION)
My current approach is basically to take a Seq[String] in a method and flatMap it into the key-value pairs that the Spark SQL map function receives, like this.
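import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{col, lit, map}

// Builds a map column: each column name becomes a key, its value the map value
def toMap(columns: String*): Column = {
  map(
    columns.flatMap(column => List(lit(column), col(column))): _*
  )
}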
But then the filtering has to happen on the Scala side, which gets quite messy.
What I would like to obtain after processing is the following DataFrame, where each row's map only contains the fields that are neither null nor 0.
// Map values are shown as strings, since a Spark map column has a single value type
val expectedDF = Seq(
  ("a", "b", Some(1), Map("field1" -> "a", "field2" -> "b", "field3" -> "1")),
  ("a", "b", None, Map("field1" -> "a", "field2" -> "b")),
  ("a", null, Some(0), Map("field1" -> "a"))
)
  .toDF("field1", "field2", "field3", "thisMap")
Can this be achieved with the Column API, which is far more intuitive thanks to methods like .isNull or .equalTo?
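For what it's worth, here is a minimal sketch of the kind of thing I have in mind. It assumes Spark 3.0 or later, where map_filter is available in org.apache.spark.sql.functions. The names toFilteredMap and ConditionalMapSketch are hypothetical. Casting every value to string is also my own assumption, so that all map values share one type. I'm not sure this is the idiomatic way to do it.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.{col, lit, map, map_filter}

object ConditionalMapSketch {
  // Hypothetical helper: builds the full map first, then filters entries with the Column API.
  // Values are cast to string (an assumption) so that every map value has the same type.
  // map_filter requires Spark 3.0+.
  def toFilteredMap(columns: String*): Column = {
    val fullMap = map(
      columns.flatMap(c => Seq(lit(c), col(c).cast("string"))): _*
    )
    // Keep only entries whose value is neither null nor "0"
    map_filter(fullMap, (_, value) => value.isNotNull && value =!= lit("0"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("conditional-map-sketch")
      .getOrCreate()
    import spark.implicits._

    val initialDF = Seq(
      ("a", "b", Some(1)),
      ("a", "b", None),
      ("a", null, Some(0))
    ).toDF("field1", "field2", "field3")

    initialDF
      .withColumn("thisMap", toFilteredMap("field1", "field2", "field3"))
      .show(truncate = false)

    spark.stop()
  }
}
```

The predicate passed to map_filter is an ordinary Column expression, so other conditions such as .equalTo or .isin could presumably be swapped in the same way.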