I'm loading a CSV file with Spark's csv reader and converting it into a typed Dataset by supplying a schema derived from a case class and calling .as[T]:
spark.read
  .option("header", "false")
  .option("dateFormat", "yyyy-MM-dd HH:mm:ss.SSS")
  .schema(schemaOf[T])
  .csv(filePath)
  .as[T]
My question is this: more than one system sends me the same file, and one of those systems may send a file that is missing two of the columns in my defined schema. For that system I would like to load all the other columns and simply set the two missing columns to null. For every other system, which sends files conforming to the schema, all fields should be loaded as they are now. How do I do this efficiently? I don't want to create a separate case class for each system.
In other words, if the case class/schema has 25 columns, a file may arrive with only 23 columns (22 commas).
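To illustrate the behaviour I'm after, here is a rough sketch in plain Scala (no Spark; `padFields`, `fieldCount`, and the comma delimiter are hypothetical names of mine, not part of the pipeline above): split each raw line and pad any missing trailing fields with null, up to the width of the schema, before the record is turned into the case class.

```scala
// Hypothetical sketch of the desired padding behaviour: split a raw CSV
// line and widen short records to the schema's column count with nulls.
object PadSketch {
  def padFields(line: String, fieldCount: Int, sep: String = ","): Seq[Option[String]] = {
    // split with limit -1 so trailing empty fields are preserved
    val parts = line.split(sep, -1).toSeq
    // pad a short record out to the expected width with null,
    // then wrap each field so missing/empty values become None
    parts
      .padTo(fieldCount, null: String)
      .map(s => Option(s).filter(_.nonEmpty))
  }
}
```

In the real pipeline this would presumably run over something like `spark.read.textFile(filePath)` before the schema is applied, building the case class (or a Row) from the padded fields instead of relying on the csv reader's token count. I'm not sure whether this is the idiomatic way, hence the question.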