I'm using Spark with Scala.
I have a DataFrame with 3 columns: ID, Time, RawHexData. I have a user-defined function which takes RawHexData and expands it into X more columns. It is important to state that X is the same for every row (the columns do not vary). However, before I receive the first data, I do not know what the columns are; once I have the first row, I can deduce them.
I would like a second DataFrame with said columns: ID, Time, RawHexData, NewCol1, ..., NewColX.
The "easiest" method I can think of is: 1. serialize each row into JSON (every data type is serializable here), 2. add my new columns, 3. deserialize a new DataFrame from the altered JSON.
However, that seems wasteful, as it involves two costly and redundant JSON (de)serialization steps. I am looking for a cleaner pattern.
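For reference, the JSON round-trip I'm describing looks roughly like this. `expandHexToJson` is a hypothetical helper standing in for my real decoder, and the column names it emits are placeholders:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().getOrCreate()
import spark.implicits._

// Hypothetical helper: parses one RawHexData value into a JSON object string,
// e.g. "0A1B" -> """{"NewCol1": 10, "NewCol2": 27}""" (names are placeholders).
def expandHexToJson(hex: String): String = ???

val df: DataFrame = ??? // the original DataFrame with ID, Time, RawHexData

// 1. serialize: turn RawHexData into a JSON string per row
val expandJson = udf(expandHexToJson _)
val withJson = df.withColumn("expanded", expandJson($"RawHexData"))

// 2.-3. deserialize: infer a schema from the JSON strings, then flatten
val jsonSchema = spark.read.json(withJson.select("expanded").as[String]).schema
val result = withJson
  .withColumn("parsed", from_json($"expanded", jsonSchema))
  .select($"ID", $"Time", $"RawHexData", $"parsed.*")
```

The `spark.read.json` pass to infer the schema is exactly the redundant extra scan I'd like to avoid.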
Using case classes seems like a bad idea, because I don't know the number of columns or the column names in advance.
Maybe I could call the withColumn() function on RawHexData only after some conditions have been satisfied?
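One direction I'm considering, sketched under the assumption that the schema can be built once the first row has been seen: a UDF that returns a struct typed with that deduced schema, which `select` can then expand with `.*`, skipping JSON entirely. The field names and the toy hex decoding below are placeholders, not my real format:

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

val spark = SparkSession.builder().getOrCreate()
import spark.implicits._

// Assumption: deduced from the first row at runtime; these fields are placeholders.
val deducedSchema = StructType(Seq(
  StructField("NewCol1", IntegerType),
  StructField("NewCol2", IntegerType)
))

// Hypothetical decoder: turns a hex string into a Row matching deducedSchema.
// The udf(f, dataType) overload lets the return type be chosen at runtime.
val decode = udf((hex: String) => {
  val bytes = hex.grouped(2).map(Integer.parseInt(_, 16)).toSeq
  Row(bytes(0), bytes(1))
}, deducedSchema)

val df = Seq((1L, "t0", "0A1B")).toDF("ID", "Time", "RawHexData")

// Expand the struct into top-level columns; no JSON round-trip needed.
val result = df
  .withColumn("expanded", decode($"RawHexData"))
  .select($"ID", $"Time", $"RawHexData", $"expanded.*")
```

Is something along these lines the idiomatic way to do this, or is there a better pattern?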