DF1 - flat dataframe with data
+---------+--------+-------+
|FirstName|LastName| Device|
+---------+--------+-------+
| Robert|Williams|android|
| Maria|Sharpova| iphone|
+---------+--------+-------+
root
|-- FirstName: string (nullable = true)
|-- LastName: string (nullable = true)
|-- Device: string (nullable = true)
DF2 - empty dataframe with the same leaf columns, nested under header and body
+------+----+
|header|body|
+------+----+
+------+----+
root
|-- header: struct (nullable = true)
| |-- FirstName: string (nullable = true)
| |-- LastName: string (nullable = true)
|-- body: struct (nullable = true)
| |-- Device: string (nullable = true)
DF2 schema code:
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val schema = StructType(Array(
  StructField("header", StructType(Array(
    StructField("FirstName", StringType),
    StructField("LastName", StringType)))),
  StructField("body", StructType(Array(
    StructField("Device", StringType))))
))
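Since the nesting has to be configurable, one option (my assumption, not something from the post) is to describe the same header/body layout as plain Scala data, which could then be loaded from a config file instead of hard-coding StructType literals:

```scala
// Hypothetical config: target struct name -> flat DF1 columns to nest under it.
// Plain data, so it could just as well come from a JSON/HOCON config file.
val nesting: Seq[(String, Seq[String])] = Seq(
  "header" -> Seq("FirstName", "LastName"),
  "body"   -> Seq("Device")
)
```

The StructType above could then be derived from this single source of truth, so the schema and the column mapping never drift apart.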
DF2 populated with the data from DF1 would be the final output.
I need to do this for many columns in a complex schema, keep the mapping configurable, and avoid case classes.
APPROACH #1 - use schema.fields.map to map DF1 -> DF2?
APPROACH #2 - create a new DataFrame, defining the data and schema explicitly?
APPROACH #3 - use zip and map transformations to build a 'select col as col' query.. not sure whether this works for a nested (StructType) schema
How would I go about doing this?
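For a single level of nesting, approach #3 can be made to work without case classes by generating `struct(...) AS name` expressions for `selectExpr`. A minimal sketch; the `mapping` config and the `structExpr` helper are my own names, not Spark API:

```scala
// Hypothetical mapping: target struct column -> source columns from the flat DF
val mapping: Seq[(String, Seq[String])] = Seq(
  "header" -> Seq("FirstName", "LastName"),
  "body"   -> Seq("Device")
)

// Render one "struct(...) AS name" SQL expression per target column
def structExpr(name: String, cols: Seq[String]): String =
  s"struct(${cols.mkString(", ")}) AS $name"

val exprs = mapping.map { case (name, cols) => structExpr(name, cols) }
// exprs: struct(FirstName, LastName) AS header, struct(Device) AS body
```

With a SparkSession in scope this would be applied as `df1.selectExpr(exprs: _*)`, which should yield the nested header/body schema shown above; the typed equivalent is `df1.select(struct($"FirstName", $"LastName").as("header"), struct($"Device").as("body"))`.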
Edit: solved with struct, as nicely shown below by @mvasyliv.