I am loading MongoDB data into a Hive table and trying to solve the "Unsupported NullType" error raised by saveAsTable. Sample data schema:
root
|-- level1: struct (nullable = true)
| |-- level2: struct (nullable = true)
| | |-- level3_1: null (nullable = true)
| | |-- level3_2: array (nullable = true)
| | | |-- element: struct (containsNull = true)
| | | | |-- level4: null (nullable = true)
I tried functions.lit, like this:
df = df.withColumn("level1.level2.level3_1", functions.lit("null").cast("string"))
       .withColumn("level1.level2.level3_2.level4", functions.lit("null").cast("string"));
but this only adds two new top-level columns whose names contain dots, and the nested fields stay unchanged:
root
|-- level1: struct (nullable = true)
| |-- level2: struct (nullable = true)
| | |-- level3_1: null (nullable = true)
| | |-- level3_2: array (nullable = true)
| | | |-- element: struct (containsNull = true)
| | | | |-- level4: null (nullable = true)
|-- level1.level2.level3_1: string (nullable = false)
|-- level1.level2.level3_2.level4: string (nullable = false)
I also checked df.na().fill(), but it does not seem to change the schema.
The desired schema is:
root
|-- level1: struct (nullable = true)
| |-- level2: struct (nullable = true)
| | |-- level3_1: string (nullable = true)
| | |-- level3_2: array (nullable = true)
| | | |-- element: struct (containsNull = true)
| | | | |-- level4: string (nullable = true)
With that schema, I could save the loaded MongoDB data to Hive as a table.
Has anyone worked on this who could give me some advice on how to cast a nested NullType, or more generally how to deal with NullType in Java? I am looking for a systematic, general solution that scales to more complex data. Many thanks.
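To make the kind of general solution I have in mind concrete, here is a minimal sketch of a recursive schema rewrite that replaces every null leaf with string, at any nesting depth. Note that Type, Leaf, Field, Struct, Array and SchemaSketch are hypothetical, simplified stand-ins that I made up for illustration; they are not Spark classes, and the sketch only rewrites the schema tree, not the data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-ins for a nested schema model (NOT Spark's DataType classes).
final class SchemaSketch {
    interface Type {}
    enum Leaf implements Type { NULL, STRING }

    static final class Field {
        final String name;
        final Type type;
        Field(String name, Type type) { this.name = name; this.type = type; }
    }
    static final class Struct implements Type {
        final List<Field> fields;
        Struct(Field... fields) { this.fields = Arrays.asList(fields); }
        Struct(List<Field> fields) { this.fields = fields; }
    }
    static final class Array implements Type {
        final Type element;
        Array(Type element) { this.element = element; }
    }

    // Rebuild the type tree, replacing every NULL leaf with STRING.
    static Type replaceNulls(Type t) {
        if (t == Leaf.NULL) return Leaf.STRING;
        if (t instanceof Struct) {
            List<Field> out = new ArrayList<>();
            for (Field f : ((Struct) t).fields) {
                out.add(new Field(f.name, replaceNulls(f.type)));
            }
            return new Struct(out);
        }
        if (t instanceof Array) return new Array(replaceNulls(((Array) t).element));
        return t; // any other leaf is kept unchanged
    }

    // Render the tree as a compact string so the result can be inspected.
    static String render(Type t) {
        if (t instanceof Leaf) return ((Leaf) t).name().toLowerCase(Locale.ROOT);
        if (t instanceof Array) return "array<" + render(((Array) t).element) + ">";
        StringBuilder sb = new StringBuilder("struct<");
        List<Field> fs = ((Struct) t).fields;
        for (int i = 0; i < fs.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append(fs.get(i).name).append(':').append(render(fs.get(i).type));
        }
        return sb.append('>').toString();
    }

    // The sample schema from this question, expressed in the stand-in model.
    static Type sample() {
        Struct element = new Struct(new Field("level4", Leaf.NULL));
        Struct level2 = new Struct(
            new Field("level3_1", Leaf.NULL),
            new Field("level3_2", new Array(element)));
        return new Struct(new Field("level1", new Struct(new Field("level2", level2))));
    }
}
```

My assumption is that the same recursion could be written against Spark's real types (StructType.fields(), ArrayType.elementType(), DataTypes.NullType and DataTypes.StringType), and that each top-level column could then be cast to its rewritten type with Column.cast. I have not verified that such a cast is accepted for nested structs and arrays, so please correct me if that part is wrong.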