There is a Hive table with a single column of type string.
hive> desc logical_control.test1;
OK
test_field_1    string    test field 1
val df2 = spark.sql("select * from logical_control.test1")
df2.printSchema()
root
 |-- test_field_1: string (nullable = true)
df2.show(false)
+------------------------+
|test_field_1            |
+------------------------+
|[[str0], [str1], [str2]]|
+------------------------+
How can I transform it into a structured column like the one below?
root
 |-- A: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- S: string (nullable = true)
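For reference, from_json can only populate a schema like this from a string that is actually valid JSON. My assumption of what the data must have looked like before it was written out as plain text:

```json
{"A":[{"S":"str0"},{"S":"str1"},{"S":"str2"}]}
```

The bracket form [[str0], [str1], [str2]] stored in the table is not valid JSON.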
I tried to recover it with the original schema that the data had before it was written to HDFS, but json_data is null.
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types._

val schema = StructType(
  Seq(
    StructField("A", ArrayType(
      StructType(
        Seq(
          StructField("S", StringType, nullable = true)
        )
      )
    ), nullable = true)
  )
)
val df3 = df2.withColumn("json_data", from_json(col("test_field_1"), schema))
df3.printSchema()
root
 |-- test_field_1: string (nullable = true)
 |-- json_data: struct (nullable = true)
 |    |-- A: array (nullable = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- S: string (nullable = true)
df3.show(false)
+------------------------+---------+
|test_field_1            |json_data|
+------------------------+---------+
|[[str0], [str1], [str2]]|null     |
+------------------------+---------+
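The null result is expected: from_json returns null when the input string is not valid JSON, and the bracket form above is not JSON. A minimal sketch of one workaround (the helper name toJson is my own, and it assumes the values themselves never contain brackets or commas) rewrites the raw string into JSON that matches the schema, so from_json can then parse it:

```scala
// Sketch: rewrite "[[str0], [str1], [str2]]" into JSON matching the schema.
// Assumption: the inner values contain no brackets or commas themselves.
def toJson(raw: String): String = {
  val inner = raw.stripPrefix("[").stripSuffix("]")   // "[str0], [str1], [str2]"
  val items = inner
    .split("\\],\\s*\\[")                             // split between the inner groups
    .map(_.stripPrefix("[").stripSuffix("]"))         // "str0", "str1", "str2"
  items.map(v => s"""{"S":"$v"}""").mkString("""{"A":[""", ",", "]}")
}

// toJson("[[str0], [str1], [str2]]")
//   == """{"A":[{"S":"str0"},{"S":"str1"},{"S":"str2"}]}"""
```

Registered as a UDF with spark.udf.register, this could be applied to test_field_1 before calling from_json; alternatively, a chain of regexp_replace calls could build the same JSON string without a UDF.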
Would it help to add the output of desc formatted logical_control.test1; to the question?