I am trying to cast every field of a DataFrame, including the fields inside nested StructTypes, to String.
I have already seen some solutions on Stack Overflow (for example, "how to cast all columns of dataframe to string"), but they only work on simple DataFrames without nested structs.
Here is an example of what I need:
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._
import spark.implicits._
val rows1 = Seq(
  Row(1, Row("a", "b"), 8.00, Row(1, 2)),
  Row(2, Row("c", "d"), 9.00, Row(3, 4))
)
val rows1Rdd = spark.sparkContext.parallelize(rows1, 4)
val schema1 = StructType(
  Seq(
    StructField("id", IntegerType, true),
    StructField("s1", StructType(
      Seq(
        StructField("x", StringType, true),
        StructField("y", StringType, true)
      )
    ), true),
    StructField("d", DoubleType, true),
    StructField("s2", StructType(
      Seq(
        StructField("u", IntegerType, true),
        StructField("v", IntegerType, true)
      )
    ), true)
  )
)
val df1 = spark.createDataFrame(rows1Rdd, schema1)
println("Schema with nested struct")
df1.printSchema()
Printing the schema of the created DataFrame gives the following result:
root
|-- id: integer (nullable = true)
|-- s1: struct (nullable = true)
|    |-- x: string (nullable = true)
|    |-- y: string (nullable = true)
|-- d: double (nullable = true)
|-- s2: struct (nullable = true)
|    |-- u: integer (nullable = true)
|    |-- v: integer (nullable = true)
I tried to cast all the values to String as follows:
df1.select(df1.columns.map(c => col(c).cast(StringType)) : _*)
But this converts each nested StructType as a whole into a single string column, instead of casting each field inside it to String:
root
|-- id: string (nullable = true)
|-- s1: string (nullable = true)
|-- d: string (nullable = true)
|-- s2: string (nullable = true)
Is there a simple solution that would cast all the values, including the nested ones, to StringType? Here is the schema I want my DataFrame to have after the cast:
root
|-- id: string (nullable = true)
|-- s1: struct (nullable = true)
|    |-- x: string (nullable = true)
|    |-- y: string (nullable = true)
|-- d: string (nullable = true)
|-- s2: struct (nullable = true)
|    |-- u: string (nullable = true)
|    |-- v: string (nullable = true)
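One direction that might produce this schema, given here as a sketch under assumptions rather than a verified solution: Spark's cast accepts a struct as the target type when the field counts match, so the schema could be rewritten recursively with every leaf type replaced by StringType, and each top-level column cast to its rewritten type. The helper names toStringType and castAllToString are hypothetical. The sketch assumes that column names contain no dots and that array or map columns, if there are any, may be collapsed into plain strings.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types._

// Hypothetical helper: rebuild a data type with every leaf replaced by StringType.
// Structs are kept as structs and rewritten field by field; anything else
// (including arrays and maps, by assumption) becomes a plain StringType.
def toStringType(dt: DataType): DataType = dt match {
  case StructType(fields) =>
    StructType(fields.map(f => f.copy(dataType = toStringType(f.dataType))))
  case _ => StringType
}

// Hypothetical helper: cast each top-level column to its rewritten type.
// Assumes top-level column names contain no dots.
def castAllToString(df: DataFrame): DataFrame =
  df.select(df.schema.fields.map { f =>
    col(f.name).cast(toStringType(f.dataType)).as(f.name)
  }: _*)
```

With the example above, this would be used as castAllToString(df1).printSchema() to check whether the resulting schema matches the one I am after.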
Thanks a lot!