I use Spark 2.4 with Scala. I have a DataFrame and I am trying to replace null values in my array columns with a default value (an empty array).
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.catalyst.expressions.objects.AssertNotNull
import org.apache.spark.sql.functions.{col, udf, when}
import org.apache.spark.sql.types._

// UDF returning an empty string array, declared with containsNull = false
val emptyStringArray = udf(() => Array.empty[String],
  DataTypes.createArrayType(DataTypes.StringType, false))

// Replace null array columns with an empty array and assert the column is not null
def ensureNonNullCol: DataFrame => DataFrame = inputDf => {
  inputDf.select(inputDf.schema.fields.map { f: StructField =>
    f.dataType match {
      case array: ArrayType => new Column(
        AssertNotNull(when(col(f.name).isNull,
          array.elementType match {
            case DataTypes.StringType => emptyStringArray()
          }).otherwise(col(f.name)).expr)
      ).as(f.name)
      case _ => col(f.name) // leave non-array columns untouched
    }
  }: _*)
}
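For reference, here is a minimal sketch of how I apply it (assuming an existing SparkSession named spark; the sample data is only for illustration, the column name matches the printed schema below):

import org.apache.spark.sql.Row

// Input column is nullable and its elements may contain nulls
val inputSchema = StructType(Seq(
  StructField("StrAarrayColumn",
    ArrayType(StringType, containsNull = true), nullable = true)))

// One null row and one non-null row, to exercise both branches of when/otherwise
val df = spark.createDataFrame(
  spark.sparkContext.parallelize(Seq(Row(null), Row(Seq("a", "b")))),
  inputSchema)

ensureNonNullCol(df).printSchema()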
At the end, I get:
|-- StrAarrayColumn: array (nullable = false)
| |-- element: string (containsNull = true)
How can I get:
|-- StrAarrayColumn: array (nullable = false)
| |-- element: string (containsNull = false)
?