I want to initialize a DataFrame in which some rows contain a None/null value, using Spark with Scala (version 3.2.1). How can I do this?

val df = spark.createDataFrame(
  Seq((0, "a", true), (1, "b", true), (2, "c", false), (3, "a", false), (4, "a", None), (5, "c", false))
).toDF("id", "category1", "category2")
df.show()

I get this error:

UnsupportedOperationException: Schema for type Any is not supported

2 Answers

That's because the nearest common supertype of Boolean and None (whose type, None.type, is a subtype of Option[Nothing]) is Any, and Spark cannot derive a schema for Any. To make your code work, wrap the booleans in Some so that the column is inferred as Option[Boolean]. There is no need to define struct types, because Spark can figure out the schema on its own. This would work:

import spark.implicits._  // provides .toDF on a Seq; spark-shell imports this automatically
Seq((0, "a", Some(true)), (1, "b", Some(true)), (2, "c", Some(false)), (3, "a", Some(false)), (4, "a", None), (5, "c", Some(false)))
  .toDF("id", "category1", "category2")
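
The inference described above can be checked in plain Scala, without Spark. This is only a minimal sketch: the names mixed, wrapped, asAny and asOpt are made up for illustration, and the type ascriptions are there so that the compiler confirms what it inferred for the third tuple field.

```scala
// Pure Scala, no Spark required.
// Boolean and None share no supertype narrower than Any,
// so the third tuple field is inferred as Any.
val mixed = Seq((3, "a", false), (4, "a", None))

// Some(false) and None are both Options, so the third
// field is inferred as Option[Boolean].
val wrapped = Seq((3, "a", Some(false)), (4, "a", None))

// These ascriptions compile only if the inferred types conform.
val asAny: Seq[(Int, String, Any)] = mixed
val asOpt: Seq[(Int, String, Option[Boolean])] = wrapped

// Option.empty[Boolean] is a way to spell a typed None explicitly.
val explicit: Seq[(Int, String, Option[Boolean])] =
  Seq((4, "a", Option.empty[Boolean]))
```

Spark's encoders handle the Option[Boolean] field and reject the Any field, which matches the error in the question.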

I was able to get the output you need with the following code:

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructType, StructField, BooleanType}

val data = Seq(Row(true), Row(null))
val schema = List(StructField("boolColName", BooleanType, true))

val df = spark.createDataFrame(spark.sparkContext.parallelize(data), StructType(schema))
df.show()

The true passed as the third argument to StructField specifies that the column is nullable.
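
As a rough sketch of how the nullable flag plays out on a two-column version of the question's data, the snippet below builds the schema explicitly and then counts the null rows. The local session settings, the app name "nullable-demo", and the variable names are assumptions made for illustration; in spark-shell the spark value already exists and the builder line can be skipped.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{BooleanType, IntegerType, StructField, StructType}

// Assumption: a local session created just for this example.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("nullable-demo")
  .getOrCreate()

// id may not be null; category2 may.
val schema = StructType(Seq(
  StructField("id", IntegerType, nullable = false),
  StructField("category2", BooleanType, nullable = true)
))

// A Row holds untyped values, so a plain null is accepted here.
val rows = Seq(Row(3, false), Row(4, null))
val df = spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)

df.printSchema()

// Count the rows whose category2 is null.
val nullCount = df.filter(col("category2").isNull).count()
```

With this data, nullCount is 1, and df.schema("category2").nullable returns true.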
