
Given the following code (I seem to have done this successfully in the past, but...):

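import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

// Assumes a SparkSession named `spark`, as in spark-shell
val arrayStructData2 = Seq(
  Row("James", 2),
  Row("Alex", 3)
)

val arrayStructSchema2 = new StructType()
  .add("names", new StructType()
    .add("name", StringType)
    .add("extraField", IntegerType)
  )

val df = spark.createDataFrame(spark.sparkContext.parallelize(arrayStructData2), arrayStructSchema2)
df.printSchema()
df.show()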

I get this:

...
Caused by: RuntimeException: java.lang.String is not a valid external type for schema of struct<name:string,extraField:int>

I can't immediately see what's wrong.

2 Answers


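For others, as a reminder: because the top-level schema has a single struct field ("names"), each outer Row must wrap an inner Row for that struct, i.e. Row(Row(...)), as in: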

val arrayStructData2 = Seq(
  Row(Row("James", 2)),
  Row(Row("Alex", 3))
)

Not a very obvious error, IMHO.
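To make the nesting rule concrete, here is a minimal sketch contrasting a flat schema with the nested one from the question. It assumes a local SparkSession; the object name `NestedRowExample` and the `flatSchema`/`nestedSchema` names are illustrative, not from the original post. The point is that the outer Row always mirrors the top-level StructType field by field, so a one-field schema needs a one-element Row.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

// Illustrative example object (hypothetical name).
object NestedRowExample {
  // Flat schema: two top-level fields, so each Row holds the two values directly.
  val flatSchema: StructType = new StructType()
    .add("name", StringType)
    .add("extraField", IntegerType)

  // Nested schema: ONE top-level field ("names") of struct type,
  // so each outer Row holds one value, which is itself a Row for the inner struct.
  val nestedSchema: StructType = new StructType()
    .add("names", new StructType()
      .add("name", StringType)
      .add("extraField", IntegerType))

  def flatDf(spark: SparkSession): DataFrame =
    spark.createDataFrame(
      spark.sparkContext.parallelize(Seq(Row("James", 2), Row("Alex", 3))),
      flatSchema)

  def nestedDf(spark: SparkSession): DataFrame =
    spark.createDataFrame(
      spark.sparkContext.parallelize(Seq(Row(Row("James", 2)), Row(Row("Alex", 3)))),
      nestedSchema)
}
```

With the nested version, the inner fields can then be reached with dotted paths, e.g. `NestedRowExample.nestedDf(spark).select("names.name", "names.extraField").show()`.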


When you create the DataFrame with createDataFrame, the schema is registered but nothing is actually evaluated, which is why df.printSchema works as expected. When you execute df.show, the DataFrame is evaluated and Spark tries to load the first value you gave it (in this case a String) into a struct, which results in the RuntimeException you're seeing. Here is the Scaladoc for Spark 3.1.1:

Creates a DataFrame from a java.util.List containing Rows using the given schema. It is important to make sure that the structure of every Row of the provided List matches the provided schema. Otherwise, there will be runtime exception.

It's telling you that you are trying to force a string into a struct.
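Since the mismatch only surfaces when an action such as show() runs, one option is to check the Rows against the schema eagerly, before calling createDataFrame. The sketch below is a hypothetical helper (`RowSchemaCheck.conformsTo` is not part of Spark); it deliberately handles only the types used in the question (nested structs, strings, ints) and treats every other type as a mismatch.

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{DataType, IntegerType, StringType, StructType}

// Hypothetical helper, not a Spark API: recursively checks that a Row's
// values line up with a StructType, so mismatches surface before any action runs.
object RowSchemaCheck {
  def conformsTo(row: Row, schema: StructType): Boolean =
    row.length == schema.fields.length &&
      schema.fields.zipWithIndex.forall { case (field, i) =>
        valueMatches(row.get(i), field.dataType, field.nullable)
      }

  private def valueMatches(value: Any, dataType: DataType, nullable: Boolean): Boolean =
    (value, dataType) match {
      case (null, _)               => nullable
      case (r: Row, s: StructType) => conformsTo(r, s) // nested struct: recurse
      case (_: String, StringType) => true
      case (_: Int, IntegerType)   => true              // matches boxed java.lang.Integer
      case _                       => false             // other types: not handled by this sketch
    }
}
```

Against the question's `arrayStructSchema2`, `RowSchemaCheck.conformsTo(Row("James", 2), arrayStructSchema2)` returns false (the schema has one top-level field, the Row has two values), while `Row(Row("James", 2))` conforms.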

1 Comment

See my answer.
