Spark SQL throwing error "java.lang.UnsupportedOperationException: Unknown field type: void"

Question

I get the error below in Spark SQL (Spark 1.6) when I create a table with a column whose value is a literal NULL. Example: create table test as select column_a, NULL as column_b from test_temp;

The same statement works in Hive, which creates the column with data type "void".

As a workaround I use an empty string instead of NULL, which avoids the exception and gives the new column the string data type.

Is there a better way to insert NULL values into a Hive table using Spark SQL?

2017-12-26 07:27:59 ERROR StandardImsLogger$:177 - org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.UnsupportedOperationException: Unknown field type: void
    at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:789)
    at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:746)
    at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$createTable$1.apply$mcV$sp(ClientWrapper.scala:428)
    at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$createTable$1.apply(ClientWrapper.scala:426)
    at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$createTable$1.apply(ClientWrapper.scala:426)
    at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$withHiveState$1.apply(ClientWrapper.scala:293)
    at org.apache.spark.sql.hive.client.ClientWrapper.liftedTree1$1(ClientWrapper.scala:239)
    at org.apache.spark.sql.hive.client.ClientWrapper.retryLocked(ClientWrapper.scala:238)
    at org.apache.spark.sql.hive.client.ClientWrapper.withHiveState(ClientWrapper.scala:281)
    at org.apache.spark.sql.hive.client.ClientWrapper.createTable(ClientWrapper.scala:426)
    at org.apache.spark.sql.hive.execution.CreateTableAsSelect.metastoreRelation$lzycompute$1(CreateTableAsSelect.scala:72)
    at org.apache.spark.sql.hive.execution.CreateTableAsSelect.metastoreRelation$1(CreateTableAsSelect.scala:47)
    at org.apache.spark.sql.hive.execution.CreateTableAsSelect.run(CreateTableAsSelect.scala:89)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
    at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:56)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:56)
    at org.apache.spark.sql.DataFrame.withCallback(DataFrame.scala:153)
    at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145)
    at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
    at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:52)
    at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:829)

select cast(NULL as string) ?

– philantrovert, Commented Dec 27, 2017 at 6:46
Thanks @philantrovert. It worked

– Nagaraj Vittal, Commented Dec 27, 2017 at 7:34

philantrovert · Accepted Answer · 2017-12-27 10:27:26Z

2

I couldn't find much information about the void data type, but it appears to be the type Hive assigns to an untyped NULL literal, somewhat like the Null type we have in Scala.

The table at the end of this page explains that a void can be cast to any other data type.

Here are some JIRA issues that are kinda similar to the problem you are facing

So, as mentioned in the comment, instead of selecting a bare NULL you can cast it to an explicit data type such as string.

select cast(NULL as string) as column_b
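
Applied to the statement from the question, the cast gives a complete CTAS with a typed NULL column. Below is a minimal sketch in Python that only builds the SQL text. The helper names typed_null and build_ctas are hypothetical (they are not part of Spark), and the table and column names are taken from the question. It assumes the names are trusted, since nothing is escaped or validated.

```python
# Hypothetical helpers (not part of Spark): build a CTAS statement in which
# every NULL placeholder column carries an explicit type, so that no column
# ends up with the "void" type.

def typed_null(column, sql_type="string"):
    """Return a select expression such as 'CAST(NULL AS string) AS column_b'."""
    return "CAST(NULL AS {}) AS {}".format(sql_type, column)


def build_ctas(target, source, columns, null_columns):
    """Build 'CREATE TABLE <target> AS SELECT ... FROM <source>'.

    columns:      names copied unchanged from the source table.
    null_columns: (name, sql_type) pairs added as typed NULL columns.
    """
    select_list = list(columns) + [typed_null(n, t) for n, t in null_columns]
    return "CREATE TABLE {} AS SELECT {} FROM {}".format(
        target, ", ".join(select_list), source)


statement = build_ctas("test", "test_temp",
                       ["column_a"], [("column_b", "string")])
print(statement)

# Assumption: a HiveContext named sqlContext already exists in the session.
# The statement would then be run as:
#   sqlContext.sql(statement)
```

The same pattern should work for other target types (int, date, and so on) by changing the type name passed to typed_null.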


Jørgen Guldmann · Accepted Answer · 2022-03-22 08:45:39Z

0

I ran into a similar issue and boiled the code down to a minimal example:

WITH DATA
AS (
  SELECT 1 ISSUE_ID,
         DATE(NULL) DueDate,
         MAKE_DATE(2000,01,01) DDate
  UNION ALL
  SELECT 1 ISSUE_ID,
         MAKE_DATE(2000,01,01),
         MAKE_DATE(2000,01,02)
)
SELECT ISNOTNULL(lag(IT.DueDate, 1) OVER (PARTITION by IT.ISSUE_ID ORDER BY IT.DDate ))
       AND ISNULL(IT.DueDate)
FROM DATA IT


Algamest Over a year ago

Hi Jørgen! Welcome :) As I understand it, you've created a minimal working example (MWE), which is awesome! Since it isn't an answer to the original question, I would suggest making this a comment or, better yet, contributing it to the question itself by proposing an edit.
