I am getting the error below in Spark SQL (1.6) when creating a table with a column whose value defaults to NULL. Example: create table test as select column_a, NULL as column_b from test_temp;

The same statement works in Hive and creates the column with data type "void".

As a workaround, I am using an empty string instead of NULL to avoid the exception, and the new column then gets the string data type.

Is there a better way to insert null values into a Hive table using Spark SQL?

2017-12-26 07:27:59 ERROR StandardImsLogger$:177 - org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.UnsupportedOperationException: Unknown field type: void
    at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:789)
    at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:746)
    at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$createTable$1.apply$mcV$sp(ClientWrapper.scala:428)
    at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$createTable$1.apply(ClientWrapper.scala:426)
    at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$createTable$1.apply(ClientWrapper.scala:426)
    at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$withHiveState$1.apply(ClientWrapper.scala:293)
    at org.apache.spark.sql.hive.client.ClientWrapper.liftedTree1$1(ClientWrapper.scala:239)
    at org.apache.spark.sql.hive.client.ClientWrapper.retryLocked(ClientWrapper.scala:238)
    at org.apache.spark.sql.hive.client.ClientWrapper.withHiveState(ClientWrapper.scala:281)
    at org.apache.spark.sql.hive.client.ClientWrapper.createTable(ClientWrapper.scala:426)
    at org.apache.spark.sql.hive.execution.CreateTableAsSelect.metastoreRelation$lzycompute$1(CreateTableAsSelect.scala:72)
    at org.apache.spark.sql.hive.execution.CreateTableAsSelect.metastoreRelation$1(CreateTableAsSelect.scala:47)
    at org.apache.spark.sql.hive.execution.CreateTableAsSelect.run(CreateTableAsSelect.scala:89)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
    at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:56)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:56)
    at org.apache.spark.sql.DataFrame.withCallback(DataFrame.scala:153)
    at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145)
    at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
    at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:52)
    at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:829)
  • select cast(NULL as string)? Commented Dec 27, 2017 at 6:46
  • Thanks @philantrovert. It worked. Commented Dec 27, 2017 at 7:34

2 Answers

I couldn't find much information about the void data type, but it looks like it is somewhat equivalent to the Any type we have in Scala.

The table at the end of this page explains that a void can be cast to any other data type.

Here are some JIRA issues that are kinda similar to the problem you are facing

So, as mentioned in the comment, instead of a bare NULL you can explicitly cast it to one of the data types that void can be implicitly converted to.

select cast(NULL as string) as column_b
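
Applied to the statement from the question, this would look roughly like the sketch below. It reuses the table and column names from the question (test, test_temp, column_a, column_b); string is only one possible choice, so substitute whichever type the column should eventually hold.

```sql
-- Give the NULL literal a concrete type so the metastore never sees "void".
-- Table and column names are taken from the question; STRING is an assumed target type.
CREATE TABLE test AS
SELECT column_a,
       CAST(NULL AS STRING) AS column_b
FROM test_temp;
```

The new column is then created as string and holds real NULL values rather than empty strings.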

I started getting a similar issue. I boiled the code down to this example:

WITH DATA
AS (
  SELECT 1 ISSUE_ID,
         DATE(NULL) DueDate,
         MAKE_DATE(2000,01,01) DDate
  UNION ALL
  SELECT 1 ISSUE_ID,
         MAKE_DATE(2000,01,01),
         MAKE_DATE(2000,01,02)
)
SELECT ISNOTNULL(lag(IT.DueDate, 1) OVER (PARTITION by IT.ISSUE_ID ORDER BY IT.DDate ))
       AND ISNULL(IT.DueDate)
FROM DATA IT

1 Comment

Hi Jørgen! Welcome :) As I understand it, you've created a minimal working example (MWE), which is awesome! Since it isn't an answer to the original question, I would suggest making this a comment or, better yet, contributing it to the question itself by proposing an edit.
