
I have a Hive table with the following structure:

CREATE TABLE gcganamrswp_work.historical_trend_result(
  column_name string,
  metric_name string,
  current_percentage string,
  lower_threshold double,
  upper_threshold double,
  calc_status string,
  final_status string,
  support_override string,
  dataset_name string,
  insert_timestamp string,
  appid string,
  currentdate string,
  indicator map<string,string>)
PARTITIONED BY (
  appname string,
  year_month int)
STORED AS PARQUET
TBLPROPERTIES ("parquet.compression"="SNAPPY");

I have a Spark DataFrame with this schema:

root
 |-- metric_name: string (nullable = true)
 |-- column_name: string (nullable = true)
 |-- Lower_Threshold: double (nullable = true)
 |-- Upper_Threshold: double (nullable = true)
 |-- Current_Percentage: double (nullable = true)
 |-- Calc_Status: string (nullable = false)
 |-- Final_Status: string (nullable = false)
 |-- support_override: string (nullable = false)
 |-- Dataset_Name: string (nullable = false)
 |-- insert_timestamp: string (nullable = false)
 |-- appId: string (nullable = false)
 |-- currentDate: string (nullable = false)
 |-- indicator: map (nullable = false)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = false)
 |-- appname: string (nullable = false)
 |-- year_month: string (nullable = false)

When I try to insert into the Hive table using the code below, it fails:

    spark.conf.set("hive.exec.dynamic.partition", "true")
    spark.conf.set("hive.exec.dynamic.partition.mode", "nonstrict")
    data_df.repartition(1)
      .write.mode("append")
      .format("hive")
      .insertInto(Outputhive_table)

Spark version: 2.4.0

Error:

ERROR Hive:1987 - Exception when loading partition with parameters partPath=hdfs://gcgprod/data/work/hive/historical_trend_result/.hive-staging_hive_2021-09-01_04-34-04_254_8783620706620422928-1/-ext-10000/_temporary/0, table=historical_trend_result, partSpec={appname=, year_month=}, replace=false, listBucketingEnabled=false, isAcid=false, hasFollowingStatsTask=false
org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Partition spec is incorrect. {appname=, year_month=})
    at org.apache.hadoop.hive.ql.metadata.Hive.loadPartitionInternal(Hive.java:1662)
    at org.apache.hadoop.hive.ql.metadata.Hive.lambda$loadDynamicPartitions$4(Hive.java:1970)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: MetaException(message:Partition spec is incorrect. {appname=, year_month=})
    at org.apache.hadoop.hive.metastore.Warehouse.makePartName(Warehouse.java:329)
    at org.apache.hadoop.hive.metastore.Warehouse.makePartPath(Warehouse.java:312)
    at org.apache.hadoop.hive.ql.metadata.Hive.genPartPathFromTable(Hive.java:1751)
    at org.apache.hadoop.hive.ql.metadata.Hive.loadPartitionInternal(Hive.java:1607)

I have specified the partition columns as the last columns of the DataFrame, so I expect it to treat the last two columns as partition columns. I want to use the same routine for inserting into different tables, so I don't want to specify the partition columns explicitly.
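One way to keep the routine table-agnostic is to read the destination table's column order from the catalog and select the DataFrame's columns in that order before calling `insertInto`, since `insertInto` matches columns by position rather than by name. A minimal sketch, assuming the table already exists, dynamic partitioning is enabled, and `insertAligned` is a hypothetical helper name:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Sketch: insertInto matches columns by POSITION, not by name, so align
// the DataFrame's columns to the destination table's column order first.
// In a Hive table's catalog schema, partition columns come last.
def insertAligned(spark: SparkSession, df: DataFrame, table: String): Unit = {
  val targetCols = spark.table(table).columns         // includes partition columns, last
  val aligned    = df.select(targetCols.map(col): _*) // reorder to match the table
  aligned.write.mode("append").insertInto(table)
}
```

Because the column order is looked up at runtime, the same helper works for any table whose columns (by name) exist in the DataFrame.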

  • Maybe it's because your schema says year_month int, but the DataFrame says year_month string? Commented Sep 1, 2021 at 19:38

2 Answers


Just to recap: you are using Spark to write data to a Hive table with dynamic partitions. My answer below is based on that understanding; if it's incorrect, please feel free to correct me in a comment.

Although the table is dynamically partitioned (by appname and year_month), the Spark job doesn't know the partitioning fields of the destination, so you still have to tell it about the destination table's partition columns.

Something like this should work:

import org.apache.spark.sql.SaveMode

data_df.repartition(1)
      .write
      .partitionBy("appname", "year_month")
      .mode(SaveMode.Append)
      .saveAsTable(Outputhive_table)

Make sure that you enable support for dynamic partitions by executing something like:

hiveContext.setConf("hive.exec.dynamic.partition", "true")
hiveContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")

Check out this post by Itai Yaffe, this may be handy https://medium.com/nmc-techblog/spark-dynamic-partition-inserts-part-1-5b66a145974f


9 Comments

I tried this code. I am getting the error: partitionBy() can't be used together with DataFrameWriter.insertInto()
What version of Spark are you running?
@Arvinth, I have modified my answer for older versions of Spark; this should work.
I am getting an error after trying the updated code: org.apache.spark.sql.AnalysisException: partition column appname,year_month is not defined in table gcganamrswp_work.historical_trend_result, defined table columns are: column_name, metric_name, current_percentage, lower_threshold, upper_threshold, calc_status, final_status, support_override, dataset_name, insert_timestamp, appid, currentdate, indicator, appname, year_month;
@arvinth you are writing to the wrong table. From your question above, the variable Outputhive_table should be set to gcgcrsnapsd_equifax_t_db.historical_trend_result, while from the error message it seems to be set to gcganamrswp_work.historical_trend_result.

I think the problem is that some records have empty strings for appname and year_month. At least this is suggested by:

Partition spec is incorrect. {appname=, year_month=}

Make sure the partition columns are never empty or null! Also note that the type of year_month is not consistent between the DataFrame and your table schema (string vs. int).
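As a quick check, you could cast year_month to match the table's DDL and fail fast if any partition value is null or empty before writing. A sketch, assuming your DataFrame is the `data_df` from the question:

```scala
import org.apache.spark.sql.functions.{col, trim}

// Cast year_month to int to match the table definition (year_month int).
val fixed = data_df.withColumn("year_month", col("year_month").cast("int"))

// Count rows whose partition values are null or empty; these would produce
// an invalid partition spec like {appname=, year_month=}.
val badRows = fixed.filter(
  col("appname").isNull || trim(col("appname")) === "" || col("year_month").isNull
).count()

require(badRows == 0, s"$badRows rows have null/empty partition values")
```

Note that casting a non-numeric string to int yields null in Spark, so the check above also catches year_month values that failed the cast.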

1 Comment

I checked: the partition columns are not empty or null. Also, I changed the datatype of year_month to int, but I am still getting the same error. Please help.
