
I am trying to insert data into a Hive table like this:

val partfile = sc.textFile("partfile")
val partdata = partfile.map(p => p.split(","))
val partSchema = StructType(Array(
  StructField("id", IntegerType, true),
  StructField("name", StringType, true),
  StructField("salary", IntegerType, true),
  StructField("dept", StringType, true),
  StructField("location", StringType, true)))
val partRDD = partdata.map(p => Row(p(0).toInt,p(1),p(2).toInt,p(3),p(4)))
val partDF = sqlContext.createDataFrame(partRDD, partSchema)

Packages I imported:

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructType,StructField,StringType,IntegerType}
import org.apache.spark.sql.types._

This is how I tried to insert the dataframe into Hive partition:

partDF.write.mode(SaveMode.Append).partitionBy("location").insertInto("parttab")

I'm getting the error below even though the Hive table exists:

org.apache.spark.sql.AnalysisException: Table not found: parttab;

Could anyone tell me what mistake I am making here and how I can correct it?


1 Answer


To write data to the Hive warehouse, you need to initialize a HiveContext instance.

Once you do that, it will pick up the configuration from hive-site.xml (on the classpath) and connect to the underlying Hive warehouse.

HiveContext is an extension of SQLContext that adds support for connecting to Hive.

To do so, try this:

import org.apache.spark.sql.hive.HiveContext

val hc = new HiveContext(sc)

Then perform your append query on this instance.

partDF.registerTempTable("temp")

hc.sql(".... <normal sql query to pick data from table `temp`; and insert in to Hive table > ....")

Please make sure that the table parttab is under the default database.

If the table is under another database, the table name should be specified as: <db-name>.parttab

If you need to save the DataFrame directly into Hive, use this:

df.saveAsTable("<db-name>.parttab")
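Note that saveAsTable creates a new managed table by default and fails if the table already exists. If parttab already exists and you want to append to it, a minimal sketch using the DataFrameWriter API (Spark 1.4+) would be:

import org.apache.spark.sql.SaveMode

// Append rows to the existing table instead of creating a new one.
df.write.mode(SaveMode.Append).saveAsTable("<db-name>.parttab")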

3 Comments

Could you tell me where you specify the dataframe here?
I tried it like this: scala> hc.sql("insert into parttab partition(location = 'India') select id,name,salary,dept,location from ptab"), and I'm getting this error: Caused by: ERROR XJ040: Failed to start database 'metastore_db' with class loader org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1@25ac587b, org.apache.derby.iapi.error.StandardException.newException(Unknown Source) at org.apache.derby.impl.jdbc.SQLExceptionFactory.wrapArgsForTransportAcrossDRDA(Unknown Source) Caused by: ERROR XSDB6: Another instance of Derby may have already booted the database /home/cloudera/metastore_db.
As you're running in the spark shell, you shouldn't instantiate a HiveContext with instance name hc; there is one created automatically called sqlContext (the name is misleading: if you compiled Spark with Hive, it is actually a HiveContext). See the similar discussion here: https://issues.apache.org/jira/browse/SPARK-9776.
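If that's the case, a minimal sketch of reusing the built-in context in spark-shell (assuming a Hive-enabled Spark 1.x build) would be:

// On a Hive-enabled build this evaluates to true, so sqlContext can be used directly.
sqlContext.isInstanceOf[org.apache.spark.sql.hive.HiveContext]
sqlContext.sql("SHOW TABLES").show() // parttab should be listed here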
