4

I was able to insert data into a Hive table from my spark code using HiveContext like below

   val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
   sqlContext.sql("CREATE TABLE IF NOT EXISTS e360_models.employee(id INT, name STRING, age INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'")

   sqlContext.sql("insert into table e360_models.employee select t.* from (select 1210, 'rahul', 55) t")
   sqlContext.sql("insert into table e360_models.employee select t.* from (select 1211, 'sriram pv', 35) t")
   sqlContext.sql("insert into table e360_models.employee select t.* from (select 1212, 'gowri', 59) t")

   val result = sqlContext.sql("FROM e360_models.employee SELECT id, name, age")
   result.show()

But, this approach is creating a separate file in the warehouse for every insertion like below

part-00000
part-00000_copy_1
part-00000_copy_2
part-00000_copy_3

Is there any way to avoid this and just append the new data to a single file or is there any other better way to insert data into hive from spark?

1
  • Did you manage to solve this problem? I did a little bit of research and no luck! Commented Apr 20, 2017 at 15:22

1 Answer 1

1

No, there is no way to do that. Each new insert will create a new file. It's not a Spark "issue", but a general behavior you can experience with Hive too. The only way is to perform a single insert with the UNION of all your data, but if you need to do multiple inserts, you'll have multiple files.

The only thing you can do is to enable file merging in hive (look at it here: Hive Create Multi small files for each insert in HDFS and https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties).

Sign up to request clarification or add additional context in comments.

2 Comments

Are these individual files going to occupy 64mb(default block size) * 3(replication factor) or just the actual size of the files * replication factor?
I think you can find an answer here: stackoverflow.com/questions/13012924/…

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.