HiveQL query performance optimization

Question

As the number of JOINs in a Hive query increases, the query runs in multiple stages and takes a long time to execute. How can I improve the query performance? Are there any parameters I should set?

www · Accepted Answer · 2013-04-08 14:55:30Z

4

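First of all, the largest table should be placed last in the join order, because Hive buffers the rows of the earlier tables and streams the last one through the reducers:

SELECT small.*, large.* FROM small
JOIN large ON small.joinkey=large.joinkey;

Alternatively, you can use a hint to tell the optimizer which table is the biggest, regardless of its position in the query: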

SELECT /*+ STREAMTABLE(large) */ small.*, large.* FROM large
JOIN small ON small.joinkey=large.joinkey;
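Since the question is about queries with many joins, here is a sketch of the same hint on a three-table join. The tables a, b and c and their columns are hypothetical. As far as I know, Hive can run such a query as a single map/reduce job when every join clause uses the same column of b, so joining on a common key where possible may also reduce the number of stages.

```sql
-- Hypothetical tables a, b, c; both joins use b.key1,
-- so Hive can evaluate them in one map/reduce job.
-- STREAMTABLE(a) asks Hive to stream a and buffer b and c.
SELECT /*+ STREAMTABLE(a) */ a.val, b.val, c.val
FROM a
JOIN b ON (a.key = b.key1)
JOIN c ON (c.key = b.key1);
```

If the second join used a different column of b (for example a hypothetical b.key2), Hive would need a separate job for it.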

Second, small tables can be cached in memory and joined on the map side (a map-side join), which avoids the reduce phase for that join:

set hive.auto.convert.join = true;
SELECT a.*, b.* FROM a
JOIN b ON a.joinkey=b.joinkey;

The maximum size (in bytes) of a table that can be converted to a map join is set by:

set hive.mapjoin.smalltable.filesize = 1000000;
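Putting the two settings together, a session might look like the sketch below. The tables a and b follow the example above. The threshold of 25000000 bytes is only an illustrative value (to my knowledge it matches Hive's default), so tune it to the memory available to your map tasks. EXPLAIN prints the plan without running the query, which lets you check whether the join was actually converted.

```sql
-- Let Hive convert a common join into a map-side join automatically.
set hive.auto.convert.join = true;
-- Tables smaller than this many bytes are candidates for the in-memory side.
set hive.mapjoin.smalltable.filesize = 25000000;

-- Inspect the plan before running the query.
EXPLAIN
SELECT a.*, b.* FROM a
JOIN b ON a.joinkey = b.joinkey;
```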

I hope it helps a bit. GL!

user2637464 · Accepted Answer · 2013-08-08 10:33:59Z

0

In addition to the above, when the query's SELECT and WHERE clauses do not reference any columns from the right-hand table, it is a good idea to use a LEFT SEMI JOIN.

Semi-joins are more efficient than the more general inner join for the following reason. For a given record in the left-hand table, Hive can stop looking for matching records in the right-hand table as soon as any match is found. At that point, the selected columns from the left-hand table record can be projected.
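As a rough illustration, the two queries below show the inner-join form and its semi-join rewrite. The tables orders and customers and their columns are hypothetical names chosen for this sketch.

```sql
-- Inner join: a left-hand row is emitted once per matching right-hand row.
SELECT o.order_id, o.amount
FROM orders o
JOIN customers c ON (o.customer_id = c.customer_id);

-- Semi-join: a left-hand row is emitted at most once,
-- as soon as one matching right-hand row is found.
SELECT o.order_id, o.amount
FROM orders o
LEFT SEMI JOIN customers c ON (o.customer_id = c.customer_id);
```

Note that the two forms return the same rows only if customer_id is unique in customers. With a LEFT SEMI JOIN, the right-hand table may be referenced only in the ON clause, not in SELECT or WHERE.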

zx485 · Accepted Answer · 2017-09-14 12:01:23Z

0

set hive.exec.parallel = true;

This is a general setting that lets independent stages of a query run in parallel. Beyond that, which set commands give a considerable improvement depends on your cluster configuration.
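A minimal sketch of how this might be used follows. The tables a and b and the column joinkey are borrowed from the first answer purely for illustration, and the thread count of 8 is an example value rather than a recommendation. The two aggregations do not depend on each other, so with parallel execution enabled Hive may run their jobs concurrently, provided the cluster has spare capacity.

```sql
-- Allow independent stages of one query to run at the same time.
set hive.exec.parallel = true;
-- Upper bound on the number of jobs executed in parallel.
set hive.exec.parallel.thread.number = 8;

-- Each GROUP BY branch is an independent stage.
SELECT t.joinkey, t.cnt FROM (
  SELECT joinkey, count(*) AS cnt FROM a GROUP BY joinkey
  UNION ALL
  SELECT joinkey, count(*) AS cnt FROM b GROUP BY joinkey
) t;
```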

edited Sep 14, 2017 at 12:01

zx485

answered Sep 14, 2017 at 11:43

selvasundarraj
