Why does a Java OutOfMemoryError occur when selecting fewer columns in a Hive query?

Question

I have two Hive select statements:

select * from ode limit 5;

This successfully returns 5 records from the table 'ode', with all columns included in the result. However, the following query causes an error:

select content from ode limit 5;

Here 'content' is one column in the table. The error is:

hive> select content from ode  limit 5;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOfRange(Arrays.java:3664)
    at java.lang.String.<init>(String.java:207)

The second query should be a lot cheaper, so why does it cause a memory issue? How can I fix this?

GoodDok · Accepted Answer · 2020-07-10 11:02:16Z

When you select the whole table, Hive runs a fetch task instead of a MapReduce job. A fetch task involves no parsing (it is like calling hdfs dfs -cat ... | head -5).
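
Whether a query is converted to a fetch task is controlled by the hive.fetch.task.conversion property, which accepts none, minimal, or more. The sketch below assumes a Hive CLI session and the table 'ode' from the question. Whether more is already the default depends on the Hive version and site configuration, so treat this as something to try rather than a guaranteed fix.

```sql
-- Print the current value (none, minimal, or more).
set hive.fetch.task.conversion;

-- Assumption: the session is allowed to override this property.
-- With "more", simple SELECT / FILTER / LIMIT queries, not only SELECT *,
-- can be served by a fetch task instead of a MapReduce job.
set hive.fetch.task.conversion=more;

select content from ode limit 5;
```

If the value was minimal beforehand, that would be consistent with select * using a fetch task while the single-column query launched a job.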

As far as I can see, in your case the Hive client tries to run the map task locally.
You can choose one of the two ways:

You can find more details regarding fetch tasks here.
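
If the map task really is running inside the client JVM, a couple of standard properties are worth inspecting. This is a sketch of general Hive and Hadoop settings, not a reconstruction of the options listed in the answer, and it assumes a YARN cluster is available to the session.

```sql
-- If this prints true, Hive may choose local mode for small inputs on its own.
set hive.exec.mode.local.auto;

-- If this prints local, MapReduce jobs run in-process on the client machine.
set mapreduce.framework.name;

-- Assumption: a YARN cluster is configured and reachable from this client.
set hive.exec.mode.local.auto=false;
set mapreduce.framework.name=yarn;

select content from ode limit 5;
```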
