I have two hive select statements:

select * from ode limit 5;

This successfully returns 5 records from the table 'ode', with all columns included in the result. However, the following query causes an error:

select content from ode limit 5;

Here 'content' is one column in the table. The error is:

hive> select content from ode  limit 5;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOfRange(Arrays.java:3664)
    at java.lang.String.<init>(String.java:207)

The second query should be a lot cheaper, so why does it cause a memory issue? How can I fix this?

1 Answer

When you select the whole table, Hive runs a fetch task instead of a MapReduce (MR) job. A fetch task involves no parsing (it is like calling hdfs dfs -cat ... | head -5).
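
One way to check which path each query takes is to look at its plan with EXPLAIN. This is only a sketch: the exact plan output depends on your Hive version and configuration, so no expected output is shown here. A plan that consists solely of a fetch stage means no job is launched, while a plan with a map-reduce stage means one is.

```sql
-- Plan for the query that works: look for a fetch-only plan.
explain select * from ode limit 5;

-- Plan for the query that fails: look for a map-reduce stage.
explain select content from ode limit 5;
```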

As far as I can see, in your case the Hive client tries to run the map task locally.
You can choose one of two ways:

  1. Force remote execution with the hive.fetch.task.conversion setting
  2. Increase the Hive client heap size using the HADOOP_CLIENT_OPTS environment variable.
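
As a rough sketch of the first option, the setting can be inspected and changed from the Hive prompt. The answer does not say which value to use; the documented values are none, minimal, and more. The choice of more below is an assumption: it allows a single-column SELECT with LIMIT to be served by a fetch task, whereas none disables fetch conversion entirely. Which one helps depends on your setup.

```sql
-- Print the current value of the setting.
set hive.fetch.task.conversion;

-- Assumption: 'more' is chosen for illustration, not prescribed by the answer.
set hive.fetch.task.conversion=more;

-- Re-run the failing query under the new setting.
select content from ode limit 5;
```

For the second option, the variable has to be exported in the shell before the Hive client is started, for example `export HADOOP_CLIENT_OPTS="-Xmx2g"`. The 2g value is illustrative only; pick a size that fits the machine running the client.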

You can find more details regarding fetch tasks here.
