I am running a HiveQL job on AWS EMR and receive the error below. The cluster has 39 m3.2xlarge nodes (8 vCPU, 30 GB memory, 2 x 80 GB SSD storage each), for a total of roughly 1.1 TB of memory.
The HiveQL script loads data from S3 and creates a smaller main data table in ORC format. Quite a few intermediate tables build successfully before the error. The statement that failed was a select count(distinct ...) from <main data table>.
Is there a way to clean/clear out memory before each new statement? Do I need to adjust the heap size? What else can I provide to give a better sense of the data and environment?
Error...
Diagnostic Messages for this Task:
Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#1
    at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:381)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:170)
Caused by: java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:56)
    at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:46)
    at org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput.<init>(InMemoryMapOutput.java:63)
    at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.unconditionalReserve(MergeManagerImpl.java:297)
    at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.reserve(MergeManagerImpl.java:287)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:411)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:341)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:165)
Have you tried adjusting mapreduce.map.memory.mb and mapreduce.map.java.opts, and likewise mapreduce.reduce.memory.mb and mapreduce.reduce.java.opts? You can adjust these from within your Hive script with "SET" commands, as long as you don't bump into YARN limits (e.g. yarn.scheduler.maximum-allocation-mb).

For reference, the current values on this cluster are:

Configuration Option                   Default Value
mapreduce.map.java.opts                -Xmx1152m
mapreduce.reduce.java.opts             -Xmx2304m
mapreduce.map.memory.mb                1440
mapreduce.reduce.memory.mb             2880
yarn.scheduler.minimum-allocation-mb   1440
yarn.scheduler.maximum-allocation-mb   23040
yarn.nodemanager.resource.memory-mb    23040

Parameter                              Value
YARN_RESOURCEMANAGER_HEAPSIZE          2703
YARN_PROXYSERVER_HEAPSIZE              2703
YARN_NODEMANAGER_HEAPSIZE              2048
HADOOP_JOB_HISTORYSERVER_HEAPSIZE      2703
HADOOP_NAMENODE_HEAPSIZE               3276
HADOOP_DATANODE_HEAPSIZE               1064
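A minimal sketch of what those SET commands could look like at the top of the Hive script. The specific numbers are illustrative assumptions, not tuned recommendations:

```sql
-- Illustrative values only; tune for your data volume. Container sizes must
-- stay at or below yarn.scheduler.maximum-allocation-mb (23040 MB on this
-- cluster), and -Xmx is conventionally set to about 80% of the container size.
SET mapreduce.map.memory.mb=4096;
SET mapreduce.map.java.opts=-Xmx3276m;
SET mapreduce.reduce.memory.mb=8192;
SET mapreduce.reduce.java.opts=-Xmx6553m;
```

A SET command applies to the statements that follow it in the same session, so you can also place a larger reducer allocation immediately before the failing count(distinct ...) statement rather than raising it for the whole script.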