
I am running a HiveQL job on AWS EMR and receive the error below (in the code block). The cluster has 39 m3.2xlarge nodes (8 vCPU, 30 GB memory, 2 × 80 GB SSD each), for roughly 1.1 TB of memory in total.

The HiveQL file loads data from S3, creating a smaller main data table in ORC format. Quite a few intermediate tables are built successfully before the error. The statement that failed was a select count(distinct ...) from <main data table>.

Is there a way to clean/clear out memory before each new statement? Do I need to adjust the size of the heap? What else can I provide to help give a better sense of the data and environment?

Error...

    Diagnostic Messages for this Task:
    Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#1
    at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:381)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:170)
Caused by: java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:56)
    at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:46)
    at org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput.<init>(InMemoryMapOutput.java:63)
    at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.unconditionalReserve(MergeManagerImpl.java:297)
    at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.reserve(MergeManagerImpl.java:287)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:411)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:341)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:165) 
  • Can you paste the entire log of your hive statement? Commented Dec 8, 2015 at 3:44
  • What are your defaults for mapreduce.map.memory.mb and mapreduce.map.java.opts, also mapreduce.reduce.memory.mb and mapreduce.reduce.java.opts? You can adjust these from within your Hive script with "SET" commands, as long as you don't bump into YARN limits (e.g. yarn.scheduler.maximum-allocation-mb) Commented Dec 8, 2015 at 13:02
  • @SamsonScharfrichter This is what I found on the AWS EMR Documentation page: mapreduce.map.java.opts = -Xmx1152m, mapreduce.reduce.java.opts = -Xmx2304m, mapreduce.map.memory.mb = 1440, mapreduce.reduce.memory.mb = 2880, yarn.scheduler.minimum-allocation-mb = 1440, yarn.scheduler.maximum-allocation-mb = 23040, yarn.nodemanager.resource.memory-mb = 23040. Commented Dec 8, 2015 at 15:06
  • @DurgaViswanathGadiraju The log file is 745 lines. Is there a point above the code I already pasted that would be more useful than the whole output? Commented Dec 8, 2015 at 15:21
  • @SamsonScharfrichter Here is additional data I found for the specific node we are using: YARN_RESOURCEMANAGER_HEAPSIZE = 2703, YARN_PROXYSERVER_HEAPSIZE = 2703, YARN_NODEMANAGER_HEAPSIZE = 2048, HADOOP_JOB_HISTORYSERVER_HEAPSIZE = 2703, HADOOP_NAMENODE_HEAPSIZE = 3276, HADOOP_DATANODE_HEAPSIZE = 1064. Commented Dec 8, 2015 at 15:29
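Given the defaults quoted in the comments above, a hedged sketch of how these could be raised from within the Hive script itself (the values here are illustrative only, and container sizes must stay at or below yarn.scheduler.maximum-allocation-mb, which is 23040 on this cluster):

    -- Illustrative values, not a tuned recommendation.
    -- Keep each *.memory.mb <= yarn.scheduler.maximum-allocation-mb (23040 here),
    -- and each -Xmx at roughly 80% of its container size.
    SET mapreduce.map.memory.mb=2880;
    SET mapreduce.map.java.opts=-Xmx2304m;
    SET mapreduce.reduce.memory.mb=5760;
    SET mapreduce.reduce.java.opts=-Xmx4608m;

These SET commands only affect the current Hive session, so they can be placed at the top of the script before the failing statement.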

1 Answer


The temporary fix has been to increase the reducer memory allocations:

    SET mapreduce.reduce.memory.mb=6000;
    SET mapreduce.reduce.java.opts=-Xmx5000m;


1 Comment

There's nothing wrong with needing to increase memory. Evidently the job at this point is processing a set of intermediate map outputs that require more memory than the previous config allowed for a reducer.
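For context, the reduce-side shuffle fetches map outputs into an in-memory buffer sized as a fraction of the reducer heap (mapreduce.reduce.shuffle.input.buffer.percent, which defaults to 0.70 in Hadoop 2). A back-of-envelope sketch of why the larger heap helps (approximate arithmetic only, not exact JVM accounting):

```python
def shuffle_buffer_mb(heap_xmx_mb, input_buffer_percent=0.70):
    """Approximate in-memory shuffle buffer available to one reducer:
    a fixed fraction of the JVM heap (-Xmx)."""
    return heap_xmx_mb * input_buffer_percent

# Old EMR default reducer heap (-Xmx2304m) vs. the increased one (-Xmx5000m):
old_buffer = shuffle_buffer_mb(2304)  # ~1613 MB before the OOM
new_buffer = shuffle_buffer_mb(5000)  # ~3500 MB with the larger heap
```

If raising the heap is not an option, lowering input_buffer_percent forces more map outputs to merge on disk instead of in memory, trading speed for stability.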
