
I'm importing a SQL database into a Hive database on a Hive client node (using the Hortonworks Data Platform) with the bash command:

$ hive -f tables.sql

I get the error :

log4j:WARN No such property [maxFileSize] in org.apache.log4j.DailyRollingFileAppender.

Logging initialized using configuration in file:/etc/hive/2.6.1.0-129/0/hive-log4j.properties
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:3332)
    at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448)
    at java.lang.StringBuilder.append(StringBuilder.java:136)
    at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:409)
    at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:429)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:718)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:685)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:148)

I tried increasing HADOOP_HEAPSIZE from 1 GB to 4 GB, but I still get the error. Any ideas?
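For reference, this is roughly what I changed before re-running (if I understand the launcher scripts correctly, HADOOP_HEAPSIZE is read in MB; the `-Xmx4g` variant is an alternative way to set the client JVM heap):

```shell
# Heap size for the client-side JVM started by the hive/hadoop scripts, in MB
export HADOOP_HEAPSIZE=4096
# Alternatively, pass the JVM option to the client process explicitly
export HADOOP_CLIENT_OPTS="-Xmx4g"
# then re-run: hive -f tables.sql
```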


1 Answer


The OutOfMemoryError comes from the Hive codebase, in CliDriver#processReader(BufferedReader):

public int processReader(BufferedReader r) throws IOException {
  String line;
  StringBuilder qsb = new StringBuilder();

  while ((line = r.readLine()) != null) {
    // Skipping through comments
    if (! line.startsWith("--")) {
      qsb.append(line + "\n");
    }
  }

  return (processLine(qsb.toString()));
}

It appends every line read from the file into a single StringBuilder and only executes the result at the end, so the entire script has to fit in the heap at once. This means the input file you specified is very large. Is it possible to split it into multiple smaller files and execute each one separately, so that the memory footprint stays bounded?

You mentioned this is an import of a SQL database. Apache Sqoop might be a better fit for that use case.
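A hypothetical Sqoop invocation for that use case; the host, database, table, and user names below are placeholders you would replace with your own:

```shell
# Import a table straight from the source RDBMS into Hive.
# --hive-import creates and loads the Hive table directly, so the
# intermediate .sql dump (and `hive -f`) is no longer needed.
sqoop import \
  --connect jdbc:mysql://dbhost/mydb \
  --username myuser -P \
  --table mytable \
  --hive-import \
  --hive-table mydb.mytable \
  --num-mappers 4
```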


2 Comments

I found this little repo for splitting a .sql dump into one .sql file per table: github.com/kedarvj/mysqldumpsplitter. I'm trying it now.
After some research, Apache Sqoop seems better suited for the job than my Hive import command.
