
I'm importing a SQL database into a Hive database on a Hive client node (using the Hortonworks Data Platform) with the bash command:

$ hive -f tables.sql

I get the error :

log4j:WARN No such property [maxFileSize] in org.apache.log4j.DailyRollingFileAppender.

Logging initialized using configuration in file:/etc/hive/2.6.1.0-129/0/hive-log4j.properties
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:3332)
    at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448)
    at java.lang.StringBuilder.append(StringBuilder.java:136)
    at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:409)
    at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:429)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:718)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:685)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:148)

I tried increasing HADOOP_HEAPSIZE from 1 GB to 4 GB, but I still get the error. Any ideas?
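For reference, this is roughly what I changed before re-running (if I understand the launcher scripts correctly, HADOOP_HEAPSIZE is read in MB; the `-Xmx4g` variant is an alternative way to set the client JVM heap):

```shell
# Heap size for the client-side JVM started by the hive/hadoop scripts, in MB
export HADOOP_HEAPSIZE=4096
# Alternatively, pass the JVM option to the client process explicitly
export HADOOP_CLIENT_OPTS="-Xmx4g"
# then re-run: hive -f tables.sql
```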


1 Answer


The OutOfMemoryError comes from the Hive codebase, in CliDriver#processReader(BufferedReader):

public int processReader(BufferedReader r) throws IOException {
  String line;
  StringBuilder qsb = new StringBuilder();

  while ((line = r.readLine()) != null) {
    // Skipping through comments
    if (! line.startsWith("--")) {
      qsb.append(line + "\n");
    }
  }

  return (processLine(qsb.toString()));
}

It appends every line read from the file into a single StringBuilder and only executes the result at the end, so the entire script has to fit in the heap at once. This means the input file you specified is very large. Is it possible to split it into multiple smaller files and execute each one separately, so that the memory footprint stays bounded?

You mentioned this is an import of a SQL database. Apache Sqoop might be a better fit for that use case.
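A hypothetical Sqoop invocation for that use case; the host, database, table, and user names below are placeholders you would replace with your own:

```shell
# Import a table straight from the source RDBMS into Hive.
# --hive-import creates and loads the Hive table directly, so the
# intermediate .sql dump (and `hive -f`) is no longer needed.
sqoop import \
  --connect jdbc:mysql://dbhost/mydb \
  --username myuser -P \
  --table mytable \
  --hive-import \
  --hive-table mydb.mytable \
  --num-mappers 4
```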


2 Comments

I found this little repo for splitting a .sql dump into one .sql file per table: github.com/kedarvj/mysqldumpsplitter. I'm trying it now.
After some research, Apache Sqoop seems better suited for the job than my Hive import command.
