
I'm parsing JSON with Spark SQL and it works really well: it infers the schema and I can run queries against it.

Now I need to flatten the JSON, and I've read in the forums that the best way is to explode it with Hive (LATERAL VIEW), so I'm trying to do the same. But I can't even create the context... Spark gives me an error and I can't figure out how to fix it.

As I said, at this point I'm only trying to create the context:

import org.apache.spark.SparkContext
import org.apache.spark.sql.hive.HiveContext

println("Create Spark Context:")
val sc = new SparkContext("local", "Simple", "$SPARK_HOME")
println("Create Hive context:")
val hiveContext = new HiveContext(sc)

And it gives me this error:

Create Spark Context:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/12/26 15:13:44 INFO Remoting: Starting remoting
15/12/26 15:13:45 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://[email protected]:40624]

Create Hive context:
15/12/26 15:13:50 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
15/12/26 15:13:50 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
15/12/26 15:13:56 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
15/12/26 15:13:56 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
15/12/26 15:13:58 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
15/12/26 15:13:58 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
15/12/26 15:13:59 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
15/12/26 15:14:01 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
15/12/26 15:14:01 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
Exception in thread "main" java.lang.reflect.InvocationTargetException
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
  at org.apache.spark.sql.hive.client.IsolatedClientLoader.liftedTree1$1(IsolatedClientLoader.scala:183)
  at org.apache.spark.sql.hive.client.IsolatedClientLoader.<init>(IsolatedClientLoader.scala:179)
  at org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:226)
  at org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:185)
  at org.apache.spark.sql.hive.HiveContext.setConf(HiveContext.scala:392)
  at org.apache.spark.sql.hive.HiveContext.defaultOverrides(HiveContext.scala:174)
  at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:177)
  at pebd.emb.Bicing$.main(Bicing.scala:73)
  at pebd.emb.Bicing.main(Bicing.scala)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:601)
  at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
Caused by: java.lang.OutOfMemoryError: PermGen space

Process finished with exit code 1

I know it's a very simple question, but I don't really know the cause of this error. Thanks in advance, everyone.

1 Answer

Here's the relevant part of the exception:

Caused by: java.lang.OutOfMemoryError: PermGen space

You need to increase the amount of PermGen memory that you give to the JVM. By default (SPARK-1879), Spark's own launch scripts increase this to 128 MB, so I think you'll have to do something similar in your IntelliJ run configuration. Try adding -XX:MaxPermSize=128m to the "VM options" list.
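As a concrete illustration (the exact values are just examples, not a prescription), the "VM options" field of an IntelliJ run configuration, or the driver options of a spark-submit launch, might look like this:

```
# IntelliJ: Run → Edit Configurations → VM options
-XX:MaxPermSize=128m

# Equivalent when launching with spark-submit (Spark 1.x / Java 7)
spark-submit --driver-java-options "-XX:MaxPermSize=128m" ...
```

If 128m is not enough, raising it (e.g. to 256m or 512m) is a reasonable next step; note that on Java 8 the PermGen space no longer exists, so the flag is ignored there.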


4 Comments

Thanks Josh, I was looking in that direction and I found that post link. But although I'm doing exactly that (with 512M, even with 1024M), I get the same error. I've never had problems with SQLContext, but this is my first time with HiveContext...
You should also consider switching to Java 8, where the permanent generation space is removed (see stackoverflow.com/questions/18339707/…).
I'm now trying with Java 8 and it seems to work. Thanks thoredge and @Josh for your help!!
Thanks, Java 7 and a 512m PermGen size worked in my case.
