
I am working on a system where users write DSLs that I load at runtime as instances of my own types, which are then applied on top of RDDs. The whole application runs via spark-submit, and I use a ScriptEngine to compile the DSLs, which are themselves written in Scala. All tests pass in SBT and IntelliJ, but when running under spark-submit, my own types from the fat jar are not available to import inside the script. I initialize the script engine as follows.

import javax.script.{ScriptEngine, ScriptEngineManager}
import scala.tools.nsc.Settings
import scala.tools.nsc.interpreter.IMain

// Get the Scala script engine and reach into its compiler settings
val engine: ScriptEngine = new ScriptEngineManager().getEngineByName("scala")
private val settings: Settings = engine.asInstanceOf[IMain].settings
settings.usejavacp.value = true

// Try to pick up the classpath from a class in my jar and from the context classloader
settings.embeddedDefaults[DummyClass]
private val loader: ClassLoader = Thread.currentThread().getContextClassLoader
settings.embeddedDefaults(loader)

This looks like a classloader problem under spark-submit, but I cannot figure out why types from my own jar, the same jar that contains the main program passed to spark-submit, are unavailable in a script created in the same JVM. The scala-compiler, scala-reflect and scala-library versions are all 2.11.8. Any help would be greatly appreciated.
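
To make the failure concrete, the evaluation looks roughly like the sketch below; com.example.MyDslRule is a stand-in for one of my own types in the fat jar, the real names are different:

// Hypothetical sketch of the failing evaluation under spark-submit.
// com.example.MyDslRule is a stand-in for a type bundled in the fat jar.
val script =
  """import com.example.MyDslRule
    |new MyDslRule()
  """.stripMargin
val rule = engine.eval(script)   // works in SBT/IntelliJ, fails to resolve the import under spark-submit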

  • I'm using the ScriptEngineManager for the same purpose as you. Basically, I want to interpret the DSL commands before initializing the SparkSession. After that, I get the classes that wrap the command functionality, and then I apply that to RDDs. My problem is that when I initialize the SparkSession I get this exception: java.lang.RuntimeException: class org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback not org.apache.hadoop.security.GroupMappingServiceProvider. Did you have the same exception? Commented Aug 8, 2018 at 9:31
  • Are you using spark-submit to launch your application? Commented Aug 29, 2018 at 4:46

2 Answers


I found a working solution. After reading the code and a lot of debugging, I discovered that the ScriptEngine builds its own classloader from the classpath string of the classloader that created it. Under spark-submit, Spark creates a special classloader that can read from both local and HDFS files, but the classpath string obtained from that classloader does not contain our application jar, which lives in HDFS.

Manually appending my application jar to the ScriptEngine classpath before initialising the engine solved the problem. For this to work I first had to download the application jar from HDFS to the local filesystem, and then append that local path.
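
Roughly, the fix looks like the sketch below. The paths are placeholders, and using Hadoop's copyToLocalFile plus a plain classpath string append is just one way to wire this up, not necessarily the exact code I used:

import java.io.File
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Copy the application jar from HDFS to the local filesystem (placeholder paths)
val hdfsJar  = new Path("hdfs:///apps/my-app-assembly.jar")
val localJar = new Path("/tmp/my-app-assembly.jar")
FileSystem.get(new Configuration()).copyToLocalFile(hdfsJar, localJar)

// Append the local copy to the interpreter classpath before the engine compiles anything
// (`settings` is the IMain settings obtained as in the question)
settings.classpath.value =
  settings.classpath.value + File.pathSeparator + localJar.toString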


1 Comment

Thanks! This helped, but it took me ages to figure out how to override the classpath. I've added my code in an answer.

If you instantiate the Scala interpreter directly instead of via ScriptEngineManager, you can pass in settings and override the classpath:

// Take the first classpath entry from the context classloader
// (under spark-submit this is typically the application fat jar)
val cl = java.lang.Thread.currentThread.getContextClassLoader
val jar = cl.asInstanceOf[java.net.URLClassLoader].getURLs.toList.head.toString
// Put that jar on the interpreter classpath and build the engine directly,
// bypassing ScriptEngineManager
val settings = new scala.tools.nsc.Settings()
settings.classpath.value = jar
val engine = scala.tools.nsc.interpreter.Scripted(settings = settings)
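
With the jar on the interpreter classpath, scripts can import types from it. For example (com.example.MyDslRule is a hypothetical class from the application jar):

// Hypothetical usage: the import now resolves because the jar is on the interpreter classpath
val rule = engine.eval(
  """import com.example.MyDslRule
    |new MyDslRule()
  """.stripMargin)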
