
I am working on a system where users write DSLs that I load at runtime as instances of my own types, which are then applied on top of RDDs. The whole application runs via spark-submit, and I use a ScriptEngine to compile the DSLs, which are themselves written in Scala. All tests pass in SBT and IntelliJ, but when running under spark-submit, my own types from the fat jar are not available to import inside the script. I initialize the script engine as follows.

import javax.script.{ScriptEngine, ScriptEngineManager}
import scala.tools.nsc.Settings
import scala.tools.nsc.interpreter.IMain

// Get the Scala script engine and reach into its compiler settings
val engine: ScriptEngine = new ScriptEngineManager().getEngineByName("scala")
private val settings: Settings = engine.asInstanceOf[IMain].settings
settings.usejavacp.value = true

// Try to pick up the classpath from a class in my jar and from the context classloader
settings.embeddedDefaults[DummyClass]
private val loader: ClassLoader = Thread.currentThread().getContextClassLoader
settings.embeddedDefaults(loader)

This looks like a classloader problem under spark-submit, but I cannot figure out why types from my own jar, the same jar that contains the main program passed to spark-submit, are unavailable in a script created in the same JVM. The scala-compiler, scala-reflect and scala-library versions are all 2.11.8. Any help would be greatly appreciated.
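
To make the failure concrete, the evaluation looks roughly like the sketch below; com.example.MyDslRule is a stand-in for one of my own types in the fat jar, the real names are different:

// Hypothetical sketch of the failing evaluation under spark-submit.
// com.example.MyDslRule is a stand-in for a type bundled in the fat jar.
val script =
  """import com.example.MyDslRule
    |new MyDslRule()
  """.stripMargin
val rule = engine.eval(script)   // works in SBT/IntelliJ, fails to resolve the import under spark-submit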

  • I'm using the ScriptEngineManager for the same purpose as you. Basically, I want to interpret the DSL commands before initializing the SparkSession. After that, I get the classes that wrap the command functionality, and then I apply that to RDDs. My problem is that when I initialize the SparkSession I get this exception: java.lang.RuntimeException: class org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback not org.apache.hadoop.security.GroupMappingServiceProvider. Did you have the same exception? Commented Aug 8, 2018 at 9:31
  • Are you using spark-submit to launch your application? Commented Aug 29, 2018 at 4:46

2 Answers


I found a working solution. After reading the code and a lot of debugging, I discovered that the ScriptEngine builds its own classloader from the classpath string of the classloader that created it. Under spark-submit, Spark creates a special classloader that can read from both local and HDFS files, but the classpath string obtained from that classloader does not contain our application jar, which lives in HDFS.

Manually appending my application jar to the ScriptEngine classpath before initialising the engine solved the problem. For this to work I first had to download the application jar from HDFS to the local filesystem, and then append that local path.
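
Roughly, the fix looks like the sketch below. The paths are placeholders, and using Hadoop's copyToLocalFile plus a plain classpath string append is just one way to wire this up, not necessarily the exact code I used:

import java.io.File
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Copy the application jar from HDFS to the local filesystem (placeholder paths)
val hdfsJar  = new Path("hdfs:///apps/my-app-assembly.jar")
val localJar = new Path("/tmp/my-app-assembly.jar")
FileSystem.get(new Configuration()).copyToLocalFile(hdfsJar, localJar)

// Append the local copy to the interpreter classpath before the engine compiles anything
// (`settings` is the IMain settings obtained as in the question)
settings.classpath.value =
  settings.classpath.value + File.pathSeparator + localJar.toString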


1 Comment

Thanks! This helped, but it took me ages to figure out how to override the classpath. I've added my code in an answer.

If you instantiate the Scala interpreter directly instead of via ScriptEngineManager, you can pass in settings and override the classpath:

// Take the first classpath entry from the context classloader
// (under spark-submit this is typically the application fat jar)
val cl = java.lang.Thread.currentThread.getContextClassLoader
val jar = cl.asInstanceOf[java.net.URLClassLoader].getURLs.toList.head.toString
// Put that jar on the interpreter classpath and build the engine directly,
// bypassing ScriptEngineManager
val settings = new scala.tools.nsc.Settings()
settings.classpath.value = jar
val engine = scala.tools.nsc.interpreter.Scripted(settings = settings)
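
With the jar on the interpreter classpath, scripts can import types from it. For example (com.example.MyDslRule is a hypothetical class from the application jar):

// Hypothetical usage: the import now resolves because the jar is on the interpreter classpath
val rule = engine.eval(
  """import com.example.MyDslRule
    |new MyDslRule()
  """.stripMargin)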
