I'm facing a weird issue when trying to run my Scala Spark app with spark-submit (it works fine with sbt run). All of this is run locally.

I have a standard SparkSession declaration:

  val spark: SparkSession = SparkSession
    .builder()
    .master("local[*]")
    .appName("EPGSubtitleTimeSeries")
    .getOrCreate()

but when I try to run it through spark-submit as follows:

./bin/spark-submit --packages org.apache.hadoop:hadoop-aws:2.7.3 --master local[2] --class com.package.EPGSubtitleTimeSeries --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem /home/jay/project/tv-data-pipeline/target/scala-2.12/epg-subtitles_2.12-0.1.jar

I get this error:

Exception in thread "main" java.lang.NoSuchMethodError: scala.Predef$.refArrayOps([Ljava/lang/Object;)[Ljava/lang/Object;
    at com.project.Environment$.<init>(EPGSubtitleTimeSeries.scala:55)
    at com.project.Environment$.<clinit>(EPGSubtitleTimeSeries.scala)
    at com.project.EPGSubtitleJoined$.$anonfun$start_incremental_load$1(EPGSubtitleTimeSeries.scala:409)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.immutable.Set$Set3.foreach(Set.scala:163)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    at scala.collection.AbstractSet.scala$collection$SetLike$$super$map(Set.scala:47)
    at scala.collection.SetLike$class.map(SetLike.scala:92)
    at scala.collection.AbstractSet.map(Set.scala:47)
    at com.package.EPGSubtitleJoined$.start_incremental_load(EPGSubtitleTimeSeries.scala:408)
    at com.package.EPGSubtitleTimeSeries$.main(EPGSubtitleTimeSeries.scala:506)
    at com.package.EPGSubtitleTimeSeries.main(EPGSubtitleTimeSeries.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

I've narrowed it down with a few print statements to make sure it's actually this line producing it:

val EPG_OUTPUT_COLUMNS: Array[String] = EPG_SCHEDULE_OUTPUT_COLUMNS ++ Array("subtitle_channel_title", "epg_channel_title", "channelTitle")

From:

val EPG_SCHEDULE_OUTPUT_COLUMNS = Array(
    "program_title",
    "epg_titles",
    "series_title",
    "season_title",
    "date_time",
    "duration",
    "short",
    "medium",
    "long",
    "start_timestamp",
    "end_timestamp",
    "epg_year_month",
    "epg_day_of_month",
    "epg_hour_of_day",
    "epg_genre",
    "channelId"
  )

  val EPG_OUTPUT_COLUMNS: Array[String] = EPG_SCHEDULE_OUTPUT_COLUMNS ++ Array("subtitle_channel_title", "epg_channel_title", "channelTitle")

I'm using Spark 2.4.4 and Scala 2.12.8, as well as joda-time 2.10.1 (no other dependencies in my build.sbt).
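
For reference, my build.sbt looks roughly like this (reconstructed from the versions above; the exact Spark module and the "provided" scope are illustrative, not copied verbatim from my file):

    name := "epg-subtitles"
    version := "0.1"
    scalaVersion := "2.12.8"

    libraryDependencies ++= Seq(
      // %% appends the Scala binary version, so this resolves to spark-sql_2.12
      "org.apache.spark" %% "spark-sql" % "2.4.4" % "provided",
      "joda-time" % "joda-time" % "2.10.1"
    )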

Does anyone have an idea of what the error is?

  • Are you sure you are using the same Spark & Scala versions at compile time and at runtime? Commented Sep 25, 2019 at 14:56
  • I'm doing all of this through the command line; how can I make sure of this? Commented Sep 25, 2019 at 14:57
  • How was the cluster created? If on premises, ask your system administrator which versions they use. If on AWS EMR, check the service version and look up in the documentation which versions of the packages they provide, etc. Also, if you have access to the cluster where the app is running, open a spark-shell; it will print the Spark & Scala versions. You should always use the exact same versions. Commented Sep 25, 2019 at 15:00
  • @LuisMiguelMejíaSuárez ah, I should have specified: before running it on AWS I'm trying to run it locally at the moment, still with spark-submit Commented Sep 25, 2019 at 15:03
  • And are you sure that the version you installed locally is the same one you used for compiling? Commented Sep 25, 2019 at 15:10

1 Answer

Following my conversation with Luis, it turned out that I had compiled with Scala 2.12 while my local Spark installation was built against Scala 2.11. That is exactly what the NoSuchMethodError points to: the ++ on the arrays goes through the implicit scala.Predef.refArrayOps conversion, whose binary signature differs between Scala 2.11 and 2.12, so code compiled against one version fails at runtime on the other.
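
As a quick way to verify the mismatch (a minimal sketch, not code from my app), you can print the versions at runtime to see which scala-library and Spark the spark-submit classpath actually provides; running ./bin/spark-submit --version also prints the Scala version Spark was built with in its startup banner.

    // Illustrative sanity check: under spark-submit, the Scala version printed
    // here comes from Spark's bundled scala-library, not from your sbt build.
    println(s"Scala runtime: ${scala.util.Properties.versionString}") // e.g. "version 2.11.12"
    println(s"Spark runtime: ${spark.version}")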

I first wanted to upgrade to a build of Spark 2.4.4 compiled for Scala 2.12 (which I think would let me keep 2.12), but the main problem is that AWS EMR (which is my final goal) doesn't support Scala 2.12: https://forums.aws.amazon.com/thread.jspa?messageID=902385&tstart=0

So the final solution was to downgrade my Scala version to 2.11 at compile time.
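
Concretely, the change is one line in build.sbt (2.11.12 below is just a recent 2.11.x release, pick whatever matches your Spark distribution). Note that sbt then emits the jar under target/scala-2.11/, so the path passed to spark-submit has to be updated to match:

    // build.sbt: compile against the Scala version the local Spark runtime uses
    scalaVersion := "2.11.12"
    // sbt package now produces target/scala-2.11/epg-subtitles_2.11-0.1.jar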

Thanks a lot Luis for your guidance and knowledge!
