I'm facing a weird issue when trying to run my Scala Spark app with spark-submit (it works fine with sbt run). All of this is run locally.

I have a standard SparkSession declaration:

  val spark: SparkSession = SparkSession
    .builder()
    .master("local[*]")
    .appName("EPGSubtitleTimeSeries")
    .getOrCreate()

but when I try to run it through spark-submit as follows:

./bin/spark-submit --packages org.apache.hadoop:hadoop-aws:2.7.3 --master local[2] --class com.package.EPGSubtitleTimeSeries --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem /home/jay/project/tv-data-pipeline/target/scala-2.12/epg-subtitles_2.12-0.1.jar

I get this error:

Exception in thread "main" java.lang.NoSuchMethodError: scala.Predef$.refArrayOps([Ljava/lang/Object;)[Ljava/lang/Object;
    at com.project.Environment$.<init>(EPGSubtitleTimeSeries.scala:55)
    at com.project.Environment$.<clinit>(EPGSubtitleTimeSeries.scala)
    at com.project.EPGSubtitleJoined$.$anonfun$start_incremental_load$1(EPGSubtitleTimeSeries.scala:409)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.immutable.Set$Set3.foreach(Set.scala:163)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    at scala.collection.AbstractSet.scala$collection$SetLike$$super$map(Set.scala:47)
    at scala.collection.SetLike$class.map(SetLike.scala:92)
    at scala.collection.AbstractSet.map(Set.scala:47)
    at com.package.EPGSubtitleJoined$.start_incremental_load(EPGSubtitleTimeSeries.scala:408)
    at com.package.EPGSubtitleTimeSeries$.main(EPGSubtitleTimeSeries.scala:506)
    at com.package.EPGSubtitleTimeSeries.main(EPGSubtitleTimeSeries.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

I've narrowed it down with a few print statements to make sure it's actually this line producing it:

val EPG_OUTPUT_COLUMNS: Array[String] = EPG_SCHEDULE_OUTPUT_COLUMNS ++ Array("subtitle_channel_title", "epg_channel_title", "channelTitle")

From:

val EPG_SCHEDULE_OUTPUT_COLUMNS = Array(
    "program_title",
    "epg_titles",
    "series_title",
    "season_title",
    "date_time",
    "duration",
    "short",
    "medium",
    "long",
    "start_timestamp",
    "end_timestamp",
    "epg_year_month",
    "epg_day_of_month",
    "epg_hour_of_day",
    "epg_genre",
    "channelId"
  )

  val EPG_OUTPUT_COLUMNS: Array[String] = EPG_SCHEDULE_OUTPUT_COLUMNS ++ Array("subtitle_channel_title", "epg_channel_title", "channelTitle")

I'm using Spark 2.4.4 and Scala 2.12.8, as well as joda-time 2.10.1 (no other dependencies in my build.sbt).
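
For reference, my build.sbt looks roughly like this (reconstructed from the versions above; the exact Spark module and the "provided" scope are illustrative, not copied verbatim from my file):

    name := "epg-subtitles"
    version := "0.1"
    scalaVersion := "2.12.8"

    libraryDependencies ++= Seq(
      // %% appends the Scala binary version, so this resolves to spark-sql_2.12
      "org.apache.spark" %% "spark-sql" % "2.4.4" % "provided",
      "joda-time" % "joda-time" % "2.10.1"
    )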

Does anyone have an idea of what the error is?

  • Are you sure you are using the same Spark & Scala versions at compile time and at runtime? Commented Sep 25, 2019 at 14:56
  • I'm doing all of this through the command line; how can I make sure of this? Commented Sep 25, 2019 at 14:57
  • How was the cluster created? If on premises, ask your system administrator which versions they use. If on AWS EMR, check the service version and look up in the documentation which versions of the packages they provide, etc. Also, if you have access to the cluster where the app is running, open a spark-shell; it will print the Spark & Scala versions. You should always use the exact same versions. Commented Sep 25, 2019 at 15:00
  • @LuisMiguelMejíaSuárez ah, I should have specified: before running it on AWS I'm trying to run it locally at the moment, still with spark-submit Commented Sep 25, 2019 at 15:03
  • And are you sure that the version you installed locally is the same one you used for compiling? Commented Sep 25, 2019 at 15:10

1 Answer

Following my conversation with Luis, it turned out that I had compiled with Scala 2.12 while my local Spark installation was built against Scala 2.11. That is exactly what the NoSuchMethodError points to: the ++ on the arrays goes through the implicit scala.Predef.refArrayOps conversion, whose binary signature differs between Scala 2.11 and 2.12, so code compiled against one version fails at runtime on the other.
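
As a quick way to verify the mismatch (a minimal sketch, not code from my app), you can print the versions at runtime to see which scala-library and Spark the spark-submit classpath actually provides; running ./bin/spark-submit --version also prints the Scala version Spark was built with in its startup banner.

    // Illustrative sanity check: under spark-submit, the Scala version printed
    // here comes from Spark's bundled scala-library, not from your sbt build.
    println(s"Scala runtime: ${scala.util.Properties.versionString}") // e.g. "version 2.11.12"
    println(s"Spark runtime: ${spark.version}")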

I first wanted to upgrade to a build of Spark 2.4.4 compiled for Scala 2.12 (which I think would let me keep 2.12), but the main problem is that AWS EMR (which is my final goal) doesn't support Scala 2.12: https://forums.aws.amazon.com/thread.jspa?messageID=902385&tstart=0

So the final solution was to downgrade my Scala version to 2.11 at compile time.
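
Concretely, the change is one line in build.sbt (2.11.12 below is just a recent 2.11.x release, pick whatever matches your Spark distribution). Note that sbt then emits the jar under target/scala-2.11/, so the path passed to spark-submit has to be updated to match:

    // build.sbt: compile against the Scala version the local Spark runtime uses
    scalaVersion := "2.11.12"
    // sbt package now produces target/scala-2.11/epg-subtitles_2.11-0.1.jar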

Thanks a lot Luis for your guidance and knowledge!
