
I have code that looks like the following:

object ErrorTest {
  case class APIResults(status: String, col_1: Long, col_2: Double, ...)

  def funcA(rows: ArrayBuffer[Row])(implicit defaultFormats: DefaultFormats): ArrayBuffer[APIResults] = {
    // call some API, get the results, and return APIResults
    ...
  }

  // MARK: load properties
  val props = loadProperties()

  private def loadProperties(): Properties = {
    val configFile = new File("config.properties")
    val reader = new FileReader(configFile)
    val props = new Properties()
    props.load(reader)
    props
  }

  def main(args: Array[String]): Unit = {
    val prop_a = props.getProperty("prop_a")

    val session = Context.initialSparkSession()
    import session.implicits._

    val initialSet = ArrayBuffer.empty[Row]
    val addToSet = (s: ArrayBuffer[Row], v: Row) => (s += v)
    val mergePartitionSets = (p1: ArrayBuffer[Row], p2: ArrayBuffer[Row]) => (p1 ++= p2)

    val sql1 =
      s"""
         select * from tbl_a where ...
       """

    session.sql(sql1)
      .rdd.map { row => { implicit val formats = DefaultFormats; (row.getLong(6), row) } }
      .aggregateByKey(initialSet)(addToSet, mergePartitionSets)
      .repartition(40)
      .map { case (rowNumber, rows) => { implicit val formats = DefaultFormats; funcA(rows) } }
      .flatMap(x => x)
      .toDF()
      .write.mode(SaveMode.Overwrite).saveAsTable("tbl_b")
  }
}

When I run it via spark-submit, it throws Caused by: java.lang.NoClassDefFoundError: Could not initialize class staging_jobs.ErrorTest$. But if I move val props = loadProperties() to the first line of the main method, the error goes away. Could anyone give me an explanation of this phenomenon? Thanks a lot!

Caused by: java.lang.NoClassDefFoundError: Could not initialize class staging_jobs.ErrorTest$
  at staging_jobs.ErrorTest$$anonfun$main$1.apply(ErrorTest.scala:208)
  at staging_jobs.ErrorTest$$anonfun$main$1.apply(ErrorTest.scala:208)
  at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
  at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
  at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
  at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
  at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
  at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
  at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.execute(FileFormatWriter.scala:243)
  at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:190)
  at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:188)
  at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1341)
  at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:193)
  ... 8 more
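For reference, below is a minimal, self-contained sketch of the fix described above (the names and the temp-file stand-in for config.properties are hypothetical). Moving the loadProperties() call into main means the file is read once, when the driver executes main, rather than inside the object's static initializer on every JVM that touches the object:

```scala
import java.io.{File, FileReader, FileWriter}
import java.util.Properties

object FixedErrorTest {
  // Plain method: nothing here runs during static initialization.
  def loadProperties(path: String): Properties = {
    val reader = new FileReader(new File(path))
    try {
      val props = new Properties()
      props.load(reader)
      props
    } finally reader.close()
  }

  def main(args: Array[String]): Unit = {
    // Stand-in for config.properties so the sketch is runnable anywhere.
    val tmp = File.createTempFile("config", ".properties")
    val writer = new FileWriter(tmp)
    writer.write("prop_a=hello")
    writer.close()

    // First line of main, as in the question: the file is read only when
    // the driver runs main, not when an executor first touches the object.
    val props = loadProperties(tmp.getPath)
    println(props.getProperty("prop_a")) // hello
  }
}
```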
    I think this is because of Spark's programming model: code executed on the worker nodes needs to be serializable. Commented Jun 5, 2018 at 12:44

2 Answers


I've run into the same problem. I defined a method convert outside the main method. When I used it as dataframe.rdd.map{x => convert(x)} inside main, NoClassDefFoundError: Could not initialize class Test$ occurred.

But when I used a function object convertor (with the same code as the convert method) inside the main method, no error occurred.

I was using Spark 2.1.0 with Scala 2.11; it seems like a bug in Spark?
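The difference can be reproduced outside Spark. In the sketch below (Tracker, Helpers, and Demo are hypothetical names), a self-contained function value never touches the enclosing helper object, while a closure that calls a method of that object forces the object's static initializer to run at the first call — which on Spark happens on an executor JVM:

```scala
object Tracker {
  @volatile var helpersInitialized = false
}

object Helpers {
  // Runs inside Helpers' static initializer, i.e. the first time any
  // member of Helpers is actively used on a given JVM.
  Tracker.helpersInitialized = true

  def convert(x: Int): Int = x * 2
}

object Demo {
  def main(args: Array[String]): Unit = {
    // A standalone function object: building and calling it never
    // references Helpers, so Helpers' initializer has not run yet.
    val convertor: Int => Int = x => x * 2
    println(convertor(3))               // 6
    println(Tracker.helpersInitialized) // false

    // A closure calling the method: the first invocation touches
    // Helpers$ and forces its initializer to run. On Spark that JVM
    // is an executor, and if the initializer throws there (e.g. a
    // config file that exists only on the driver), every later access
    // reports "NoClassDefFoundError: Could not initialize class".
    val viaMethod: Int => Int = x => Helpers.convert(x)
    println(viaMethod(3))               // 6
    println(Tracker.helpersInitialized) // true
  }
}
```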




I guess the problem is that val props = loadProperties() defines a member of the outer object (the one enclosing main). That member is then initialized (i.e. run) on the executors, which do not have the same environment as the driver.
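A minimal sketch of why (hypothetical names): every val in an object's body becomes part of that object's static initializer, which runs on whichever JVM first touches the object — on Spark, that includes any executor whose task calls a member such as funcA. If the initializer fails there, every subsequent access reports NoClassDefFoundError: Could not initialize class.

```scala
object InitDemo {
  // Part of InitDemo's static initializer: runs the first time any
  // member is touched on a given JVM. If this line threw (say, a
  // FileReader on a config file that exists only on the driver), the
  // object could never be initialized on an executor, and each later
  // access would fail with
  // "NoClassDefFoundError: Could not initialize class InitDemo$".
  val props: String = "prop_a=hello"

  def funcA(x: Int): Int = x + 1
}

object InitMain {
  def main(args: Array[String]): Unit = {
    // Calling funcA initializes InitDemo first, running the `props`
    // val above even though funcA never uses it.
    println(InitDemo.funcA(41)) // 42
    println(InitDemo.props)     // prop_a=hello
  }
}
```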

