
I am trying to build an assembly (fat) jar executable, but I am getting the following error:

Caused by: java.lang.ClassNotFoundException: csv.DefaultSource

The problem is with reading the CSV files. The code works fine in the IDE. Please help me.

The Scala code is below:

package extendedtable

import org.apache.log4j.{Level, Logger}
import org.apache.spark.SparkContext
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import scala.collection.mutable.ListBuffer

object mainObject {

 // var read = new fileRead
  def main(args: Array[String]): Unit = {
    val spark: SparkSession = SparkSession.builder().appName("generationobj").master("local[*]").config("spark.sql.crossJoin.enabled", value = true).getOrCreate()
    val sc: SparkContext = spark.sparkContext
    import spark.implicits._

    val atomData = spark.read.format("csv")
      .option("header", "true")
      .option("inferSchema", "true")
      .load("Resources/atom.csv")

    val moleculeData = spark.read.format("csv")
      .option("header", "true")
      .option("inferSchema", "true")
      .load("Resources/molecule.csv")

    val df = moleculeData.join(atomData, "molecule_id")
    val molecule_df = moleculeData
    val mid: List[Row] = molecule_df.select("molecule_id").collect.toList
    val listofmoleculeid: List[String] = mid.map(r => r.getString(0))
    // print(listofmoleculeid)
    df.createTempView("table")
    df.show()
  }
}

The build file (build.sbt) is below:

name := "ExtendedTable"

version := "0.1"

scalaVersion := "2.11.12"

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.3.0"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.3.0"
libraryDependencies += "org.apache.spark" %% "spark-mllib" % "2.3.0"

mainClass := Some("extendedtable.mainObject")

assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case x => MergeStrategy.first
}
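For the `assembly` and `assemblyMergeStrategy` keys above to resolve, the sbt-assembly plugin has to be enabled in the project. A minimal project/plugins.sbt might look like this (the plugin version here is an assumption; pick one compatible with your sbt release):

```scala
// project/plugins.sbt -- enables the `assembly` task used in build.sbt above
// (version 0.14.10 is an assumption; use a release matching your sbt version)
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")
```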
  • Can you post the build file? Make sure you use the same Spark version. Commented Jun 6, 2020 at 13:25
  • @Srinivas I have posted the build file, can you please look at it? Commented Jun 6, 2020 at 13:44

2 Answers


Change your assemblyMergeStrategy as shown below and then build the jar file.

You need to include the org.apache.spark.sql.sources.DataSourceRegister service file inside your jar; this file ships inside the spark-sql jar.

The path is spark-sql_2.11-<version>.jar/META-INF/services/org.apache.spark.sql.sources.DataSourceRegister

This file contains the following list:

org.apache.spark.sql.execution.datasources.csv.CSVFileFormat
org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider
org.apache.spark.sql.execution.datasources.json.JsonFileFormat
org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat
org.apache.spark.sql.execution.datasources.text.TextFileFormat
org.apache.spark.sql.execution.streaming.ConsoleSinkProvider
org.apache.spark.sql.execution.streaming.TextSocketSourceProvider
org.apache.spark.sql.execution.streaming.RateSourceProvider
The updated merge strategy:

assemblyMergeStrategy in assembly := {
  case PathList("META-INF", "services", xs @ _*) => MergeStrategy.filterDistinctLines // added: concatenate service-registry files instead of discarding them
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case _ => MergeStrategy.first
}


3 Comments

I use jar tvf and could see this file in my jar, but this error still exists.
I had the same problem as the OP and this solution solved the issue; I think it should be marked as accepted. Saved me a headache, thanks
Thank you! This was really helpful as I always assemble Spark Apps for a provided environment (local or cluster). But for the first time, I wanted to have everything in 1 Fat JAR self-contained and it was driving me crazy! Many thanks!

Use the spark-submit command to submit the Spark job.

# Run application locally on 4 cores
./bin/spark-submit \
  --class extendedtable.mainObject \
  --master local[4] \
  /path/to/<your-jar>.jar 

Ref: the Spark documentation on submitting applications.

