
I am new to Spark and am trying to run a simple Spark jar, built with Maven in IntelliJ, on a Hadoop cluster. But I get a ClassNotFoundException in every way I have tried to submit the application through spark-submit.

My pom.xml:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>org.example</groupId>
  <artifactId>SparkTrans</artifactId>
  <version>1.0-SNAPSHOT</version>

  <dependencies>
    <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.11</artifactId>
      <version>2.4.3</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.11</artifactId>
      <version>2.4.3</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-hive -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-hive_2.11</artifactId>
      <version>2.4.3</version>
      <scope>compile</scope>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.logging.log4j/log4j-slf4j-impl -->
    <dependency>
      <groupId>org.apache.logging.log4j</groupId>
      <artifactId>log4j-slf4j-impl</artifactId>
      <version>2.8</version>
      <scope>test</scope>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.logging.log4j/log4j-api -->
    <dependency>
      <groupId>org.apache.logging.log4j</groupId>
      <artifactId>log4j-api</artifactId>
      <version>2.8</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/com.typesafe/config -->
    <dependency>
      <groupId>com.typesafe</groupId>
      <artifactId>config</artifactId>
      <version>1.3.4</version>
    </dependency>
    <dependency>
      <groupId>org.scalatest</groupId>
      <artifactId>scalatest_2.11</artifactId>
      <version>3.1.1</version>
      <scope>test</scope>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>3.2.0</version>
        <executions>
          <execution>
            <id>shade-libs</id>
            <phase>package</phase>
            <goals>
              <goal>shade</goal>
            </goals>
            <configuration>
              <filters>
                <filter>
                  <artifact>*:*</artifact>
                  <excludes>
                    <exclude>META-INF/*.SF</exclude>
                    <exclude>META-INF/*.DSA</exclude>
                    <exclude>META-INF/*.RSA</exclude>
                    <exclude>resources/*</exclude>
                  </excludes>
                </filter>
              </filters>
              <shadedClassifierName>fat</shadedClassifierName>
              <shadedArtifactAttached>true</shadedArtifactAttached>
              <relocations>
                <relocation>
                  <pattern>org.apache.http.client</pattern>
                  <shadedPattern>shaded.org.apache.http.client</shadedPattern>
                </relocation>
              </relocations>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>

</project>

My main Scala object (SparkTrans.scala):

import common.InputConfig
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.slf4j.LoggerFactory

object SparkTrans {

  private val logger = LoggerFactory.getLogger(getClass.getName)

  def main(args: Array[String]): Unit = {
    try {
      logger.info("main method started")
      logger.warn("This is a warning")

      val arg_length = args.length

      if (arg_length == 0) {
        logger.warn("No Argument passed")
        System.exit(1)
      }

      val inputConfig: InputConfig = InputConfig(env = args(0), targetDB = args(1))
      println("The first argument passed is " + inputConfig.env)
      println("The second argument passed is " + inputConfig.targetDB)

      val spark = SparkSession
        .builder()
        .appName("SparkPOCinside")
        .config("spark.master", "yarn")
        .enableHiveSupport()
        .getOrCreate()

      println("Created Spark Session")

      val sampleSeq = Seq((1, "Spark"), (2, "BigData"))

      val df1 = spark.createDataFrame(sampleSeq).toDF("courseid", "coursename")
      df1.show()

      logger.warn("sql_test_a method started")
      val courseDF = spark.sql("select * from MYINSTANCE.sql_test_a")
      logger.warn("sql_test_a method ended")
      courseDF.show()

    } catch {
      case e: Exception =>
        // pass the exception to the logger instead of concatenating
        // printStackTrace(), which returns Unit
        logger.error("An error has occurred in the main method", e)
    }

  }

}

I tried the commands below with spark-submit, but all of them give a ClassNotFoundException. I also tried switching the arguments around so that --class comes right after --deploy-mode, but in vain:

spark-submit --master yarn --deploy-mode cluster --queue ABCD --conf spark.yarn.security.tokens.hive.enabled=false --files hdfs://nameservice1/user/XMLs/hive-site.xml --keytab hdfs://nameservice1/user/MYINSTANCE/landing/workflow/wf_data/lib/MYKEY.keytab --num-executors 1 --executor-cores 1 --executor-memory 2g --conf spark.yarn.executor.memoryOverhead=3072 --class org.example.SparkTrans hdfs://nameservice1/user/MYINSTANCE/landing/workflow/wf_data/SparkTrans-1.0-SNAPSHOT-fat.jar dev somedb


spark-submit --master yarn --deploy-mode cluster --queue ABCD --conf spark.yarn.security.tokens.hive.enabled=false --files hdfs://nameservice1/user/XMLs/hive-site.xml --keytab hdfs://nameservice1/user/MYINSTANCE/landing/workflow/wf_data/lib/MYKEY.keytab --num-executors 1 --executor-cores 1 --executor-memory 2g --conf spark.yarn.executor.memoryOverhead=3072 --class org.example.SparkTrans --name org.example.SparkTrans hdfs://nameservice1/user/MYINSTANCE/landing/workflow/wf_data/SparkTrans-1.0-SNAPSHOT-fat.jar dev somedb


spark-submit --master yarn --deploy-mode cluster --queue ABCD --conf spark.yarn.security.tokens.hive.enabled=false --files hdfs://nameservice1/user/XMLs/hive-site.xml --keytab hdfs://nameservice1/user/MYINSTANCE/landing/workflow/wf_data/lib/MYKEY.keytab --num-executors 1 --executor-cores 1 --executor-memory 2g --conf spark.yarn.executor.memoryOverhead=3072 --class SparkTrans hdfs://nameservice1/user/MYINSTANCE/landing/workflow/wf_data/SparkTrans-1.0-SNAPSHOT-fat.jar dev somedb

Exact error I am getting:

btrace WARNING: No output stream. DataCommand output is ignored.
[main] INFO ResourceCollector - Unravel Sensor 4.6.1.8rc0013/2.0.3 initializing.
21/06/11 10:09:27 INFO yarn.ApplicationMaster: Registered signal handlers for [TERM, HUP, INT]
21/06/11 10:09:28 INFO yarn.ApplicationMaster: ApplicationAttemptId: appattempt_1614625006458_6646161_000001
21/06/11 10:09:30 INFO spark.SecurityManager: Changing view acls to: MYKEY
21/06/11 10:09:30 INFO spark.SecurityManager: Changing modify acls to: MYKEY
21/06/11 10:09:30 INFO spark.SecurityManager: Changing view acls groups to: 
21/06/11 10:09:30 INFO spark.SecurityManager: Changing modify acls groups to: 
21/06/11 10:09:30 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(MYKEY); groups with view permissions: Set(); users  with modify permissions: Set(MYKEY); groups with modify permissions: Set()
21/06/11 10:09:30 INFO yarn.ApplicationMaster: Starting the user application in a separate Thread
21/06/11 10:09:30 ERROR yarn.ApplicationMaster: Uncaught exception: 
java.lang.ClassNotFoundException: SparkTrans
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at org.apache.spark.deploy.yarn.ApplicationMaster.startUserApplication(ApplicationMaster.scala:561)
    at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:347)
    at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:197)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:695)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:69)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:68)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1924)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:68)
    at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:693)
    at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
21/06/11 10:09:30 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 10, (reason: Uncaught exception: java.lang.ClassNotFoundException: SparkTrans)
21/06/11 10:09:30 INFO util.ShutdownHookManager: Shutdown hook called

Can any of you let me know what I am doing wrong? I have checked that hive-site.xml and my jar are in the correct HDFS locations mentioned in my commands.

1 Answer

You need to add the scala-maven-plugin configuration to your pom.xml. The problem is that without it there is nothing to compile your SparkTrans.scala file into Java classes.

Add:

<project>
  <build>
    <sourceDirectory>src/main/scala</sourceDirectory>
    <testSourceDirectory>src/test/scala</testSourceDirectory>
    <plugins>
      <plugin>
        <groupId>net.alchim31.maven</groupId>
        <artifactId>scala-maven-plugin</artifactId>
        <version>4.5.2</version>
        <executions>
          <execution>
            <goals>
              <goal>compile</goal>
              <goal>testCompile</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>

to your pom.xml, and ensure your Scala file is in src/main/scala.

Then it should be compiled and added to your jar. Here's the documentation for the scala-maven-plugin.
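Concretely, rebuilding should now run the compile goals and attach the shaded jar. A quick check (a sketch; the jar name follows from the artifactId, version, and "fat" classifier configured in the pom above):

mvn clean package
# The shaded jar should land in target/ with the "fat" classifier:
ls target/SparkTrans-1.0-SNAPSHOT-fat.jar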

You can check what's in your jar with jar tf <jar-file>; see the jar tool guide.
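For example, against the shaded jar built above:

jar tf target/SparkTrans-1.0-SNAPSHOT-fat.jar | grep -i sparktrans
# Expect entries like:
#   SparkTrans.class
#   SparkTrans$.class

Note that since the source shown has no package declaration, the fully qualified class name is just SparkTrans, so the --class SparkTrans variant of your spark-submit command is the one that should match once the class is actually in the jar.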


9 Comments

You'll need to specify the Scala compiler version too
Hi, I added this library in the pom.xml under dependencies as well as a plugin, but I am still not able to see my class name when I run the "jar tf SparkTrans-1.0-SNAPSHOT-fat.jar" command. I am also still getting the ClassNotFoundException when I try spark-submit with the newly built jar. Are any other changes needed to resolve this error?
Sorry to hear, I'll have another look.
D'oh, @Vivek, you also need to add <executions> to the plugin config. Amended the answer. With this you should see Maven goals being executed, like [INFO] --- scala-maven-plugin:4.5.2:compile (default) @ SparkTrans ---, when you run mvn install.
Do you mean to say I need to add the below:
