
Please note that I am a better data miner than programmer. I am trying to run the examples from the book "Advanced Analytics with Spark" by Sandy Ryza (the code examples can be downloaded from "https://github.com/sryza/aas"), and I have run into the following problem. When I open this project in IntelliJ IDEA and try to run it, I get the error "Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/rdd/RDD". Does anyone know how to solve this issue?
Does this mean I am using the wrong version of Spark?

When I first tried to run this code, I got the error "Exception in thread "main" java.lang.NoClassDefFoundError: scala/Product", but I solved it by setting scala-library to compile scope in Maven. I use Maven 3.3.9, Java 1.7.0_79, Scala 2.11.7, and Spark 1.6.1. I tried both IntelliJ IDEA 14 and 15 and different versions of Java (1.7), Scala (2.10), and Spark, but with no success. I am also using Windows 7. My SPARK_HOME and Path variables are set, and I can execute spark-shell from the command line.
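For reference, the fix for that first error looked roughly like this in the pom.xml (standard Scala coordinates, version from my setup):

<dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>2.11.7</version>
    <!-- forcing compile scope fixed the scala/Product error for me -->
    <scope>compile</scope>
</dependency>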

  • Spark 1.6.1 compiles against Scala 2.10.x, not 2.11.x. Also, do you have the right dependencies set in Maven? Can you show us your pom.xml file? Commented May 8, 2016 at 9:12
  • The POM is as originally from GitHub: Commented May 8, 2016 at 11:38
  • Sorry for the previous comment. The POM file was made by the author of this book, and it is a very large file; I cannot post it to this site due to the character limitation. The safest way is to download it from "github.com/sryza/aas", I guess. Note: I can successfully build this POM with Maven via the "mvn package" command. Commented May 8, 2016 at 11:53
  • Probably the classpath is incorrect. Check that the Scala jars are on your classpath. Commented May 8, 2016 at 19:54

1 Answer


The examples in this book show a --master argument to spark-shell, but you will need to specify arguments as appropriate for your environment. If you don't have Hadoop installed, you need to start the spark-shell locally. To execute the samples you can simply pass paths with a local file reference (file:///) rather than an HDFS reference (hdfs://).
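For example, on a machine without Hadoop you could start the shell in local mode and read a local file (the path here is a placeholder):

spark-shell --master local[*]

// inside the shell, sc is the SparkContext that spark-shell provides
val rawData = sc.textFile("file:///home/user/simplesparkproject/README.md")
rawData.count()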

The author suggests a hybrid development approach:

Keep the frontier of development in the REPL, and, as pieces of code harden, move them over into a compiled library.

Hence the sample code is considered a compiled library rather than a standalone application. You can make the compiled JAR available to spark-shell by passing it to the --jars property, while Maven is used for compiling and managing dependencies.
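As a minimal sketch of that workflow (the object and method names are illustrative, not from the book): a helper prototyped line by line in the REPL eventually becomes an object in the project's source tree,

package com.cloudera.datascience

object ParseUtils {
  // hardened REPL code: split a CSV line into its fields
  def parseLine(line: String): Array[String] = line.split(',')
}

and after mvn package it is available to the shell through --jars.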

In the book the author describes how the simplesparkproject can be executed:

Use Maven to compile and package the project:

cd simplesparkproject/
mvn package 

Start the spark-shell with the JAR as a dependency:

spark-shell --master local[2] --driver-memory 2g --jars ../simplesparkproject-0.0.1.jar ../README.md

Then you can access your object within the spark-shell as follows:

val myApp = com.cloudera.datascience.MyApp
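MyApp here stands for whatever object your compiled JAR exposes. A hypothetical shape for such an object (not taken from the book) could be:

package com.cloudera.datascience

import org.apache.spark.SparkContext

object MyApp {
  // counts the lines of a text file using the shell's SparkContext
  def countLines(sc: SparkContext, path: String): Long =
    sc.textFile(path).count()
}

which you could then call from the shell as myApp.countLines(sc, "file:///home/user/simplesparkproject/README.md").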

However, if you want to execute the sample code as a standalone application and run it within IDEA, you need to modify the pom.xml. Some of the dependencies are required for compilation but are already available in a Spark runtime environment. Therefore these dependencies are marked with the provided scope in the pom.xml.

<!--<scope>provided</scope>-->

If you remove the provided scope (shown commented out above), you will be able to run the samples within IDEA. But then you cannot provide this JAR as a dependency for the spark-shell anymore.
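Concretely, the Spark dependency entry would look something like this (artifact and version assumed to match the question's Spark 1.6.1 setup); commenting the scope line back in restores the provided scope for spark-shell use:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.6.1</version>
    <!--<scope>provided</scope>-->
</dependency>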

Note: I used Maven 3.0.5 and Java 7+. I had problems with the plugin versions under Maven 3.3.x.
