
Please note that I am a better data miner than programmer. I am trying to run the examples from the book "Advanced Analytics with Spark" by Sandy Ryza (the code examples can be downloaded from "https://github.com/sryza/aas"), and I have run into the following problem. When I open this project in IntelliJ IDEA and try to run it, I get the error "Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/rdd/RDD". Does anyone know how to solve this issue?
Does this mean I am using the wrong version of Spark?

When I first tried to run this code, I got the error "Exception in thread "main" java.lang.NoClassDefFoundError: scala/Product", but I solved it by setting scala-library to compile scope in Maven. I use Maven 3.3.9, Java 1.7.0_79, Scala 2.11.7, and Spark 1.6.1. I tried both IntelliJ IDEA 14 and 15 and different versions of Java (1.7), Scala (2.10), and Spark, but with no success. I am also using Windows 7. My SPARK_HOME and Path variables are set, and I can execute spark-shell from the command line.
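For reference, the fix for that first error looked roughly like this in the pom.xml (standard Scala coordinates, version from my setup):

<dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>2.11.7</version>
    <!-- forcing compile scope fixed the scala/Product error for me -->
    <scope>compile</scope>
</dependency>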

  • Spark 1.6.1 compiles against Scala 2.10.x, not 2.11.x. Also, do you have the right dependencies set in Maven? Can you show us your pom.xml file? Commented May 8, 2016 at 9:12
  • The POM is as originally from GitHub: Commented May 8, 2016 at 11:38
  • Sorry for the previous comment. The POM file was made by the author of this book, and it is a very large file; I cannot post it to this site due to the character limitation. The safest way is to download it from "github.com/sryza/aas", I guess. Note: I can successfully build this POM with Maven via the "mvn package" command. Commented May 8, 2016 at 11:53
  • Probably the classpath is incorrect. Check that the Scala jars are on your classpath. Commented May 8, 2016 at 19:54

1 Answer


The examples in this book show a --master argument to spark-shell, but you will need to specify arguments as appropriate for your environment. If you don't have Hadoop installed, you need to start the spark-shell locally. To execute the samples you can simply pass paths with a local file reference (file:///) rather than an HDFS reference (hdfs://).
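For example, on a machine without Hadoop you could start the shell in local mode and read a local file (the path here is a placeholder):

spark-shell --master local[*]

// inside the shell, sc is the SparkContext that spark-shell provides
val rawData = sc.textFile("file:///home/user/simplesparkproject/README.md")
rawData.count()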

The author suggests a hybrid development approach:

Keep the frontier of development in the REPL, and, as pieces of code harden, move them over into a compiled library.

Hence the sample code is considered a compiled library rather than a standalone application. You can make the compiled JAR available to spark-shell by passing it to the --jars property, while Maven is used for compiling and managing dependencies.
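As a minimal sketch of that workflow (the object and method names are illustrative, not from the book): a helper prototyped line by line in the REPL eventually becomes an object in the project's source tree,

package com.cloudera.datascience

object ParseUtils {
  // hardened REPL code: split a CSV line into its fields
  def parseLine(line: String): Array[String] = line.split(',')
}

and after mvn package it is available to the shell through --jars.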

In the book the author describes how the simplesparkproject can be executed:

Use Maven to compile and package the project:

cd simplesparkproject/
mvn package 

Start the spark-shell with the JAR as a dependency:

spark-shell --master local[2] --driver-memory 2g --jars ../simplesparkproject-0.0.1.jar ../README.md

Then you can access your object within the spark-shell as follows:

val myApp = com.cloudera.datascience.MyApp
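MyApp here stands for whatever object your compiled JAR exposes. A hypothetical shape for such an object (not taken from the book) could be:

package com.cloudera.datascience

import org.apache.spark.SparkContext

object MyApp {
  // counts the lines of a text file using the shell's SparkContext
  def countLines(sc: SparkContext, path: String): Long =
    sc.textFile(path).count()
}

which you could then call from the shell as myApp.countLines(sc, "file:///home/user/simplesparkproject/README.md").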

However, if you want to execute the sample code as a standalone application and run it within IDEA, you need to modify the pom.xml. Some of the dependencies are required for compilation but are already available in a Spark runtime environment. Therefore these dependencies are marked with the provided scope in the pom.xml.

<!--<scope>provided</scope>-->

If you remove the provided scope (shown commented out above), you will be able to run the samples within IDEA. But then you cannot provide this JAR as a dependency for the spark-shell anymore.
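Concretely, the Spark dependency entry would look something like this (artifact and version assumed to match the question's Spark 1.6.1 setup); commenting the scope line back in restores the provided scope for spark-shell use:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.6.1</version>
    <!--<scope>provided</scope>-->
</dependency>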

Note: I used Maven 3.0.5 and Java 7+. I had problems with the plugin versions under Maven 3.3.x.
