
I've created an Apache Spark application using Java. All it does is count the lines containing the word "spark", repeated 1000 times.

Here's my code:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class Example1 {
    public static void main(String[] args) {
        String logfile = args[0];
        try {
            SparkConf conf = new SparkConf();
            conf.setAppName("Sample");
            conf.setMaster("spark://<master>:7077");
            conf.set("spark.executor.memory", "1g");
            JavaSparkContext sc = new JavaSparkContext(conf);

            // Cache the file so the repeated filter/count passes can reuse it.
            JavaRDD<String> logData = sc.textFile(logfile).cache();

            long count = 0;
            for (int i = 0; i < 1000; i++) {
                count += logData.filter(new Function<String, Boolean>() {
                    public Boolean call(String s) {
                        return s.toLowerCase().contains("spark");
                    }
                }).count();
            }
            System.out.println("Total count: " + count);

            sc.stop();
        } catch (Exception ex) {
            System.out.println(ex.getMessage());
        }
    }
}

When I debug this in the Eclipse IDE, I encounter a java.lang.ClassNotFoundException:

WARN scheduler.TaskSetManager: Loss was due to java.lang.ClassNotFoundException
java.lang.ClassNotFoundException: org.spark.java.examples.Example1$1
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)

I also tried to deploy this on the cluster using spark-submit, but the same exception was thrown. Here's a portion of the stack trace:

ERROR Executor: Exception in task ID 4
java.lang.ClassNotFoundException: org.spark.java.examples.Example1$1
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)

Any ideas on how to resolve this? Thanks in advance!

2 Comments

  • curious why the need of looping 1000x over the same filter?
  • @maasg this is just to mimic a possible big/long job, and to check how much time it will take when run on the cluster

3 Answers


You need to deliver the jar with your job to the workers. To do that, have Maven build a jar and add that jar to the context:

 conf.setJars(new String[]{"path/to/jar/Sample.jar"}); [*]

For a 'real' job you would need to build a jar with dependencies (check the Maven Shade plugin), but for a simple job with no external dependencies a plain jar is sufficient.

[*] I'm not very familiar with the Spark Java API; I'm just assuming it should be something like this.
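For context, a minimal sketch of how this could slot into the code from the question; the jar path is the same placeholder used above, and the fluent style is just one way to write it:

SparkConf conf = new SparkConf()
        .setAppName("Sample")
        .setMaster("spark://<master>:7077")
        .set("spark.executor.memory", "1g")
        // Ship the application jar (which contains Example1 and its anonymous
        // filter class Example1$1) to the executors so tasks can be deserialized.
        .setJars(new String[]{"path/to/jar/Sample.jar"});

JavaSparkContext sc = new JavaSparkContext(conf);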


4 Comments

Thanks for the suggestion. I'll test this and will get back to you if it works.
@maasg It works for me with a physical jar path like you mentioned in your example above, but when I try conf.setJars(SparkContext.jarOfClass(Application.class).toList()) it doesn't work. Any idea?
Jason - since you say that this suggestion resolved your original question, you should accept it. Then move on to a new question with the details from your not-an-answer, referring back to this one for context.
Also, don't mix "conf.setJars" with the --jars option.

You must include your jar in the workers' classpath. You can do this in two ways:

The first one is the recommended method.
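As a rough sketch, assuming the two methods meant here are registering the jar on the SparkConf versus adding it to an already-created context (the jar path is a placeholder):

// Method 1: register the jar on the SparkConf before creating the context.
SparkConf conf = new SparkConf()
        .setAppName("Sample")
        .setJars(new String[]{"path/to/jar/Sample.jar"});
JavaSparkContext sc = new JavaSparkContext(conf);

// Method 2: add the jar to a context that has already been created.
sc.addJar("path/to/jar/Sample.jar");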



This can also happen if you do not specify the fully qualified class name on the spark-submit command line. If your application's main method is in test.spark.SimpleApp, then the command line needs to look something like this:

./bin/spark-submit --class "test.spark.SimpleApp" --master local[2] /path_to_project/target/spark_testing-1.0-SNAPSHOT.jar

Passing just --class "SimpleApp" will fail with a ClassNotFoundException.

