Spark (Java) to Elasticsearch

Question

I am testing loading data from a CSV file into Spark and then saving it to Elasticsearch, but I am having trouble saving my RDD to Elasticsearch from Spark. This error is raised when I submit the job:

    Exception in thread "main" java.lang.NoClassDefFoundError: org/elasticsearch/spark/rdd/api/java/JavaEsSpark

But my dependencies should be correct since I compiled with Maven...

My pom.xml is here: http://pastebin.com/b71KL903 .

The error is raised when I reach this line:

    JavaEsSpark.saveToEs(javaRDD, "index/logements");

The rest of my code is here: http://pastebin.com/8yuJB68A

I have already searched for this problem but didn't find anything that helped: https://discuss.elastic.co/t/problem-between-spark-and-elasticsearch/51942 .

https://github.com/elastic/elasticsearch-hadoop/issues/713 .

https://github.com/elastic/elasticsearch-hadoop/issues/585 .

I just learned that the "ClassNotFoundException" appears because Spark shuts down its job classloader immediately when an exception occurs, so any classes that need to be loaded afterwards fail to load, which hides the initial error.

But I don't know how to proceed. I submitted my job in verbose mode, but didn't see anything else: http://pastebin.com/j6zmyjFr

Thanks for your further help :)

Knight71 · Accepted Answer · 2016-06-30 12:39:11Z

3

Spark has executors and a driver process. Executors can run on different nodes from the driver. Spark splits the RDD graph into stages depending on the transformations, and each stage consists of tasks that are executed on the executors. So if you use library methods to compute an RDD, you need to make the dependent jars available to both the executors and the driver.

You should pass the dependent jars with the --jars option of spark-submit:

    spark-submit --jars $JARS \
      --driver-class-path $JARS_COLON_SEP \
      --class $CLASS_NAME $APP_JAR
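
The two jar lists in the command above use different separators: --jars takes a comma-separated list, while --driver-class-path is an ordinary classpath. As a rough sketch of how the pieces fit together, the following Java helper assembles the argument list. The class and method names (SparkSubmitCommand, build) are made up for illustration and are not part of Spark.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of Spark): assembles a spark-submit command line.
final class SparkSubmitCommand {

    private SparkSubmitCommand() {}

    static List<String> build(List<String> jars, String master,
                              String mainClass, String appJar) {
        // --jars expects a comma-separated list.
        String jarsCommaSep = String.join(",", jars);
        // --driver-class-path is an ordinary classpath, so it uses the
        // platform path separator (":" on Linux/macOS, ";" on Windows).
        String jarsPathSep = String.join(File.pathSeparator, jars);

        List<String> cmd = new ArrayList<>();
        cmd.add("spark-submit");
        cmd.addAll(Arrays.asList("--jars", jarsCommaSep));
        cmd.addAll(Arrays.asList("--driver-class-path", jarsPathSep));
        cmd.addAll(Arrays.asList("--master", master));
        cmd.addAll(Arrays.asList("--class", mainClass));
        // The application jar goes last: spark-submit treats everything after
        // it as arguments for the application's main method.
        cmd.add(appJar);
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = build(
                Arrays.asList("elasticsearch-hadoop-2.3.2.jar"),
                "local[4]", "SimpleApp", "target/simple-project-1.0.jar");
        System.out.println(String.join(" ", cmd));
    }
}
```

The point of the sketch is the ordering: every spark-submit option comes before the application jar, because anything placed after it is handed to the application instead of to spark-submit.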

In your case it would be:

    spark-submit --jars elasticsearch-hadoop-2.3.2.jar \
      --master local[4] \
      --driver-class-path elasticsearch-hadoop-2.3.2.jar \
      --class "SimpleApp" target/simple-project-1.0.jar
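
If it is unclear whether the jar actually reached the driver, a small check at the start of main can report that before the job fails with NoClassDefFoundError. This is only a sketch: ClasspathCheck and isLoadable are hypothetical names, and it assumes the check runs on the driver's main thread, so that the thread's context class loader is the one that sees the submitted jars.

```java
// Hypothetical diagnostic helper, not part of Spark or elasticsearch-hadoop.
final class ClasspathCheck {

    private ClasspathCheck() {}

    // Returns true if the named class can be located by the current class loader.
    static boolean isLoadable(String className) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            // Fall back to the loader that loaded this helper.
            loader = ClasspathCheck.class.getClassLoader();
        }
        try {
            // initialize=false: only locate the class, do not run static initializers.
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // NoClassDefFoundError is a LinkageError, so it is covered here too.
            return false;
        }
    }

    public static void main(String[] args) {
        String name = "org.elasticsearch.spark.rdd.api.java.JavaEsSpark";
        System.out.println(name + " loadable: " + isLoadable(name));
    }
}
```

Calling ClasspathCheck.isLoadable("org.elasticsearch.spark.rdd.api.java.JavaEsSpark") as the first statement of SimpleApp's main would show whether the elasticsearch-hadoop jar is visible to the driver before saveToEs is reached.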

edited Jun 30, 2016 at 12:39

answered Jun 30, 2016 at 9:35


7 Comments

kulssaka Over a year ago

I don't know which driver-class-path and which class I need. I added the elasticsearch-hadoop jar, but I don't know what to add after that.

kulssaka Over a year ago

I ran: bin/spark-submit --verbose --class "SimpleApp" --master local[4] target/simple-project-1.0.jar --jars elasticsearch-hadoop-2.3.2.jar. Which driver class do I need to add after that? The class that seems to be missing is JavaEsSpark. Thanks.

Knight71 Over a year ago

You need to add the same elasticsearch-hadoop jar to the driver class path.

kulssaka Over a year ago

bin/spark-submit --verbose --class "SimpleApp" --master local[4] target/simple-project-1.0.jar --jars elasticsearch-hadoop-2.3.2.jar is not working, and this is the only jar required for JavaEsSpark. I also tried: ../../bin/spark-submit --verbose --class "SimpleApp" --master local[4] target/simple-project-1.0.jar --jars elasticsearch-hadoop-2.3.2.jar --driver-class-path elasticsearch-hadoop-2.3.2 --class "JavaEsSpark". Sorry, I'm a beginner at this...

Knight71 Over a year ago

Try this: spark-submit --jars elasticsearch-hadoop-2.3.2.jar --master local[4] --driver-class-path elasticsearch-hadoop-2.3.2.jar --class "SimpleApp" target/simple-project-1.0.jar
