How to reindex data from one Elasticsearch cluster to another with elasticsearch-hadoop in Spark

Question

I have two separate Elasticsearch clusters, and I want to reindex the data from the first cluster into the second. However, I found that I can only configure one Elasticsearch cluster in the SparkContext configuration, for example:

import org.apache.spark.SparkConf

val sparkConf: SparkConf = new SparkConf().setAppName("EsReIndex")
sparkConf.set("es.nodes", "node1.cluster1:9200")

So how can I move data between two Elasticsearch clusters with elasticsearch-hadoop in Spark within the same application?

eliasah · Accepted Answer · 2016-10-29 13:20:16Z

3

You don't need to configure the node address in the SparkConf for this.

When you use the DataFrameReader and DataFrameWriter with the Elasticsearch format, you can pass the node address as an option on each of them, as follows:

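// Read from the first cluster
val df = sqlContext.read
  .format("org.elasticsearch.spark.sql")
  .option("es.nodes", "node1.cluster1:9200")
  .load("your_index/your_type")

// Write to the second cluster; without an explicit format,
// Spark would fall back to its default data source
df.write
  .format("org.elasticsearch.spark.sql")
  .option("es.nodes", "node2.cluster2:9200")
  .save("your_new_index/your_new_type")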

This should work with Spark 1.6.x and the corresponding elasticsearch-hadoop connector.
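If each cluster needs more than one setting, the per-cluster options can be grouped into maps and passed with `options(...)`. The following is only a minimal sketch under a few assumptions: the object name `EsReIndex`, the map names `sourceCluster` and `targetCluster`, and the use of `es.port` and `SaveMode.Append` are illustrative choices and not part of the original question. The host and index names are the placeholders used above.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{SQLContext, SaveMode}

object EsReIndex {
  // Hypothetical per-cluster settings; replace with your own hosts and ports
  val sourceCluster = Map("es.nodes" -> "node1.cluster1", "es.port" -> "9200")
  val targetCluster = Map("es.nodes" -> "node2.cluster2", "es.port" -> "9200")

  def main(args: Array[String]): Unit = {
    // No es.* settings in the SparkConf: each reader/writer carries its own
    val sc = new SparkContext(new SparkConf().setAppName("EsReIndex"))
    val sqlContext = new SQLContext(sc)

    // Read from the first cluster
    val df = sqlContext.read
      .format("org.elasticsearch.spark.sql")
      .options(sourceCluster)
      .load("your_index/your_type")

    // Write to the second cluster; Append is an assumption here, chosen so
    // the write does not fail if the target index already holds documents
    df.write
      .format("org.elasticsearch.spark.sql")
      .options(targetCluster)
      .mode(SaveMode.Append)
      .save("your_new_index/your_new_type")

    sc.stop()
  }
}
```

Keeping the settings in two separate maps makes it harder to send the write to the source cluster by accident, and leaves a single place to add further `es.*` options per cluster.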

edited Oct 29, 2016 at 13:20

answered Oct 29, 2016 at 8:03
