Spark Dataframe upsert to Elasticsearch

Question

I am using Apache Spark DataFrames and want to upsert data into Elasticsearch. I found that I can overwrite existing data like this:

val df = spark.read.option("header","true").csv("/mnt/data/akc_breed_info.csv")

df.write
  .format("org.elasticsearch.spark.sql")
  .option("es.nodes.wan.only","true")
  .option("es.port","443")
  .option("es.net.ssl","true")
  .option("es.nodes", esURL)
  .option("es.mapping.id", index)
  .mode("Overwrite")
  .save("index/dogs")

However, I noticed that mode("Overwrite") actually deletes all the existing duplicated data and then inserts the new data.

Is there a way to upsert the documents instead of deleting and rewriting them? I need to query this data in near real time. Thanks in advance.

Daniel · Accepted Answer · 2018-06-21 07:47:16Z

10

The problem with mode("Overwrite") is that, when you overwrite with your entire DataFrame, it deletes all the data matching the DataFrame's rows at once, so for a while the entire index looked empty to me. I figured out how to actually upsert instead.

Here is my code:

df.write
  .format("org.elasticsearch.spark.sql")
  .option("es.nodes.wan.only","true")
  .option("es.nodes.discovery", "false")
  .option("es.nodes.client.only", "false")
  .option("es.net.ssl","true")
  .option("es.mapping.id", index)
  .option("es.write.operation", "upsert")
  .option("es.nodes", esURL)
  .option("es.port", "443")
  .mode("append")
  .save(path)

Note that you need both .option("es.write.operation", "upsert") and .mode("append").
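The settings from the code above can also be collected into one Map and passed to the writer with .options(...), which makes it harder to forget one of them. This is only a minimal sketch: the object name EsUpsertOptions, the build function, its parameter names, and the "userId" column in the usage comment are hypothetical and chosen for illustration. The option keys themselves are the ones already used in this answer.

```scala
// Hypothetical helper (not part of the connector): gathers the
// elasticsearch-hadoop settings used in this answer into a single Map.
object EsUpsertOptions {
  // esNodes:  the Elasticsearch endpoint (the esURL value in the answer)
  // idColumn: the DataFrame column used as the document ID (es.mapping.id)
  def build(esNodes: String, idColumn: String, port: Int = 443): Map[String, String] =
    Map(
      "es.nodes"           -> esNodes,
      "es.port"            -> port.toString,
      "es.net.ssl"         -> "true",
      "es.nodes.wan.only"  -> "true",
      "es.mapping.id"      -> idColumn, // upsert matches documents by this ID
      "es.write.operation" -> "upsert"
    )
}

// Usage with Spark (assumes the elasticsearch-spark connector is on the classpath
// and that df has a "userId" column):
//   df.write
//     .format("org.elasticsearch.spark.sql")
//     .options(EsUpsertOptions.build(esURL, "userId"))
//     .mode("append") // "Overwrite" would clear the existing documents first
//     .save("index/dogs")
```

The write mode is deliberately left out of the Map, because it is set on the writer with .mode("append") rather than through an option.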



2 Comments

Soumendra Over a year ago

What is the value of the index ?

Daniel Over a year ago

@Soumendra It is the Elasticsearch mapping ID, that is, the value passed to es.mapping.id as shown in the code. For me, it's userId.

Constantine · Accepted Answer · 2018-06-21 07:45:32Z

2

Try setting:

es.write.operation = upsert

This should perform the required operation. You can find more details in https://www.elastic.co/guide/en/elasticsearch/hadoop/current/configuration.html


2 Comments

Daniel Over a year ago

Thanks for answering. I tried that, but it didn't work for me until I also added .mode("append").

Andrew van der Watt Over a year ago

While this is correct, you need to set the mode to "append"; otherwise all the existing documents will be removed from the index.
