13

I have a MySQL database with a couple of tables, and I want to migrate the MySQL data to Elasticsearch. Migrating the whole database to ES with a batch job is easy, but how should I keep ES updated from MySQL in real time? That is, if there is an update operation in MySQL, the same operation should be applied in ES. I looked into the MySQL binlog, which records every change made in MySQL, but I would have to parse the binlog and translate it into ES syntax, which seems really painful. Thanks! (The same question applies to Solr.)

3 Answers

11

There is an existing project that reads your binlog, transforms it, and ships it to Elasticsearch. You can check it out at: https://github.com/siddontang/go-mysql-elasticsearch

Another option is https://github.com/noplay/python-mysql-replication, a Python library that reads binlog row events, which you then forward to Elasticsearch yourself.

Note, however, that whichever you pick, it's good practice to pre-create your index and mappings before indexing your binlog. That gives you more control over your data.
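As a rough illustration of pre-creating the index, here is a minimal sketch that builds the body of a create-index request from a table's columns. The type table, the `build_index_body` helper, and the `uid`/`name`/`age` columns are assumptions for illustration only, and the typeless mapping layout assumes Elasticsearch 7 or later.

```python
import json

# Hypothetical MySQL-to-Elasticsearch type choices; adjust for your own schema.
MYSQL_TO_ES = {
    "int": "integer",
    "bigint": "long",
    "varchar": "keyword",
    "text": "text",
    "datetime": "date",
}


def build_index_body(columns):
    """Build the body of a create-index request from {column: mysql_type}."""
    properties = {}
    for name, mysql_type in columns.items():
        # Fall back to "keyword" for types not listed above (an assumption).
        properties[name] = {"type": MYSQL_TO_ES.get(mysql_type, "keyword")}
    # "dynamic": "strict" rejects fields that are not declared in the mapping.
    return {"mappings": {"dynamic": "strict", "properties": properties}}


body = build_index_body({"uid": "bigint", "name": "varchar", "age": "int"})
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of `PUT /<index-name>` before the sync tool starts, so that field types are fixed up front rather than guessed from the first documents that arrive.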

UPDATE:

Here is another interesting blog article on the subject: How to keep Elasticsearch synchronized with a relational database using Logstash


2 Comments

Thank you so much! go-mysql-es is awesome! Just one question: do you have any idea how it handles upserts? For example, I have two tables, t1(uid, name) and t2(uid, age), which share the same id and map to a single index. When one table is updated, it overwrites (removes) the existing record in ES. What I actually want is an update, not an overwrite.
@Jack Answering your comment in case it helps someone else. An update operation always creates a new document, increments the document's version, and marks the previous version for deletion. This is the standard way ES works.
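For the two-table case in the comments above, one possible approach is a partial update: although Elasticsearch reindexes the document internally, an `update` action with `doc` merges the supplied fields into the existing document, and `doc_as_upsert` creates it if it is missing. The sketch below only builds the NDJSON lines for a `_bulk` request; the `bulk_lines` helper, the `users` index, and the sample values are hypothetical, and this is not necessarily how go-mysql-elasticsearch does it.

```python
import json


def bulk_lines(op, index, doc_id, fields=None):
    """Return the NDJSON lines for one row change in a _bulk request."""
    if op == "delete":
        return [json.dumps({"delete": {"_index": index, "_id": doc_id}})]
    if op == "upsert":
        action = {"update": {"_index": index, "_id": doc_id}}
        # "doc" is merged into the existing document; "doc_as_upsert"
        # creates the document from "doc" if it does not exist yet.
        payload = {"doc": fields, "doc_as_upsert": True}
        return [json.dumps(action), json.dumps(payload)]
    raise ValueError(f"unsupported operation: {op}")


lines = []
lines += bulk_lines("upsert", "users", "42", {"name": "Jack"})  # row from t1
lines += bulk_lines("upsert", "users", "42", {"age": 30})       # row from t2
lines += bulk_lines("delete", "users", "7")                     # deleted row
bulk_body = "\n".join(lines) + "\n"  # _bulk requires a trailing newline
print(bulk_body)
```

Because each table contributes only its own columns, a change in t2 leaves the `name` field written from t1 untouched, and a row deletion maps to a `delete` action on the same id.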
1

The best open source solution would be this. You can run it from the command line and specify the incremental logic in the command as well.

Go through this session to get a complete idea.

1 Comment

Thanks, but it's not what I'm looking for. The doc only shows how to fetch incremental data, but I also need to track deleted and updated data.
0

I guess the best option is to simply use Debezium, a Kafka Connect plugin: use its MySQL connector as the source and an Elasticsearch sink connector as the sink.
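To make that pipeline a bit more concrete, here is a hedged sketch of the two connector definitions as they might be posted to the Kafka Connect REST API (`POST /connectors`). Every host, credential, database, table, and topic name is a placeholder. The property names assume Debezium 2.x and the Confluent Elasticsearch sink connector; they differ between versions (older Debezium releases use `database.server.name` instead of `topic.prefix`, for example), so check them against the versions you actually run.

```python
import json

# All hosts, credentials, names, and topics below are placeholders.
# Property names are assumed from Debezium 2.x and the Confluent
# Elasticsearch sink connector; verify them for your versions.
source = {
    "name": "mysql-source",
    "config": {
        "connector.class": "io.debezium.connector.mysql.MySqlConnector",
        "database.hostname": "mysql.example.internal",
        "database.port": "3306",
        "database.user": "debezium",
        "database.password": "change-me",
        "database.server.id": "184054",
        "topic.prefix": "shop",
        "table.include.list": "shop.t1",
        "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
        "schema.history.internal.kafka.topic": "schema-history.shop",
    },
}

sink = {
    "name": "es-sink",
    "config": {
        "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
        "connection.url": "http://elasticsearch:9200",
        # Debezium names topics <topic.prefix>.<database>.<table>.
        "topics": "shop.shop.t1",
        # Use the record key as the document id so updates hit the same doc.
        "key.ignore": "false",
        # Treat tombstone records (null values) as deletes in Elasticsearch.
        "behavior.on.null.values": "delete",
        "transforms": "unwrap,key",
        # Flatten Debezium's change envelope to the row's new state.
        "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
        "transforms.unwrap.drop.tombstones": "false",
        # Reduce the struct key to the primary-key column (assumed: uid).
        "transforms.key.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
        "transforms.key.field": "uid",
    },
}

for connector in (source, sink):
    print(json.dumps(connector, indent=2))
```

The point of this layout is that inserts, updates, and deletes all flow through the same topic, so the sink can apply them to Elasticsearch without any custom binlog parsing.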
