
Sorry if I've asked a silly question, but I can't figure out the solution. I have data stored in MongoDB, and the collections are mapped to ES indices using richardwilly's plugin. However, a couple of my indices are messed up, which means not all the data I expect to see is in ES (it's still in MongoDB). I tried creating a dummy index on dummy data, expecting that after re-indexing I would see this data in ES.

The problem seems to be that the Mongo river operates on the oplog. After I delete the index and insert the first new document, I want the other thousands of documents in MongoDB to automatically become visible in ES as well. However, I only see the documents that I inserted after deleting and recreating the indexes; the thousands of other documents are still visible in Mongo but not in ES.

I did a small experiment and saw that if I actually reinserted the 500 documents, they then became visible in Elasticsearch (provided the index allows them all in). Can you please tell me how I can make the data in MongoDB visible in ES after I recreate the index, without having to delete and reinsert, as that is not an option for me? Do I need to replay the oplog, or is there another approach you can suggest?

Thanks!

3 Answers


The MongoDB river, as you say, works by tailing Mongo's oplog, which means it can only ever index changes to documents into Elastic. (Changes to Mongo indexes have no bearing on the oplog.) In order to index documents created prior to your first oplog entry, you'll need to find another way.

If you don't want to delete and reinsert, you could perform a bulk update on your existing documents so that they pass through the oplog again; see the sketch below.
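One way to do that is to set and then unset a throwaway flag: both passes rewrite every document and so generate oplog entries the river can pick up, whereas a plain re-save of an unchanged document can be a no-op that never reaches the oplog. A minimal sketch using the Ruby mongo gem; the database, collection, and flag names are hypothetical:

    require 'mongo'

    # Connect to the replica set whose oplog the river tails.
    client = Mongo::Client.new(['localhost:27017'],
                               database: 'mydb', replica_set: 'rs0')
    people = client[:people]

    # Set a throwaway flag on every document, then remove it again.
    # Each pass modifies the documents, producing oplog entries.
    people.update_many({}, { '$set'   => { '_touched' => true } })
    people.update_many({}, { '$unset' => { '_touched' => '' } })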

Alternatively, you could implement a tool which finds the first doc in Elastic, queries Mongo for any earlier docs, and indexes the missing ones; a sketch of the idea follows.
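A minimal sketch of such a tool, assuming the mongo and elasticsearch Ruby gems; rather than relying on _id ordering, it simply checks each Mongo document for existence in ES and indexes any that are missing. The collection, index, and type names are hypothetical:

    require 'mongo'
    require 'elasticsearch'

    mongo = Mongo::Client.new(['localhost:27017'], database: 'mydb')
    es    = Elasticsearch::Client.new(url: 'http://localhost:9200')

    # Walk the Mongo collection and index any document ES doesn't have yet.
    mongo[:people].find.each do |doc|
      id = doc['_id'].to_s
      next if es.exists?(index: 'people', type: 'person', id: id)

      body = doc.to_h
      body.delete('_id') # ES keeps the id outside the document body
      es.index(index: 'people', type: 'person', id: id, body: body)
    end

For a large collection you would want to batch these writes through the bulk API instead of indexing one document at a time.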


1 Comment

What if I change the oplog start time to the beginning of the data insert into MongoDB, to do the first indexing? Or is a bulk update the better solution, e.g. adding a flag and then removing it?

Answering my own question: I got helped out by the Elasticsearch community. If you delete the river and create a new one, then all the data in the collection it maps to should be available in the Elasticsearch index.
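For reference, the river plugin registers a river as a _meta document under the _river index, so deleting and recreating one looks roughly like this (the database, collection, and index names are hypothetical):

    $ curl -XDELETE 'http://localhost:9200/_river/mongodb/'
    $ curl -XPUT 'http://localhost:9200/_river/mongodb/_meta' -d '{
        "type": "mongodb",
        "mongodb": { "db": "mydb", "collection": "people" },
        "index": { "name": "people", "type": "person" }
      }'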

2 Comments

No, just re-creating the river does not seem to bulk-load data from MongoDB. I think the only way to get the existing data from MongoDB is for it all to go through the oplog. I am on 0.20.1, so let me know if you've found something different.
He does not need to bulk-load the data, as the index is still intact; re-creating the river does not affect the data already stored in the index.

If re-creating the river doesn't work, there are a couple of options.

  1. After you have configured and started your replica set, reload your database with mongodump/mongorestore (see the commands after this list). Because the river uses the oplog, the data needs to have passed through the oplog for the new river to know that it exists and should be indexed, and restoring a dump re-inserts every document. (This is perhaps easier to do in a development environment.)

  2. Another way that seems possible is to touch all of the objects through the Rails console. Again, make sure your replica set is already running:

    $ bundle exec rails c
    1.9.1 :001 > Person.all.each do |person|
    1.9.1 :002 >     person.save
    1.9.1 :003?>   end
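
For option 1, the dump-and-restore cycle looks roughly like the following; mongorestore re-inserts every document, so each one passes through the oplog where a freshly created river can see it (the database name is hypothetical):

    $ mongodump --db mydb --out /tmp/dump
    $ mongorestore --drop --db mydb /tmp/dump/mydb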
    

