
Sorry if I've asked a silly question, but I can't figure out the solution. I have data stored in MongoDB, and the collections are mapped to ES indices using richardwilly's plugin. However, a couple of my indices are messed up, which means not all the data I expect to see is in ES (it's still in MongoDB). I tried creating a dummy index on dummy data, expecting that after re-indexing I would see this data in ES.

The problem seems to be that the Mongo river operates on the oplog. After I delete the index and insert the first new document, I want the other thousands of documents in MongoDB to automatically become visible in ES as well. However, I only see the documents that I inserted after deleting and recreating the indexes; the thousands of other documents are still visible in Mongo but not in ES.

I did a small experiment and saw that if I actually reinserted the 500 documents, they then became visible in Elasticsearch (provided the index allows them all in). Can you please tell me how I can make the data in MongoDB visible in ES after I recreate the index, without having to delete and reinsert, as that is not an option for me? Do I need to replay the oplog, or is there another approach you can suggest?

Thanks!

3 Answers


The MongoDB river, as you say, works by tailing Mongo's oplog, which means it can only ever index changes to documents into Elastic. (Changes to Mongo indexes have no bearing on the oplog.) In order to index documents created prior to your first oplog entry, you'll need to find another way.

If you don't want to delete and reinsert, you could perform a bulk update on your existing documents so that they pass through the oplog again; see the sketch below.
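One way to do that is to set and then unset a throwaway flag: both passes rewrite every document and so generate oplog entries the river can pick up, whereas a plain re-save of an unchanged document can be a no-op that never reaches the oplog. A minimal sketch using the Ruby mongo gem; the database, collection, and flag names are hypothetical:

    require 'mongo'

    # Connect to the replica set whose oplog the river tails.
    client = Mongo::Client.new(['localhost:27017'],
                               database: 'mydb', replica_set: 'rs0')
    people = client[:people]

    # Set a throwaway flag on every document, then remove it again.
    # Each pass modifies the documents, producing oplog entries.
    people.update_many({}, { '$set'   => { '_touched' => true } })
    people.update_many({}, { '$unset' => { '_touched' => '' } })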

Alternatively, you could implement a tool which finds the first doc in Elastic, queries Mongo for any earlier docs, and indexes the missing ones; a sketch of the idea follows.
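A minimal sketch of such a tool, assuming the mongo and elasticsearch Ruby gems; rather than relying on _id ordering, it simply checks each Mongo document for existence in ES and indexes any that are missing. The collection, index, and type names are hypothetical:

    require 'mongo'
    require 'elasticsearch'

    mongo = Mongo::Client.new(['localhost:27017'], database: 'mydb')
    es    = Elasticsearch::Client.new(url: 'http://localhost:9200')

    # Walk the Mongo collection and index any document ES doesn't have yet.
    mongo[:people].find.each do |doc|
      id = doc['_id'].to_s
      next if es.exists?(index: 'people', type: 'person', id: id)

      body = doc.to_h
      body.delete('_id') # ES keeps the id outside the document body
      es.index(index: 'people', type: 'person', id: id, body: body)
    end

For a large collection you would want to batch these writes through the bulk API instead of indexing one document at a time.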


1 Comment

What if I change the oplog start time to the beginning of the data insert into MongoDB, to do the first indexing? Or is a bulk update the better solution, e.g. adding a flag and then removing it?

Answering my own question: I got helped out by the Elasticsearch community. If you delete the river and create a new one, then all the data in the collection it maps to should be available in the Elasticsearch index.
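For reference, the river plugin registers a river as a _meta document under the _river index, so deleting and recreating one looks roughly like this (the database, collection, and index names are hypothetical):

    $ curl -XDELETE 'http://localhost:9200/_river/mongodb/'
    $ curl -XPUT 'http://localhost:9200/_river/mongodb/_meta' -d '{
        "type": "mongodb",
        "mongodb": { "db": "mydb", "collection": "people" },
        "index": { "name": "people", "type": "person" }
      }'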

2 Comments

No, just re-creating the river does not seem to bulk-load data from MongoDB. I think the only way to get the existing data from MongoDB is for it all to go through the oplog. I am on 0.20.1, so let me know if you've found something different.
He does not need to bulk-load the data, as the index is still intact; re-creating the river does not affect the data already stored in the index.

If re-creating the river doesn't work, there are a couple of options.

  1. After you have configured and started your replica set, reload your database with mongodump/mongorestore (see the commands after this list). Because the river uses the oplog, the data needs to have passed through the oplog for the new river to know that it exists and should be indexed, and restoring a dump re-inserts every document. (This is perhaps easier to do in a development environment.)

  2. Another way that seems possible is to touch all of the objects through the Rails console. Again, make sure your replica set is already running:

    $ bundle exec rails c
    1.9.1 :001 > Person.all.each do |person|
    1.9.1 :002 >     person.save
    1.9.1 :003?>   end
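
For option 1, the dump-and-restore cycle looks roughly like the following; mongorestore re-inserts every document, so each one passes through the oplog where a freshly created river can see it (the database name is hypothetical):

    $ mongodump --db mydb --out /tmp/dump
    $ mongorestore --drop --db mydb /tmp/dump/mydb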
    

