0

I have an existing collection in MongoDB and I need to update certain documents by getting values from a MySQL table.

Currently, I am using the following python code to update the Collection using MongoEngine but it takes a lot of time since the dataset is very large (2m+ documents in the Collection, out of which 80k+ documents need to be updated)

query = "Select * from business"
cursor.execute(query)
for line in cursor:
    Movie.objects(movieid=line[0]).first().update(set__business=line[1])

While updating early documents (like first 5k documents) it takes very less amount of time, but when it starts to go beyond that, it takes forever. My assumption is that it starts finding the record from the beginning every time. It can be somehow optimized, since MySQL table records are ordered.

Is there a way to search in the collection so that the next loop starts from the last updated documentid? What is the best method to update MongoDB collections using MongoEngine?

(Additional info: inserting 2m+ records took about 2 hours whereas updating 80k records took 25 hours)

2
  • Do you have an index on movieid? Commented Mar 12, 2017 at 14:58
  • @JohnnyHK, Thanks, I did not realize that. If you wish to add it as an answer, I'll accept it. Commented Mar 12, 2017 at 16:06

1 Answer 1

1

Add an index on movieid to your collection to speed up your updates that key off that field.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.