
I have a collection called englishWords with a unique index on the "word" field. When I do this:

from pymongo import MongoClient

tasovshik = MongoClient()
db = tasovshik.tongler
coll = db.englishWords

f = open('book.txt')
for word in f.read().split():
    coll.insert( { "word": word } } )

I get this error message

pymongo.errors.DuplicateKeyError: E11000 duplicate key error index: tongler.englishWords.$word_1 dup key: { : "Harry" }
and it stops inserting as soon as the first already-existing word comes up.

I do not want to implement an existence check myself; I want to get the benefit of the unique index without these failures.


3 Answers


You could do the following:

import pymongo

for word in f.read().split():
    try:
        coll.insert({"word": word})
    except pymongo.errors.DuplicateKeyError:
        continue

This catches the duplicate-key error and moves on to the next word, so duplicates are simply skipped.

And also, did you drop the collection before trying?


2 Comments

No I didn't. I am going to take many text files and insert all the English words into that collection, so I will not drop it. Also, Python raises this error: Traceback (most recent call last): File "main.py", line 14, in <module> except pymongo.errors.DuplicateKeyError: NameError: name 'pymongo' is not defined
I added import pymongo at the beginning and it worked, thanks.

To avoid unnecessary exception handling, you could do an upsert:

from pymongo import MongoClient

tasovshik = MongoClient()
db = tasovshik.tongler
coll = db.englishWords

f = open('book.txt')
for word in f.read().split():
    coll.replace_one({'word': word}, {'word': word}, upsert=True)

Setting upsert=True tells MongoDB to insert the document if no matching one already exists.

Here's the documentation.


EDIT: For even faster performance on a long list of words, you can do it in bulk like this:

from pymongo import MongoClient

tasovshik = MongoClient()
db = tasovshik.tongler
coll = db.englishWords

bulkop = coll.initialize_unordered_bulk_op()
for word in f.read().split():
    # find().upsert() alone registers nothing; an update (or replace)
    # is needed to actually queue an operation
    bulkop.find({'word': word}).upsert().update_one({'$setOnInsert': {'word': word}})

bulkop.execute()

Taken from the bulk operations documentation.

5 Comments

sorry but is upsert efficient in this case?
It is, since you have a unique index on the word column. If you want efficiency over a long list of words, I'll update my answer to provide an even quicker variant.
.. on the word *property, not column :)
The second variant returns this: Traceback (most recent call last): File "main.py", line 31, in <module> bulkop.execute() File "/Library/Python/2.7/site-packages/pymongo-3.2-py2.7-macosx-10.9-intel.egg/pymongo/bulk.py", line 628, in execute File "/Library/Python/2.7/site-packages/pymongo-3.2-py2.7-macosx-10.9-intel.egg/pymongo/bulk.py", line 450, in execute pymongo.errors.InvalidOperation: No operations to execute
Does it return that every time or just after the first time?

I've just run your code and everything looks good except that you have an extra } on the last line. Delete that, and you don't have to drop any collection. Every insert creates its own batch of data, so there is no need to drop the previous collection.

Well, the error message indicates that the key Harry has already been inserted and you are trying to insert it again with the same key. Is this perhaps not your entire code?

