
I'm currently working on a search engine project. We are using Python + MongoDB.

I have a pymongo cursor after executing a find() command against the MongoDB database. The pymongo cursor has around 20k results.

I have noticed that iterating over the pymongo cursor is really slow compared with iterating over, for example, a list of the same size.

I did a little benchmark:

  • iteration over a list of 20k strings: 0.001492 seconds
  • iteration over a pymongo cursor with 20k results: 1.445343 seconds
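A benchmark like the one above can be reproduced with a small timing helper that works on any iterable. This is a sketch assuming Python 3 (`time.perf_counter`); the helper name `time_iteration` and the database/collection names in the commented-out part are hypothetical. For a cursor, the measured time includes the network round trips that pymongo makes lazily as the loop asks for more batches.

```python
import time


def time_iteration(iterable):
    """Return (item_count, elapsed_seconds) for one full pass over iterable.

    Works for a plain list or for a pymongo Cursor returned by find().
    A cursor is exhausted after one pass, so create a fresh one (or call
    rewind()) for each run.
    """
    start = time.perf_counter()
    count = 0
    for _ in iterable:
        count += 1
    elapsed = time.perf_counter() - start
    return count, elapsed


if __name__ == "__main__":
    strings = ["doc-%d" % i for i in range(20000)]
    n, secs = time_iteration(strings)
    print("list: %d items in %.6f s" % (n, secs))

    # With a live MongoDB server (hypothetical database/collection names):
    # from pymongo import MongoClient
    # coll = MongoClient()["searchdb"]["documents"]
    # n, secs = time_iteration(coll.find())
    # print("cursor: %d items in %.6f s" % (n, secs))
```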

The difference is huge. Maybe it is not a problem with this number of results, but if I have millions of results the time would be unacceptable.

Does anyone have an idea why pymongo cursors are so slow to iterate? Any idea how I can iterate over the cursor in less time?

Some extra info:

  • Python v2.6
  • PyMongo v1.9
  • MongoDB v1.6 32 bits
  • Can you change the logic of your application -- for instance using .skip() and .limit() -- so that you don't return such large result sets? Commented Mar 30, 2011 at 0:00
  • In fact, 20k is a really small percentage of the total number of documents. I don't think that is a scalable solution, because I expect to have many more than 20k results. Thanks anyway =). Commented Mar 30, 2011 at 0:07
  • Are each of your results bare strings? Commented Mar 30, 2011 at 0:09
  • No. I have a document similar to: {"something": "string", "other": [{"key": "value"},{"key2": "value2"},...], "something_more": integer}. Anyway, I have recently tried with a collection of bare string documents, like this: {"something": "string"}, and the difference in iteration time is the same. :S Commented Mar 30, 2011 at 0:17

3 Answers


Is your pymongo installation using the included C extensions?

>>> import pymongo
>>> pymongo.has_c()
True
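A slightly broader check can also ask the `bson` package that ships with PyMongo, which has its own C module; both `pymongo.has_c()` and `bson.has_c()` exist in PyMongo. This is a sketch assuming Python 3, and the function name `c_extension_status` is hypothetical. It returns `None` when PyMongo is not installed at all.

```python
def c_extension_status():
    """Report whether PyMongo and its bundled bson package use C extensions.

    Returns None if PyMongo (and therefore its bson package) is not installed.
    """
    try:
        import bson
        import pymongo
    except ImportError:
        return None
    return {
        "pymongo": bool(pymongo.has_c()),
        # bson is distributed inside PyMongo and has its own C module
        "bson": bool(bson.has_c()),
    }


if __name__ == "__main__":
    status = c_extension_status()
    if status is None:
        print("PyMongo is not installed")
    else:
        for name, enabled in sorted(status.items()):
            print("%s C extension: %s" % (name, "yes" if enabled else "no"))
```

If either value is `False`, reinstalling with the build tools available, as described below, is the usual fix.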

I spent most of last week trying to debug a moderate-sized query and the corresponding processing, which took 20 seconds to run. Once the C extensions were installed, the same process took roughly a second.

To install the C extensions on Debian, install the Python development headers before running easy_install. In my case, I also had to remove the old version of pymongo first. Note that this compiles a binary from C source, so you need the usual build tools (GCC, etc.).

# on ubuntu with pip
$ sudo pip uninstall pymongo
$ sudo apt-get install python-dev build-essential
$ sudo pip install pymongo

2 Comments

This definitely boosted my performance. I had a query that took 5 sec; now it takes 0.01! I've added the steps for installing on Ubuntu.
I know this is very old, but how can you install this on the latest OS X?

Remember that the pymongo driver does not give you all 20k results at once. It makes network calls to the MongoDB backend for more items as you iterate, so of course it won't be as fast as iterating over a list of strings. However, I'd suggest trying to adjust the cursor batch_size as outlined in the API docs.
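Setting the batch size on a pymongo cursor can look like the sketch below, which uses the real `Cursor.batch_size()` method. The helper name `open_cursor` and the default of 1000 are hypothetical choices for illustration; the right value depends on document size and network. PyMongo itself treats a batch size of 0 as "use the server default", but this sketch asks for an explicit positive number.

```python
def open_cursor(collection, query, batch_size=1000):
    """Return a cursor over `query` that fetches `batch_size` docs per round trip.

    `collection` is assumed to be a pymongo Collection. find() is lazy, so no
    documents are transferred until the returned cursor is iterated.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return collection.find(query).batch_size(batch_size)


# Usage with a live server (hypothetical database/collection names):
# from pymongo import MongoClient
# coll = MongoClient()["searchdb"]["documents"]
# for doc in open_cursor(coll, {}, batch_size=5000):
#     ...
```

Larger batches mean fewer round trips but more memory held per batch on the client, so it is worth timing a few values rather than picking the maximum.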

7 Comments

I have noticed too that, of course, the time depends on the amount of data you transfer between Mongo and the script. That's why I changed my query to exclude the keys I don't need inside the iteration, like: .find({},{"key1":0, "key3":0}). That decreased the time a lot.
Both items make sense: the batch size controls how many items are sent on each fetch from MongoDB, and limiting the fields you return to only the ones you are using will certainly reduce the necessary network traffic.
Not particularly - it's the nature of shipping 20k documents over the network.
@Brendan. I ran some tests using the C++ driver instead of the Python one, and the same query ran 3 times faster. I think that is something important to take into account. Thanks!
The link in the answer is not working. Please update it.
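The comment above uses an exclusion projection (`{"key1": 0, "key3": 0}`). When you know exactly which keys the loop reads, an inclusion projection is the complementary approach. This is a sketch assuming Python 3; `projection_for` and `iter_fields` are hypothetical helper names, and the field names in the usage are taken from the document shape in the question. MongoDB returns `_id` unless it is explicitly excluded, and `_id: 0` is the one exclusion allowed alongside inclusions.

```python
def projection_for(fields):
    """Build an inclusion projection that returns only `fields`.

    _id is returned by default, so it is excluded unless the caller
    explicitly listed it.
    """
    projection = {name: 1 for name in fields}
    projection.setdefault("_id", 0)
    return projection


def iter_fields(collection, query, fields, batch_size=1000):
    """Yield documents that contain only the requested fields.

    `collection` is assumed to be a pymongo Collection.
    """
    cursor = collection.find(query, projection_for(fields)).batch_size(batch_size)
    yield from cursor


# Usage with a live server (hypothetical database/collection names):
# from pymongo import MongoClient
# coll = MongoClient()["searchdb"]["documents"]
# for doc in iter_fields(coll, {}, ["something", "something_more"]):
#     ...
```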

The default cursor size is 4MB, and the maximum it can go to is 16MB. You can try increasing your cursor size up to that limit and see if you get an improvement, but it also depends on what your network can handle.
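One way to act on this is to turn a byte budget into a document count for `batch_size()`. The sketch below is hypothetical: the helper name `batch_size_for` is made up, and the 4MB default and 16MB cap simply mirror the figures quoted in this answer as parameters, not values read from the server. The commented usage relies on the `collStats` server command's `avgObjSize` field to estimate document size.

```python
MB = 1024 * 1024


def batch_size_for(avg_doc_bytes, budget_bytes=4 * MB, cap_bytes=16 * MB):
    """Estimate how many documents of avg_doc_bytes fit in one batch.

    The budget is clamped to cap_bytes, and at least one document is
    always requested even if a single document exceeds the budget.
    """
    if avg_doc_bytes <= 0:
        raise ValueError("avg_doc_bytes must be positive")
    budget = min(budget_bytes, cap_bytes)
    return max(1, budget // avg_doc_bytes)


# Usage with a live server (hypothetical database/collection names):
# from pymongo import MongoClient
# db = MongoClient()["searchdb"]
# stats = db.command("collStats", "documents")
# size = batch_size_for(int(stats["avgObjSize"]))
# cursor = db["documents"].find().batch_size(size)
```

For example, with 200-byte documents a 4MB budget gives 20971 documents per batch, while asking for 20MB is clamped to 16MB and gives 83886.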
