MongoDB - query time in profiler and from Python code

Question

I'm trying to understand how long some basic queries against my MongoDB collections (around 200k documents) take, and I don't understand why the MongoDB profiler reports that the query takes around 15 milliseconds while the same query takes 1 to 2 seconds from Python.

To time the query from Python I did something very basic:

import time
from pymongo import MongoClient

client = MongoClient('')
db = client.my_db

start = time.time()
query = db['my_col'].find({'unix': {'$gte': 'some_timestamp' }})
data = list(query)
end = time.time()

print(end-start)

So here I'm just retrieving all the documents after a specific Unix timestamp and then converting the query result to a Python list. The output of this code ranges from about 1.01 to 1.80 seconds, while according to the profiler the query takes just a few milliseconds.

What am I missing here? Is it because what actually takes time is looping through the cursor?

This code measures something different from what the profiler measures. It includes the time to establish a connection, send the query to the database, execute the query, send the results over the network from the server to the client, and create a list with all of the query result objects. The MongoDB profiler only includes the time to execute the query. If you're concerned about performance, try increasing the cursor batch size and ensure that pymongo is using the C extensions.
– Michael Ruth, Commented May 31, 2021 at 23:09
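To see where the client-side time goes, the phases listed in the comment above can be timed separately. The following is only a minimal sketch: `time_phases` is a hypothetical helper, not part of pymongo, and it assumes the object it receives behaves like a pymongo cursor (lazy, iterable, fetching results in batches). It counts documents instead of building a list, so the numbers will differ slightly from `list(query)`.

```python
import time

_DONE = object()  # sentinel so an empty result is not confused with a real item


def time_phases(make_iterable):
    """Time three phases: creating the iterable, getting the first item,
    and draining the remaining items.

    make_iterable is a zero-argument callable, e.g. a lambda wrapping
    collection.find(...). With a pymongo cursor, creating it sends nothing
    to the server; the first item triggers the query and the first batch.
    """
    t0 = time.perf_counter()
    iterator = iter(make_iterable())
    t1 = time.perf_counter()
    first = next(iterator, _DONE)   # round trip + server-side execution
    t2 = time.perf_counter()
    count = 0 if first is _DONE else 1
    for _ in iterator:              # remaining batches: transfer + decoding
        count += 1
    t3 = time.perf_counter()
    return {
        "create_s": t1 - t0,
        "first_item_s": t2 - t1,
        "drain_s": t3 - t2,
        "documents": count,
    }


# Hypothetical usage against the collection from the question
# (needs a running server, so it is left commented out):
#
#   import pymongo
#   from pymongo import MongoClient
#   client = MongoClient('')
#   client.admin.command('ping')   # pay the connection cost before timing
#   print(pymongo.has_c())         # True if the C extensions are in use
#   col = client.my_db['my_col']
#   print(time_phases(
#       lambda: col.find({'unix': {'$gte': 'some_timestamp'}}).batch_size(5000)))
```

If `drain_s` dominates, most of the time is spent transferring and decoding documents rather than executing the query, which would be consistent with the small figure reported by the profiler. The batch size of 5000 is an arbitrary illustrative value.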

Belly Buster · Accepted Answer · 2021-05-31 21:58:53Z


pymongo doesn't wait for a connection to the database until the first command that actually needs the server, so your timings include all the connection setup.

This should give you more accurate timings:

import time
from pymongo import MongoClient

client = MongoClient('')
db = client.my_db

db['my_col'].find_one()

start = time.time()
query = db['my_col'].find({'unix': {'$gte': 'some_timestamp' }})
data = list(query)
end = time.time()

print(end-start)

answered May 31, 2021 at 21:58


1 Comment

JayK23 Over a year ago

Thanks a lot! But do you have any idea why the queries take so much less time according to the profiler?
