I'm struggling to build a MongoDB query that first sorts and then returns one sample. Documents in the database look like this:

{
    "_id" : "cynthia",
    "crawled" : false,
    "priority" : 2.0
}

I'm trying to achieve the following: Get me one random element with the highest priority.
I tested it with the following query:

db.getCollection('profiles').aggregate([
   {$match: {crawled: false }}, 
   {$sort: {priority: -1}}, 
   {$sample: {size: 1}}
]) 

Unfortunately, this is not working. MongoDB seems to ignore the $sort entirely: I see no difference in the results with or without it.

Does anybody with more MongoDB experience have an idea about this? If you know a better way to implement the "priority" feature, please tell me.

Every idea is highly appreciated.

  • Are you trying to get "one" random value? If so then what does priority have to do with this in your mind? Commented May 11, 2018 at 22:08
  • We have several services using this query to get the jobs they should currently work on. By adding some randomness, I'm trying to prevent them from blocking each other. It's safe enough for our use case. With this priority we try to mark jobs that should be worked on first. I don't know if there is a better way to implement it. Commented May 13, 2018 at 10:01
  • The answer below "presumed" that you meant $limit but had the wrong keyword. So that's clearly not what you were asking. I'm asking you if you actually intend "one" to be random then to actually explain how priority should apply. It's really unclear what you expect and the language isn't coming through clearly either. I think if you could show us from a selection of 5 to 10 sample documents what you would expect to happen. If you could basically "draw that" by showing the documents and what you expect to happen, then it would be very clear for everyone. Commented May 13, 2018 at 10:08
  • Hi Neil, sorry for the late answer. I have now marked mickl's answer as correct, as it is the closest to what I'm trying to achieve. The combination of $limit followed by $sample does exactly what we expected. It's just important to have indexes on the fields used. Thank you all :) Commented Jul 4, 2018 at 17:15

1 Answer

$sample is not what you're looking for. According to the docs:

Randomly selects the specified number of documents from its input.

So you'll get one random document from your filtered set of documents.
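
To illustrate why the $sort appears to have no effect, here is a minimal in-memory sketch in plain JavaScript. It is not MongoDB itself: it assumes the random pick is uniform over the input, and the documents "alice" and "bob" as well as the helper `sampleOne` are hypothetical names added for the example.

```javascript
// In-memory sketch (not MongoDB): sorting before a uniform random pick
// does not change which documents can be returned.
const docs = [
  { _id: "cynthia", crawled: false, priority: 2.0 },
  { _id: "alice", crawled: false, priority: 5.0 }, // hypothetical document
  { _id: "bob", crawled: false, priority: 1.0 }    // hypothetical document
];

// Uniform pick of one element. `rand` is injectable so the pick can be
// driven with fixed values instead of Math.random.
function sampleOne(items, rand = Math.random) {
  return items[Math.floor(rand() * items.length)];
}

// Equivalent of {$sort: {priority: -1}} on the in-memory array.
const sorted = [...docs].sort((a, b) => b.priority - a.priority);

// Three evenly spread random values hit three different positions, so the
// highest-priority document is no more likely than any other.
const picks = [0, 0.34, 0.67].map(r => sampleOne(sorted, () => r)._id);
console.log(picks);
```

Every position in the sorted array is reachable with the same probability, which is why the result looks the same with or without the sort.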

$limit is what you need, since it takes the first n documents from the previous stage. Your pipeline should look like this:

db.profiles.aggregate([
    {$match: {crawled: false }}, 
    {$sort: {priority: -1}}, 
    {$limit: 1}
])
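
If the goal is literally "one random element among those that share the highest priority", one possible approach is to group by priority first and sample only within the top group. The following is a sketch under that assumption, not a tested production query. The helper name `topPrioritySamplePipeline` is hypothetical, and $group with $push collects each group into a single document, so it may not suit very large collections.

```javascript
// Hypothetical helper: builds a pipeline that samples from the group of
// documents sharing the highest priority value.
function topPrioritySamplePipeline(sampleSize) {
  return [
    { $match: { crawled: false } },
    // Collect the documents for each distinct priority value.
    { $group: { _id: "$priority", docs: { $push: "$$ROOT" } } },
    // Put the highest priority group first and keep only that group.
    { $sort: { _id: -1 } },
    { $limit: 1 },
    // Flatten the group back into individual documents.
    { $unwind: "$docs" },
    { $replaceRoot: { newRoot: "$docs" } },
    // Pick randomly within the top-priority group only.
    { $sample: { size: sampleSize } }
  ];
}

const topPriorityPipeline = topPrioritySamplePipeline(1);
// In the mongo shell (assumed usage):
//   db.profiles.aggregate(topPriorityPipeline)
```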

1 Comment

Thank you for the answer! Limit is a good idea. I still want some randomness when selecting the results, as this is our cheap design for "non-blocking" queries. We have several workers using the results of a query to do their job, and I don't want workers to work on the same result by mistake. I now do it like this: ([ {$match: {crawled: false }}, {$sort: {priority: -1}}, {$limit: 10}, {$sample: {size: 1}} ])
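
The pipeline from the comment above can be written out as a small sketch. The helper name `topPoolSamplePipeline` and its `poolSize` parameter are hypothetical; the compound index shown in the trailing comment is an assumption about which index would support the $match and $sort stages, not something stated in the thread.

```javascript
// Hypothetical helper: sort by priority, keep the top `poolSize`
// candidates, then pick one of them at random.
function topPoolSamplePipeline(poolSize) {
  return [
    { $match: { crawled: false } },
    { $sort: { priority: -1 } },
    // Restrict the candidates to the highest-priority documents.
    { $limit: poolSize },
    // Random pick among those candidates only.
    { $sample: { size: 1 } }
  ];
}

const poolPipeline = topPoolSamplePipeline(10);
// In the mongo shell (assumed usage):
//   db.getCollection('profiles').createIndex({ crawled: 1, priority: -1 })
//   db.getCollection('profiles').aggregate(poolPipeline)
```

Note that the random pick only lowers the chance of two workers receiving the same document; it does not rule it out.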
