
Problem:

I'm trying to get a list of documents and for each one of them to calculate the number of occurrences of a given value in a nested array of the same document.

I have a working example using the aggregation framework, but I wonder if there is a better way to accomplish the same thing, so I can benchmark the different approaches.

Simplified Data Model:

Single document in collection "Raffles":

{
  "_id" : objectId,
  "name" : string,
  "ends_at" : ISODate/Timestamp,
   ...
  "subscribers" : string[] //List of user ids
}
  • The collection consists of documents representing a raffle/sweepstake with a name and start/end dates.
  • Users can subscribe to a raffle.
  • Users can subscribe multiple times for the same raffle.

95% of the read queries will require both the raffle data (name, description, dates) and information about the subscribed users. That's why I decided to keep everything in a single raffle document instead of either referencing the subscribed raffles in the user document or having a separate collection with raffles and subscription counts.

Possible alternative:

The subscribers array is a list of strings representing the users' IDs, so adding a subscriber is as simple as pushing a new value. The other option is to have an array of objects like the one below and increment the count:

{
  "subscribers": [
     {
       "id": objectId,  //User id
       "count": integer //Number of subscriptions
     },
     ...
  ]
}
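Both shapes carry the same information. As a rough illustration (plain JavaScript rather than MongoDB code; the helper name toCounters and the sample IDs are made up), collapsing the flat list of user IDs into the counter form looks like this:

```javascript
// Hypothetical helper: collapse the flat "subscribers" list (one entry per
// subscription) into the counter form [{ id, count }, ...].
function toCounters(subscribers) {
  const counts = new Map();
  // A missing array is treated as empty.
  for (const id of subscribers ?? []) {
    counts.set(id, (counts.get(id) ?? 0) + 1);
  }
  // Map keeps insertion order, so users appear in first-subscription order.
  return Array.from(counts, ([id, count]) => ({ id, count }));
}

// Hypothetical sample: user "u1" subscribed three times, "u2" once.
const flat = ["u1", "u2", "u1", "u1"];
console.log(toCounters(flat)); // one entry per user, with its count
```

The flat list makes each new subscription a single push, while the counter form keeps one entry per user, so the array stays smaller and a user's count can be read directly instead of being counted on every query.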

Expected result:

The expected result is to have the full raffle document + an additional value of how many subscriptions a given user has.

{
    "_id" : objectId,
    "name" : string,
    "ends_at" : ISODate/Timestamp,
     ...
    "subscriptions" : 3 //Number of entries for a given user
}

Current solution

I filter the nested array by a given user ID (<USER_ID>) and take the size of the result:

db.raffles.aggregate([
    ...
    {
        $project: {
            "name" : 1,
            "ends_at" :1,
             ...
            "subscriptions" : {
                 $size : {
                    $filter : {
                        input: "$subscribers",
                        as: "user",
                        cond: {
                            $eq: ["$$user", <USER_ID>]
                        },
                    }
                 }
            }
        }
    }
    ...
])
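For comparison with other approaches, here is a plain-JavaScript model of what the "subscriptions" projection computes per raffle; the function name and sample documents are hypothetical. The "?? []" default mirrors the usual way to avoid the error on a missing array in the pipeline: pass { $ifNull: ["$subscribers", []] } as the $filter input, so $size always receives an array.

```javascript
// Plain-JavaScript model (not MongoDB code) of the "subscriptions" projection:
// $filter keeps the entries equal to userId, $size counts them.
// The `?? []` default stands in for { $ifNull: ["$subscribers", []] }.
function countSubscriptions(raffle, userId) {
  const subscribers = raffle.subscribers ?? [];
  return subscribers.filter((user) => user === userId).length;
}

// Hypothetical sample documents.
const raffles = [
  { _id: 1, name: "Spring raffle", subscribers: ["u1", "u2", "u1", "u1"] },
  { _id: 2, name: "Summer raffle" }, // no "subscribers" field at all
];

// Keep the raffle fields and add the per-user count, like the $project stage.
const withCounts = raffles.map((r) => ({
  _id: r._id,
  name: r.name,
  subscriptions: countSubscriptions(r, "u1"),
}));
console.log(withCounts);
```

In this model the raffle without a subscribers field yields 0 instead of an error.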

Questions:

  • Are there other/better ways to accomplish the result from the current solution? Maybe grouping and summing up or map/reduce?

  • Is it worth keeping not only the user ids, but rather objects with user id and subscription count?

  • The current solution will throw an error if the subscribers array is not set. Is there a way to handle this?

Thank you very much for the time spent reading this long post!

  • One alternative would be to $unwind: "$subscribers", then $match: { "subscribers": <USER_ID> }, and finally $group, but I doubt it would be faster. Commented Oct 25, 2018 at 11:47
  • This is exactly how I approached it initially. The thing is that after the grouping stage, where I compute the subscriptions count, I'd have to store all the other data in the _id field (at least, that's the only way I know how). That makes it harder to map/hydrate the data to an entity later. Commented Oct 25, 2018 at 13:02
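A plain-JavaScript model of the $unwind / $match / $group alternative from the comment (the function name and data are hypothetical). It shows one behavioural difference: a raffle in which the user has no entries produces no rows after $unwind and $match, so it is missing from the output instead of appearing with a count of 0. The $first accumulator noted in the code is one way to carry fields such as name through $group without packing them into _id.

```javascript
// Plain-JavaScript model (not MongoDB code) of:
//   $unwind: "$subscribers"
//   $match:  { subscribers: userId }
//   $group:  { _id: "$_id", name: { $first: "$name" }, subscriptions: { $sum: 1 } }
function unwindMatchGroup(raffles, userId) {
  // $unwind: one row per array element; a missing or empty array yields no rows.
  const rows = raffles.flatMap((r) =>
    (r.subscribers ?? []).map((s) => ({ ...r, subscribers: s }))
  );
  // $match: keep only the rows for this user.
  const matched = rows.filter((row) => row.subscribers === userId);
  // $group by raffle _id, keeping the first name seen and counting rows.
  const groups = new Map();
  for (const row of matched) {
    const group = groups.get(row._id);
    if (group) {
      group.subscriptions += 1;
    } else {
      groups.set(row._id, { _id: row._id, name: row.name, subscriptions: 1 });
    }
  }
  return [...groups.values()];
}

// Hypothetical sample data: raffle 2 has no entry for "u1", raffle 3 no array.
const sample = [
  { _id: 1, name: "Spring raffle", subscribers: ["u1", "u2", "u1", "u1"] },
  { _id: 2, name: "Summer raffle", subscribers: ["u2"] },
  { _id: 3, name: "Autumn raffle" },
];
// Only raffle 1 comes back; raffles 2 and 3 are absent rather than 0.
console.log(unwindMatchGroup(sample, "u1"));
```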

1 Answer


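I would keep both the user ID and the count in the subscribers array and increment the count when the user ID matches.

Something like this (raffleId and userid are placeholders):

// raffleId is a placeholder; filtering on _id scopes the increment to one raffle
db.Raffles.updateOne({_id: raffleId, "subscribers.id": userid}, {$inc: {"subscribers.$.count": 1}})

You can then read the user's entry with the query below; the positional projection "subscribers.$" returns only the first matching array element.

db.Raffles.find({"subscribers.id": userid}, {"name": 1, "ends_at": 1, "subscribers.$": 1});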
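A plain-JavaScript model of the answer's find() with a positional projection, applied to the counter form of the subscribers array; the function name and data are made up for illustration. The query filter only returns raffles that contain the user, and the positional projection keeps just the first matching array element, which, with one entry per user, already holds the full count.

```javascript
// Plain-JavaScript model (not MongoDB code) of
//   find({ "subscribers.id": userId }, { name: 1, ends_at: 1, "subscribers.$": 1 })
// on the counter form of the array.
function findWithPositional(raffles, userId) {
  return raffles
    // Query filter: some array element has this id.
    .filter((r) => (r.subscribers ?? []).some((s) => s.id === userId))
    .map((r) => ({
      _id: r._id, // _id is returned unless explicitly excluded
      name: r.name,
      ends_at: r.ends_at,
      // Positional projection: only the first matching element is kept.
      subscribers: [r.subscribers.find((s) => s.id === userId)],
    }));
}

// Hypothetical sample data in the counter form.
const docs = [
  { _id: 1, name: "Spring raffle", ends_at: new Date("2018-11-01"),
    subscribers: [{ id: "u1", count: 3 }, { id: "u2", count: 1 }] },
  { _id: 2, name: "Summer raffle", ends_at: new Date("2018-12-01"),
    subscribers: [{ id: "u2", count: 2 }] },
];
console.log(JSON.stringify(findWithPositional(docs, "u1")));
```

Raffle 2 is not returned for "u1" at all, which is the limitation raised in the first comment below.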

2 Comments

That's smooth, @Veeram. I'm starting to prefer this solution and keeping the data in separate objects with a counter. I just have two problems here: 1. How do I handle the case where the nested array doesn't have an entry for this user? 2. When selecting the list of raffles, I don't want only the ones the user is subscribed to. There are other filters involved, but let's say that I'll be selecting all the documents. In this case, to get the matching object from the array of subscribers, I'll need to $filter through them again. Or is there a better way?
For 1: the query returns null when no matching user is found. For 2: you have to use $filter when you're expecting multiple matching documents. Let me know if you have more questions.
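Both follow-up points can be sketched in plain JavaScript (hypothetical function names, not MongoDB code). For writes, a user without an entry gets { id, count: 1 } appended; in MongoDB this is commonly done with the $inc update from the answer followed, when it matched nothing, by a $push of { id: userid, count: 1 } whose filter also includes "subscribers.id": { $ne: userid }, so concurrent requests do not add duplicate entries. For reads across all raffles, a missing entry is treated as a count of 0 rather than dropping the raffle.

```javascript
// Plain-JavaScript model (not MongoDB code) of the counter representation.

// Write side: increment the user's entry, or append { id, count: 1 } if the
// user has no entry yet. Mutates and returns the raffle object.
function subscribe(raffle, userId) {
  const subscribers = raffle.subscribers ?? [];
  const entry = subscribers.find((s) => s.id === userId);
  if (entry) {
    entry.count += 1;
  } else {
    subscribers.push({ id: userId, count: 1 });
  }
  raffle.subscribers = subscribers;
  return raffle;
}

// Read side: the user's count for ANY raffle, 0 when there is no entry,
// so raffles the user never joined are kept rather than filtered out.
function subscriptionCount(raffle, userId) {
  const entry = (raffle.subscribers ?? []).find((s) => s.id === userId);
  return entry ? entry.count : 0;
}

// Hypothetical usage on a raffle that starts without a subscribers array.
const raffle = { _id: 1, name: "Spring raffle" };
subscribe(raffle, "u1");
subscribe(raffle, "u1");
subscribe(raffle, "u2");
console.log(raffle.subscribers);
console.log(subscriptionCount(raffle, "u3")); // 0: no entry for this user
```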
