Problem:
I'm trying to fetch a list of documents and, for each one, calculate the number of occurrences of a given value in a nested array of that same document.
I have a working example using the aggregation framework, but I wonder if there is a better way to accomplish the same thing, so I can benchmark the different approaches.
Simplified Data Model:
Single document in collection "Raffles":
{
    "_id" : objectId,
    "name" : string,
    "ends_at" : ISODate/Timestamp,
    ...
    "subscribers" : string[] // List of user ids
}
- The collection consists of documents, each representing a raffle/sweepstake with a name and start/end dates.
- Users can subscribe to a raffle.
- Users can subscribe multiple times for the same raffle.
95% of the read queries will require both the raffle data (name, description, dates) and information about the subscribed users. That's why I decided to keep everything in a single raffle document instead of referencing the subscribed raffles in the user document or having a separate collection with raffles and subscription counts.
Possible alternative:
The subscribers array is a list of strings, each one a user ID. This way, adding a subscriber is as simple as pushing a new value. The other option is to have an array of objects like the one below and increment the count:
{
    "subscribers": [
        {
            "id": objectId, // User id
            "count": integer // Number of subscriptions
        },
        ...
    ]
}
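For the object-based model, subscribing would no longer be a single push. A minimal sketch of one way it might look, assuming a two-step update (increment first, push only if nothing matched). The helper name buildSubscribeOps and the two-step flow are assumptions for illustration, and the two steps are not atomic together:

```javascript
// Sketch only: builds the filter/update pairs for the object-based model.
// `buildSubscribeOps` is a hypothetical helper, not part of the original schema.
function buildSubscribeOps(raffleId, userId) {
  return {
    // Step 1: the user already has an entry, so bump its counter
    // through the positional `$` operator.
    increment: {
      filter: { _id: raffleId, "subscribers.id": userId },
      update: { $inc: { "subscribers.$.count": 1 } },
    },
    // Step 2 (only if step 1 matched nothing): add a first entry.
    insertFirst: {
      filter: { _id: raffleId, "subscribers.id": { $ne: userId } },
      update: { $push: { subscribers: { id: userId, count: 1 } } },
    },
  };
}

// Intended use in the shell (not executed here):
//   const ops = buildSubscribeOps(raffleId, userId);
//   const res = db.raffles.updateOne(ops.increment.filter, ops.increment.update);
//   if (res.matchedCount === 0) {
//     db.raffles.updateOne(ops.insertFirst.filter, ops.insertFirst.update);
//   }
```

Compared with pushing a plain string, this trades a simpler read (the count is already stored) for a more involved write.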
Expected result:
The expected result is to have the full raffle document + an additional value of how many subscriptions a given user has.
{
    "_id" : objectId,
    "name" : string,
    "ends_at" : ISODate/Timestamp,
    ...
    "subscriptions" : 3 // Number of entries for a given user
}
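To make the expected shape concrete, here is a small client-side sketch of the same computation in plain JavaScript. It could serve as a reference when comparing approaches. The helper name withSubscriptionCount and the sample values are made up for illustration:

```javascript
// Hypothetical helper: computes the expected result for one raffle document
// in application code, without the aggregation framework.
function withSubscriptionCount(raffle, userId) {
  // Split off the subscribers array and keep every other field as-is.
  const { subscribers, ...rest } = raffle;
  // Treat a missing or non-array field as "no subscriptions".
  const list = Array.isArray(subscribers) ? subscribers : [];
  const subscriptions = list.filter((id) => id === userId).length;
  return { ...rest, subscriptions };
}

// Sample document (invented values, same shape as the simplified model).
const sample = {
  _id: "r1",
  name: "Summer raffle",
  subscribers: ["u1", "u2", "u1", "u1"],
};

const result = withSubscriptionCount(sample, "u1");
console.log(result.subscriptions); // 3
```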
Current solution:
I'm getting the size of the nested array after filtering it by a given user id <USER_ID>:
db.raffles.aggregate([
    ...
    {
        $project: {
            "name" : 1,
            "ends_at" : 1,
            ...
            "subscriptions" : {
                $size : {
                    $filter : {
                        input: "$subscribers",
                        as: "user",
                        cond: {
                            $eq: ["$$user", <USER_ID>]
                        }
                    }
                }
            }
        }
    }
    ...
])
Questions:
Are there other/better ways to accomplish the result from the current solution? Maybe grouping and summing up or map/reduce?
Is it worth keeping not only the user ids, but rather objects with user id and subscription count?
The current solution will throw an error if the subscribers array is not set. Is there a way to handle this?
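For the missing-array case, one option that may work is wrapping the $filter input in $ifNull so that an absent field is treated as an empty array. A sketch of the $project stage built that way; the helper name buildSubscriptionsProject is hypothetical, and only the name and ends_at fields are projected here:

```javascript
// Sketch: the $project stage from the current solution, with the $filter
// input guarded by $ifNull so a missing `subscribers` field yields 0
// instead of making $size fail. The helper name is an assumption.
function buildSubscriptionsProject(userId) {
  return {
    $project: {
      name: 1,
      ends_at: 1,
      subscriptions: {
        $size: {
          $filter: {
            // Fall back to an empty array when the field is missing or null.
            input: { $ifNull: ["$subscribers", []] },
            as: "user",
            cond: { $eq: ["$$user", userId] },
          },
        },
      },
    },
  };
}

// Intended use in the shell (not executed here):
//   db.raffles.aggregate([buildSubscriptionsProject(userId)]);
```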
Thank you very much for the time spent reading this long post!
$unwind: "$subscribers", then $match: { "subscribers": <USER_ID> } and finally a $group, but I doubt it would be faster.
subscriptions count I'll have to store all the other data in the _id field (or at least my skills only allow me this). This will make it harder to map/hydrate the data to an entity later.
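The $unwind / $match / $group idea from the comments could be sketched roughly as follows. Instead of packing the other fields into _id, this version carries them through with $first, which is a swapped-in technique rather than what the comment describes. The helper name buildUnwindPipeline is hypothetical:

```javascript
// Sketch of the $unwind / $match / $group alternative.
// Assumption: other raffle fields are carried with $first rather than
// being stored inside the _id of the group.
function buildUnwindPipeline(userId) {
  return [
    // One output document per entry in the subscribers array.
    { $unwind: "$subscribers" },
    // Keep only the entries belonging to the given user.
    { $match: { subscribers: userId } },
    // Collapse back to one document per raffle and count the entries.
    {
      $group: {
        _id: "$_id",
        name: { $first: "$name" },
        ends_at: { $first: "$ends_at" },
        subscriptions: { $sum: 1 },
      },
    },
  ];
}

// Intended use in the shell (not executed here):
//   db.raffles.aggregate(buildUnwindPipeline(userId));
```

Note that raffles in which the user has no entries are removed by the $match stage, so unlike the $filter version this pipeline never returns a document with a count of 0.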