2

Reading the docs, I see you can get the number of elements in document arrays. For example given the following documents:

{ "_id" : 1, "item" : "ABC1", "description" : "product 1", colors: [ "blue", "black", "red" ] }
{ "_id" : 2, "item" : "ABC2", "description" : "product 2", colors: [ "purple" ] }
{ "_id" : 3, "item" : "XYZ1", "description" : "product 3", colors: [ ] }

and the following query:

db.inventory.aggregate([{$project: {item: 1, numberOfColors: { $size: "$colors" }}}])

We would get the number of elements in each document's colors array:

{ "_id" : 1, "item" : "ABC1", "numberOfColors" : 3 }
{ "_id" : 2, "item" : "ABC2", "numberOfColors" : 1 }
{ "_id" : 3, "item" : "XYZ1", "numberOfColors" : 0 }

I've not been able to figure out if and how you could sum up all the colors in all the documents directly from a query, ie:

{ "totalColors": 4 }

2 Answers 2

2

You can use the following query to get the count of all colors in all docs:

db.inventory.aggregate([ 
  { $unwind: '$colors' } , // expands nested array so we have one doc per each array value 
  { $group: {_id: null, allColors: {$addToSet: "$colors"} } }  , // find all colors
  { $project: { totalColors: {$size: "$allColors"}}} // find count of all colors
 ])
Sign up to request clarification or add additional context in comments.

Comments

2

Infinitely better is is to simply $sum the $size:

db.inventory.aggregate([
  { "$group": { "_id": null, "totalColors": { "$sum": { "$size": "$colors" } } }
])

If you wanted "distinct in each document" then you would instead:

db.inventory.aggregate([
  { "$group": {
    "_id": null,
    "totalColors": {
      "$sum": {
        "$size": { "$setUnion": [ [], "$colors" ] }
      }
    }
  }}
])

Where $setUnion takes values likes ["purple","blue","purple"] and makes it into ["purple","blue"] as a "set" with "distinct items".

And if you really want "distinct across documents" then don't accumulate the "distinct" into a single document. That causes performance issues and simply does not scale to large data sets, and can possibly break the 16MB BSON Limit. Instead accumulate naturally via the key:

db.inventory.aggregate([
  { "$unwind": "$colors" },
  { "$group": { "_id": "$colors" } },
  { "$group": { "_id": null, "totalColors": { "$sum": 1 } } }
])

Where you only use $unwind because you want "distinct" values from the array as combined with other documents. Generally $unwind should be avoided unless the value contained in the array is being accessed in the "grouping key" _id of $group. Where it is not, it is better to treat arrays using other operators instead, since $unwind creates a "copy" of the whole document per array element.

And of course there was also nothing wrong with simply using .distinct() here, which will return the "distinct" values "as an array", for which you can just test the Array.length() on in code:

var totalSize = db.inventory.distinct("colors").length;

Which for the simple operation you are asking, would be the overall fastest approach for a simple "count of distinct elements". Of course the limitation remains that the result cannot exceed the 16MB BSON limit as a payload. Which is where you defer to .aggregate() instead.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.