
So I want to use pipeline aggregation in MongoDB to query certain values from documents and then add them together.

My "Albums" document.

{
"_id" : ObjectId("5875ed1dc939408da0601f31"),
"AlbumName" : "Blurryface",
"Artist" : "21 Pilots",
"Date" : "20151110",
"Label" : "Fueled By Ramen",
"Writers" : "Tyler Joseph",
"Producer" : "Mike Elizondo",
"Songlist" : [ 
    {
        "_id" : ObjectId("5875e5e8c939408da0601d73"),
        "SongID" : "1",
        "SongName" : "Stressed Out",
        "Artist" : "21 Pilots",
        "Album" : "Blurryface",
        "Duration:" : "200",
        "nPlays" : 800000000,
        "SongDataFile" : "data"
    }, 
    {
        "_id" : ObjectId("5875e855c939408da0601dcc"),
        "SongID" : "4",
        "SongName" : "Heathens",
        "Artist" : "21 Pilots",
        "Album" : "Blurryface",
        "Colaborator" : "NA",
        "Duration:" : "320",
        "nPlays" : 5000000,
        "SongDataFile" : "data"
    }
]
}

How can I make an aggregation pipeline that extracts the "nPlays" from the songs in the array and then add them together?

I'm asking here since the MongoDB documentation is subpar and has no examples of how to use the operators together. On top of that, the examples I find on Google only query with $gt and $lt, or reuse the same example with just $match and $group, which doesn't help with my problem at all.

In short:

How do I extract "nPlays" and add them together in a pipeline aggregation?

    What's your MongoDB version? Commented Jan 12, 2017 at 8:24

3 Answers


You have to unwind the internal documents. The $unwind stage outputs one document for each subdocument in the Songlist field.

The resulting aggregation pipeline is the following:

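db.Albums.aggregate([
  // One output document per element of Songlist
  {$unwind: {path: "$Songlist"}},
  // Optional: keep only the fields needed downstream
  {$project : { "_id" : 0, "AlbumName" : 1, "Songlist.nPlays" : 1} },
  // Sum nPlays per album
  {$group : {"_id" : "$AlbumName", "sum" : {"$sum" : "$Songlist.nPlays"}}}
])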

The result document is this:

{
  "_id" : "Blurryface",
  "sum" : 805000000
}

In summary, with the $unwind operation you flatten inner subdocuments. Then, with a simple $project you can retain only the fields you need (this stage is optional). Finally, using a $group, you can sum over the information you need.
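To make the stages concrete, here is a minimal plain JavaScript sketch (runnable in Node, no MongoDB server needed) that mimics what $unwind and $group with $sum do to the sample album. The helpers unwind and groupSum are hypothetical names used only for illustration; they are not MongoDB APIs, and the real work is done by the server.

```javascript
// Sample album, reduced to the fields the pipeline touches.
const albums = [
  {
    AlbumName: "Blurryface",
    Songlist: [
      { SongName: "Stressed Out", nPlays: 800000000 },
      { SongName: "Heathens", nPlays: 5000000 },
    ],
  },
];

// Mimics $unwind: emit one copy of the document per array element.
function unwind(docs, field) {
  return docs.flatMap((doc) =>
    (doc[field] || []).map((item) => ({ ...doc, [field]: item }))
  );
}

// Mimics $group with a $sum accumulator: one output document per key.
function groupSum(docs, keyOf, valueOf) {
  const totals = new Map();
  for (const doc of docs) {
    const key = keyOf(doc);
    totals.set(key, (totals.get(key) || 0) + valueOf(doc));
  }
  return [...totals].map(([_id, sum]) => ({ _id, sum }));
}

const unwound = unwind(albums, "Songlist");
const result = groupSum(unwound, (d) => d.AlbumName, (d) => d.Songlist.nPlays);
console.log(result); // [ { _id: 'Blurryface', sum: 805000000 } ]
```

The intermediate array unwound holds two documents, one per song, which is why $group can then sum "$Songlist.nPlays" as if it were a plain field.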

Hope it helps.


3 Comments

Ah, alright. I knew about $unwind, but I didn't know that you could use $group and then $sum on a specified group attribute. Thank you very much!
Follow-up question if you have time. Say I have many albums in my collection "Albums". How do I then compute the total nPlays for each album, compare them, and $limit to the most played album? (Which album has the greatest nTotalPlays?)
Yeah, silly me. I tried adding another album, and it does indeed! Sorry for asking so many other questions, but how do I then limit the result set to only one result (the biggest)? I know that I should use $limit, but what syntax do I use at the end, after $group?
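
For the follow-up about the most played album, one possible approach is to append $sort and $limit stages after $group. This is only a sketch, assuming the collection is named Albums as in the answer; the variable mostPlayedAlbum and the output field totalPlays are names chosen here for illustration.

```javascript
// Pipeline sketch: total plays per album, then keep only the largest total.
const mostPlayedAlbum = [
  { $unwind: "$Songlist" },
  { $group: { _id: "$AlbumName", totalPlays: { $sum: "$Songlist.nPlays" } } },
  { $sort: { totalPlays: -1 } }, // highest total first
  { $limit: 1 },                 // keep only the top album
];

// In the mongo shell this would be run as:
//   db.Albums.aggregate(mostPlayedAlbum)
```

Sorting in descending order before $limit is what makes the single remaining document the album with the greatest total.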

For an efficient solution that does not need multiple pipeline stages, I would suggest upgrading your MongoDB server to 3.4 (if you are on an earlier version) and using the new $reduce array operator to add up the nPlays values in the Songlist array.

It calculates the sum of the "Songlist.nPlays" fields by applying an expression to each element of the array and combining the results into a single value.

You can then use this as an expression in the $addFields pipeline stage to get the desired field along with the other fields:

db.collection.aggregate([
    { 
        "$addFields": { 
            "totalPlayDuration": {
                "$reduce": {
                    "input": "$Songlist",
                    "initialValue": 0,
                    "in": { "$add": ["$$value", "$$this.nPlays"] }
                }
            }
        }
    }           
])

Sample Output

/* 1 */
{
    "_id" : ObjectId("5875ed1dc939408da0601f31"),
    "AlbumName" : "Blurryface",
    "Artist" : "21 Pilots",
    "Date" : "20151110",
    "Label" : "Fueled By Ramen",
    "Writers" : "Tyler Joseph",
    "Producer" : "Mike Elizondo",
    "Songlist" : [ 
        {
            "_id" : ObjectId("5875e5e8c939408da0601d73"),
            "SongID" : "1",
            "SongName" : "Stressed Out",
            "Artist" : "21 Pilots",
            "Album" : "Blurryface",
            "Duration:" : "200",
            "nPlays" : 800000000,
            "SongDataFile" : "data"
        }, 
        {
            "_id" : ObjectId("5875e855c939408da0601dcc"),
            "SongID" : "4",
            "SongName" : "Heathens",
            "Artist" : "21 Pilots",
            "Album" : "Blurryface",
            "Colaborator" : "NA",
            "Duration:" : "320",
            "nPlays" : 5000000,
            "SongDataFile" : "data"
        }
    ],
    "totalPlayDuration": 805000000
}
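
For intuition, the $reduce expression in the pipeline above folds the array much like JavaScript's own Array.prototype.reduce: $$value plays the role of the accumulator and $$this the current element. The following is a plain JavaScript analogy on the sample songs, not MongoDB code; songlist and totalPlays are names chosen here for illustration.

```javascript
// The two songs from the sample album, reduced to the relevant field.
const songlist = [
  { SongName: "Stressed Out", nPlays: 800000000 },
  { SongName: "Heathens", nPlays: 5000000 },
];

// Analogous to:
//   { $reduce: { input: "$Songlist", initialValue: 0,
//                in: { $add: ["$$value", "$$this.nPlays"] } } }
const totalPlays = songlist.reduce((value, song) => value + song.nPlays, 0);

console.log(totalPlays); // 805000000
```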

NB: A solution that uses $unwind may not be as efficient at scale. Expect a drop in performance with large arrays, because $unwind produces a copy of each document per array entry. That uses more memory (aggregation pipeline stages are subject to a memory cap, 100 MB of RAM per stage by default unless allowDiskUse is set), and both producing and processing the flattened documents takes time.

Also, a multi-stage solution requires knowledge of the document fields, since the $group stage only retains the fields you explicitly carry through with accumulators such as $first or $last. That can be a huge limitation if your query needs to be dynamic. In essence, it is more beneficial to take advantage of the new operators in MongoDB 3.4 and above, which offer improved aggregation pipeline performance.

2 Comments

IMHO you're not giving a solution to the original use case. All the information you gave is extremely useful and correct, but it is not the answer to the original question.
Thanks for your feedback. The OP's question is "How can I make an aggregation pipeline that extracts the "nPlays" from the songs in the array and then add them together?" and the above answer addresses that through $reduce, which extracts the nPlays field without the need for $unwind, which can be computationally expensive for large arrays. I am not sure how this answer doesn't fit or solve the OP's question as you imply. Care to show me where I went wrong?

You can use aggregate with $unwind and $group. Grouping on _id: null gives a single total across all records.

db.collectionName.aggregate([
    {$unwind: '$Songlist'},
    {$group: {_id: null, sum: {$sum: '$Songlist.nPlays'}}}
])
