MongoDB aggregate nested array correctly

Question

OK I am very new to Mongo, and I am already stuck.

The database has the following structure (heavily simplified):

{
    {
        "_id" : ObjectId("57fdfbc12dc30a46507044ec"),

        "keyterms" : [ 
            {
                "score" : "2",
                "value" : "AA",
            }, 
            {
                "score" : "2",
                "value" : "AA",
            }, 
            {
                "score" : "4",
                "value" : "BB",
            },
            {
                "score" : "3",
                "value" : "CC",
            }
        ]
    },

    {
        "_id" : ObjectId("57fdfbc12dc30a46507044ef"),

        "keyterms" : [ 
        ...

There are several objects. Each object has an array "keyterms", and each entry of that array has a score and a value. There are some duplicates, though (not really, since in the real database the keyterms entries have many more fields, but as far as value and score are concerned they are duplicates).

Now I need a query which

selects one object by id
groups its keyterms by value
counts the duplicates
sorts them by score

So I would like to get something like this as the result:

// for Object 57fdfbc12dc30a46507044ec
"keyterms": [
    {
        "score" : "4",
        "value" : "BB",
        "count" : 1
    },
    {
        "score" : "3",
        "value" : "CC",
        "count" : 1
    },
    {
        "score" : "2",
        "value" : "AA",
        "count" : 2
    }
]

In SQL I would have written something like this:

select 
    score, value, count(*) as count
from
    all_keywords_table_or_some_join
group by
    value, score
order by
    score desc

But, sadly enough, it's not SQL.

In Mongo I managed to write this:

db.getCollection('tests').aggregate([
    {$match: {'_id': ObjectId('57fdfbc12dc30a46507044ec')}},
    {$unwind: "$keyterms"}, 
    {$sort: {"keyterms.score": -1}}, 
    {$group: {
        '_id': "$_id", 
        'keyterms': {$push: "$keyterms"}
    }},
    {$project: {
        'keyterms.score': 1,
        'keyterms.value': 1
    }}
])

But something is missing: grouping the keyterms by their value. I cannot shake the feeling that this is the wrong approach altogether. How can I select the keyterms array and continue with just that, applying an aggregate function only to it? That would be easy.

BTW I read this (Mongo aggregate nested array) but I can't figure it out for my example unfortunately...

chridam · Accepted Answer · 2016-10-12 15:21:06Z

4

You'd want an aggregation pipeline where after you $unwind the array, you group the flattened documents by the array's value and score keys, aggregate the counts using the $sum accumulator operator and retain the main document's _id with the $first operator.

After sorting by score, the next pipeline stage should then group the documents from the previous stage by the original document's _id (carried along as doc_id), so as to preserve the original schema, and recreate the keyterms array using the $push operator.

The following demonstration attempts to explain the above aggregation operation:

db.tests.aggregate([
    { "$match": { "_id": ObjectId("57fdfbc12dc30a46507044ec") } },
    { "$unwind": "$keyterms" },
    {
        "$group": {
            "_id": {
                "value": "$keyterms.value",
                "score": "$keyterms.score"
            },
            "doc_id": { "$first": "$_id" },
            "count": { "$sum": 1 }
        }
    },
    { "$sort": {"_id.score": -1 } },
    {
        "$group": {
            "_id": "$doc_id",
            "keyterms": {
                "$push": {
                    "value": "$_id.value",
                    "score": "$_id.score",
                    "count": "$count"
                }
            }
        }
    }
])

Sample Output

{
    "_id" : ObjectId("57fdfbc12dc30a46507044ec"),
    "keyterms" : [ 
        {
            "value" : "BB",
            "score" : "4",
            "count" : 1
        }, 
        {
            "value" : "CC",
            "score" : "3",
            "count" : 1
        }, 
        {
            "value" : "AA",
            "score" : "2",
            "count" : 2
        }
    ]
}
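
To see what each stage contributes without a running database, the same logic can be sketched in plain JavaScript. This is only an illustration: groupKeyterms is a hypothetical helper, not a MongoDB API, and the _id is kept as a plain string instead of an ObjectId.

```javascript
// Plain JavaScript sketch of the pipeline stages; no MongoDB needed.
// `doc` mirrors the sample document from the question (the _id is a string here).
const doc = {
  _id: "57fdfbc12dc30a46507044ec",
  keyterms: [
    { score: "2", value: "AA" },
    { score: "2", value: "AA" },
    { score: "4", value: "BB" },
    { score: "3", value: "CC" },
  ],
};

// Hypothetical helper: mimics $unwind, the first $group, $sort and the second $group.
function groupKeyterms(document) {
  const buckets = new Map();
  // $unwind + first $group: one bucket per (value, score) pair, counting entries.
  for (const { value, score } of document.keyterms) {
    const key = JSON.stringify([value, score]);
    const bucket = buckets.get(key) || { value, score, count: 0 };
    bucket.count += 1;
    buckets.set(key, bucket);
  }
  // $sort: descending by score. The sample scores are strings,
  // so they are compared as strings here as well.
  const keyterms = [...buckets.values()].sort((a, b) =>
    a.score < b.score ? 1 : a.score > b.score ? -1 : 0
  );
  // Second $group: rebuild a single document under the original _id.
  return { _id: document._id, keyterms };
}

const result = groupKeyterms(doc);
console.log(JSON.stringify(result, null, 2));
```

Note that because the scores in the sample are stored as strings, the ordering is lexicographic, so a score of "10" would sort before "2". If the real data has multi-digit scores, storing them as numbers is probably the safer choice.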


edited Oct 12, 2016 at 15:21

answered Oct 12, 2016 at 14:32

chridam


3 Comments

Paflow Over a year ago

I tested your solution again and realized, it does not do the right thing: there are still two entries for the keyterm "AA" in the result!

chridam Over a year ago

@Paflow I don't see the duplicates as I've tested the pipeline. The initial $group pipeline step handles this, I'm failing to see where you are getting two entries for "AA". Perhaps you could update your question with some sample documents to verify that with?

Paflow Over a year ago

Sorry, my mistake. I made an error when morphing it back into my real document format. Thank you anyway.

Paflow · Accepted Answer · 2016-10-12 14:32:26Z

1

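Meanwhile, I solved it myself:

db.tests.aggregate([
        // Select the one document by its _id
        {$match: {'_id': ObjectId('57fdfbc12dc30a46507044ec')}},
        // Emit one document per keyterms entry
        {$unwind: "$keyterms"},
        {$sort: {"keyterms.score": -1}}, 
        // Group the entries by value, keeping the first (highest) score per value
        {$group: {
            '_id': "$keyterms.value", 
            'keyterms': {$push: "$keyterms"},
            'escore': {$first: "$keyterms.score"},
            'evalue': {$first: "$keyterms.value"}
        }},
        {$limit: 15},
        // count is the number of entries collected for each value
        {$project: {
          "score": "$escore", 
          "value": "$evalue",
          "count": {$size: "$keyterms"}
        }}      
])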
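
One thing to keep in mind with this approach: $group does not order its output documents, so the $sort placed before it decides which score $first picks, but not the order of the final result. A possible variation, sketched below under that assumption, counts with $sum instead of $push plus $size, and sorts after grouping. buildKeytermPipeline is a hypothetical helper name, and the default limit of 15 is simply carried over from the pipeline above.

```javascript
// Hypothetical helper (not part of the original answer): builds the pipeline
// for a given document _id so the stages can be inspected or reused.
function buildKeytermPipeline(docId, limit = 15) {
  return [
    { $match: { _id: docId } },
    { $unwind: "$keyterms" },
    // Group by value; count duplicates directly and keep the highest score.
    {
      $group: {
        _id: "$keyterms.value",
        score: { $max: "$keyterms.score" },
        count: { $sum: 1 },
      },
    },
    // Expose the group key as "value" and drop the _id field.
    { $project: { _id: 0, value: "$_id", score: 1, count: 1 } },
    // Sort after grouping so the final order is explicit, then limit.
    { $sort: { score: -1 } },
    { $limit: limit },
  ];
}

// In the mongo shell this would be used roughly as:
//   db.tests.aggregate(buildKeytermPipeline(ObjectId("57fdfbc12dc30a46507044ec")))
const pipeline = buildKeytermPipeline("57fdfbc12dc30a46507044ec");
console.log(pipeline.map((stage) => Object.keys(stage)[0]).join(" -> "));
```

Unlike the accepted answer, this returns one result document per keyterm value rather than a single document with a keyterms array, which matches the shape produced by the pipeline above.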

answered Oct 12, 2016 at 14:32

Paflow
