I have documents that each describe the counts of different things observed by a camera within a 15-minute period. A document looks like this:

{
    "_id" : ObjectId("5b1a709a83552d002516ac19"),
    "start" : ISODate("2018-06-08T11:45:00.000Z"),
    "end" : ISODate("2018-06-08T12:00:00.000Z"),
    "recording" : ObjectId("5b1a654683552d002516ac16"),
    "data" : {
        "counts" : {
            "5b434d05da1f0e00252566be" : 12,
            "5b434d05da1f0e00252566cc" : 4,
            "5b434d05da1f0e00252566ca" : 1
        }
    }
}

The keys inside the data.counts object change from document to document and refer to additional data that is fetched at a later date. There is no fixed limit on the number of keys inside data.counts (but there are usually about 20).

I am trying to roll all of these 15-minute documents up into daily aggregated documents.

I have this query at the moment to do that:

db.getCollection("segments").aggregate([
    {$match:{
       "recording": ObjectId("5bf7f68ad8293a00261dd83f")
    }}, 
    {$project:{
        "start": 1,
        "recording": 1,
        "data": 1
    }},
    {$group:{
        _id: { $dateToString: { format: "%Y-%m-%d", date: "$start" } },
        "segments": { $push: "$$ROOT" }
    }},
    {$sort: {_id: -1}},
]);

This does the grouping and returns all the segments in an array.

I want to also aggregate the information inside data.counts, so that I get the sum of values for all keys that are the same within the daily group.

This would save me from having another service loop through each 15-minute segment, summing the values that share the same key. For example, the query would return something like this:

{
    "_id" : "2019-02-27",
    "counts" : {
        "5b434d05da1f0e00252566be" : 351,
        "5b434d05da1f0e00252566cc" : 194,
        "5b434d05da1f0e00252566ca" : 111
        ... any other keys that were found within a day
    }
}
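For comparison, the service-side loop this would replace could look roughly like the sketch below. It is plain JavaScript rather than a MongoDB query; the function name `sumCountsByDay` and the short keys `a`, `b`, `c` are made up for illustration, and it assumes `start` is a Date and the count values are numbers.

```javascript
// Sketch of the service-side loop the aggregation is meant to replace.
// `segments` is assumed to be an array of plain segment documents with
// `start` as a Date and `data.counts` as an object of numeric values.
function sumCountsByDay(segments) {
  const days = {};
  for (const segment of segments) {
    // Same day key that $dateToString with "%Y-%m-%d" produces (UTC).
    const day = segment.start.toISOString().slice(0, 10);
    const totals = days[day] || (days[day] = {});
    const counts = (segment.data && segment.data.counts) || {};
    for (const [key, value] of Object.entries(counts)) {
      totals[key] = (totals[key] || 0) + value;
    }
  }
  // Newest day first, matching {$sort: {_id: -1}}.
  return Object.keys(days)
    .sort()
    .reverse()
    .map((day) => ({ _id: day, counts: days[day] }));
}

// Hypothetical sample data: two segments on one day, one on the next.
const sample = [
  { start: new Date("2018-06-08T11:45:00.000Z"), data: { counts: { a: 12, b: 4 } } },
  { start: new Date("2018-06-08T12:00:00.000Z"), data: { counts: { a: 3, c: 1 } } },
  { start: new Date("2018-06-09T00:00:00.000Z"), data: { counts: { a: 5 } } },
];
const daily = sumCountsByDay(sample);
console.log(JSON.stringify(daily));
```

A loop like this can also serve as a reference for checking whatever the aggregation returns.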

How might I amend the query I already have, or use a different query?

Thanks!

1 Answer

You could use the $facet pipeline stage to run two sub-pipelines over the same input: one that collects the segments and one that sums the counts. Their outputs can then be joined with $zip, which pairs the elements of the two arrays by position, and with $map plus $mergeObjects, which merges each two-element pair into a single document. Note that this only works correctly if both sub-pipelines output arrays of the same size in the same order, which is why each sub-pipeline groups by start_date and then sorts on it.

Here's the query:

db.getCollection("segments").aggregate([{
    $match: {
        recording: ObjectId("5b1a654683552d002516ac16")
    }
}, {
    $project: {
        start: 1,
        recording: 1,
        data: 1,
        start_date: { $dateToString: { format: "%Y-%m-%d", date: "$start" }}
    }
}, {
    $facet: {
        segments_pipeline: [{
            $group: {
                _id: "$start_date",
                segments: {
                    $push: {
                        start: "$start",
                        recording: "$recording",
                        data: "$data"
                    }
                }
            }
        }, {
            $sort: {
                _id: -1
            }
        }],
        counts_pipeline: [{
            $project: {
                start_date: "$start_date",
                count: { $objectToArray: "$data.counts" }
            }
        }, {
            $unwind: "$count"
        }, {
            $group: {
                _id: {
                    start_date: "$start_date",
                    count_id: "$count.k"
                },
                count_sum: { $sum: "$count.v" }
            }
        }, {
            $group: {
                _id: "$_id.start_date",
                counts: {
                    $push: {
                        $arrayToObject: [[{
                            k: "$_id.count_id",
                            v: "$count_sum"
                        }]]
                    }
                }
            }
        }, {
            $project: {
                counts: { $mergeObjects: "$counts" }
            }
        }, {
            $sort: {
                _id: -1
            }
        }]
    }
}, {
    $project: {
        result: {
            $map: {
                input: { $zip: { inputs: ["$segments_pipeline", "$counts_pipeline"] }},
                in: { $mergeObjects: "$$this" }
            }
        }
    }
}, {
    $unwind: "$result"
}, {
    $replaceRoot: {
        newRoot: "$result"
    }
}])
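To see why the ordering matters, the final $zip and $map step can be imitated in plain JavaScript. This is only a sketch of the idea, not how the server implements it; `zipAndMerge` and the sample arrays are hypothetical. Like $zip with its default options, it pairs elements by index and stops at the shorter array.

```javascript
// Plain-JavaScript imitation of the final $zip + $map/$mergeObjects step.
// Elements are paired purely by position, never by their _id.
function zipAndMerge(segmentsByDay, countsByDay) {
  const length = Math.min(segmentsByDay.length, countsByDay.length);
  const result = [];
  for (let i = 0; i < length; i++) {
    // Later fields win on conflict, as with $mergeObjects.
    result.push(Object.assign({}, segmentsByDay[i], countsByDay[i]));
  }
  return result;
}

// Hypothetical sub-pipeline outputs, both sorted by _id descending.
const segmentsOut = [
  { _id: "2018-06-09", segments: ["s3"] },
  { _id: "2018-06-08", segments: ["s1", "s2"] },
];
const countsOut = [
  { _id: "2018-06-09", counts: { a: 5 } },
  { _id: "2018-06-08", counts: { a: 15 } },
];

const aligned = zipAndMerge(segmentsOut, countsOut);
console.log(JSON.stringify(aligned));

// If one side were in a different order, segments and counts from
// different days would be merged into the same document.
const misaligned = zipAndMerge(segmentsOut, countsOut.slice().reverse());
console.log(JSON.stringify(misaligned));
```

One caveat follows from the same positional pairing: $unwind drops documents whose array is empty or missing, so a day whose segments all lack data.counts would appear in segments_pipeline but not in counts_pipeline, and the two arrays would no longer line up.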

Try it out here: Mongoplayground.
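If the segments array is not needed in the output (the example result in the question only shows _id and counts), the counts sub-pipeline could be run on its own, without $facet or $zip. The following is a minimal sketch under that assumption. The helper name `dailyCountsPipeline` and the field names `day`, `key`, and `total` are made up here, and it pushes {k, v} pairs and converts them once with $arrayToObject instead of merging single-key objects.

```javascript
// Hypothetical helper: builds a counts-only pipeline for one recording.
// `recordingId` would be an ObjectId when run in the shell or a driver.
function dailyCountsPipeline(recordingId) {
  return [
    { $match: { recording: recordingId } },
    // Derive the day and turn the counts object into [{k, v}, ...].
    { $project: {
        day: { $dateToString: { format: "%Y-%m-%d", date: "$start" } },
        count: { $objectToArray: "$data.counts" }
    } },
    { $unwind: "$count" },
    // Sum each key within each day.
    { $group: {
        _id: { day: "$day", key: "$count.k" },
        total: { $sum: "$count.v" }
    } },
    // Collect the per-key totals for each day as {k, v} pairs.
    { $group: {
        _id: "$_id.day",
        counts: { $push: { k: "$_id.key", v: "$total" } }
    } },
    // Convert the pairs back into a single object keyed by count id.
    { $project: { counts: { $arrayToObject: "$counts" } } },
    { $sort: { _id: -1 } }
  ];
}

// Placeholder id, only to show the shape of the pipeline outside the shell.
const pipeline = dailyCountsPipeline("<recording id>");
console.log(pipeline.map((stage) => Object.keys(stage)[0]).join(" -> "));

// In the mongo shell this would be used roughly as:
// db.getCollection("segments").aggregate(
//     dailyCountsPipeline(ObjectId("5b1a654683552d002516ac16")))
```

Each output document should then have the shape { _id: "<day>", counts: { <key>: <sum>, ... } }, as in the question's example.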
