How flexible is the aggregate function for output formatting in MongoDB?

Data format:

{
        "_id" : ObjectId("506ddd1900a47d802702a904"),
        "port_name" : "CL1-A",
        "metric" : "772.0",
        "port_number" : "0",
        "datetime" : ISODate("2012-10-03T14:03:00Z"),
        "array_serial" : "12345"
}

Right now I'm using this aggregate pipeline to return an array of datetimes, an array of metrics, and a count per port:

{$match : { 'array_serial' : array,
            'port_name' : { $in : ports},
            'datetime' : { $gte : from, $lte : to}
          }
},
{$project : { port_name : 1, metric : 1, datetime : 1}},
{$group : { _id : "$port_name",
            datetime : { $push : "$datetime"},
            metric : { $push : "$metric"},
            count : { $sum : 1}}}

This is nice and very fast, but is there a way to format the output so there's one array per datetime/metric pair? Like this:

[
    {
      "_id" : "portname",
      "data" : [
                ["2012-10-01T00:00:00.000Z", 1421.01],
                ["2012-10-01T00:01:00.000Z", 1361.01],
                ["2012-10-01T00:02:00.000Z", 1221.01]
               ]
    }
]

This would greatly simplify the front-end as that's the format the chart code expects.

  • In the meantime I'm fetching the output, looping through the objects, and using Underscore's zip function to combine the two arrays; this doesn't seem to add much overhead. Commented Oct 8, 2012 at 18:30
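
For reference, the client-side workaround described in this comment could look roughly like the sketch below. It uses plain JavaScript instead of Underscore, and the function names `zipPairs` and `toChartSeries` are hypothetical. The `parseFloat` call is an assumption: the sample document stores `metric` as a string, while the desired output shows numbers.

```javascript
// Hypothetical helper: pair each datetime with its metric, similar to _.zip(a, b).
// Unlike _.zip, it stops at the shorter of the two arrays.
function zipPairs(datetimes, metrics) {
  const length = Math.min(datetimes.length, metrics.length);
  const pairs = [];
  for (let i = 0; i < length; i++) {
    // Assumption: metrics arrive as strings (as in the sample document),
    // so they are converted to numbers for the chart.
    pairs.push([datetimes[i], parseFloat(metrics[i])]);
  }
  return pairs;
}

// Reshape one $group result of the form { _id, datetime: [...], metric: [...], count }
// into the { _id, data: [[datetime, metric], ...] } shape the chart expects.
function toChartSeries(groupResult) {
  return {
    _id: groupResult._id,
    data: zipPairs(groupResult.datetime, groupResult.metric)
  };
}
```

Mapping `toChartSeries` over the aggregation results would then give one series object per port.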

4 Answers

Combining two fields into an array of values with the Aggregation Framework is possible, but it definitely isn't as straightforward as it could be (at least as of MongoDB 2.2.0).

Here is an example:

db.metrics.aggregate(

    // Find matching documents first (can take advantage of index)
    { $match : {
        'array_serial' : array, 
        'port_name' : { $in : ports},
        'datetime' : { $gte : from, $lte : to}
    }},

    // Project desired fields and add an extra $index for # of array elements
    { $project: {
        port_name: 1,
        datetime: 1,
        metric: 1,
        index: { $const:[0,1] }
    }},

    // Split into document stream based on $index
    { $unwind: '$index' },

    // Re-group data using conditional to create array [$datetime, $metric]
    { $group: {
        _id: { id: '$_id', port_name: '$port_name' },
        data: {
            $push: { $cond:[ {$eq:['$index', 0]}, '$datetime', '$metric'] }
        },
    }},

    // Sort results
    { $sort: { _id:1 } },

    // Final group by port_name with data array and count
    { $group: {
        _id: '$_id.port_name',
        data: { $push: '$data' },
        count: { $sum: 1 }
    }}
)
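
To see why the index trick produces `[datetime, metric]` pairs, the stages after `$match` can be mimicked in plain JavaScript. This is only an illustrative sketch, not how the server executes the pipeline; the function name `simulatePipeline` and the sample values in the usage comment are made up.

```javascript
// Illustrative simulation of the $project/$unwind/$group/$group stages.
// `docs` stands in for the documents that survive $match.
function simulatePipeline(docs) {
  // $project + $unwind: each document becomes two, one per index value (0 and 1).
  const unwound = docs.flatMap(doc => [0, 1].map(index => ({ ...doc, index })));

  // First $group: regroup by the original _id, pushing datetime when
  // index is 0 and metric when index is 1, which yields [datetime, metric].
  const pairsById = new Map();
  for (const doc of unwound) {
    if (!pairsById.has(doc._id)) {
      pairsById.set(doc._id, { port_name: doc.port_name, data: [] });
    }
    pairsById.get(doc._id).data.push(doc.index === 0 ? doc.datetime : doc.metric);
  }

  // Second $group: collect the pairs per port_name and count them.
  // (The Map keeps insertion order, which stands in for the $sort stage here.)
  const byPort = new Map();
  for (const { port_name, data } of pairsById.values()) {
    if (!byPort.has(port_name)) {
      byPort.set(port_name, { _id: port_name, data: [], count: 0 });
    }
    const group = byPort.get(port_name);
    group.data.push(data);
    group.count += 1;
  }
  return [...byPort.values()];
}

// Example call with made-up documents:
// simulatePipeline([
//   { _id: 1, port_name: "CL1-A", datetime: "2012-10-01T00:00:00.000Z", metric: "1421.01" },
//   { _id: 2, port_name: "CL1-A", datetime: "2012-10-01T00:01:00.000Z", metric: "1361.01" }
// ]);
```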
4 Comments

Ah! I didn't know $group could be called more than once. I'll give this a try, thanks!
What does '$const' do exactly? It doesn't seem to be documented.
Turns out that $const is an internal implementation detail for serialization between mongos and mongod and is not meant to be (ab)used by end user queries (see jira.mongodb.org/browse/SERVER-6769). In particular this may not work properly through a mongos. At the time I didn't realize it wasn't a documented expression, and I'd seen it (ab)used to add constants to documents as I've done here. I'll try to revisit this answer after MongoDB 2.4.0 is released, as there may be an alternative (and documented) approach.
Please vote & watch the MongoDB feature request SERVER-8141, which proposes adding an $array aggregation expression :).

MongoDB 2.6 made this a lot easier by introducing $map, which allows a simpler form of array transposition:

db.metrics.aggregate([
   { "$match": {
       "array_serial": array, 
       "port_name": { "$in": ports},
       "datetime": { "$gte": from, "$lte": to }
    }},
    { "$group": {
        "_id": "$port_name",
        "data": {
            "$push": {
                "$map": {
                    "input": [0,1],
                    "as": "index",
                    "in": {
                        "$cond": [
                            { "$eq": [ "$$index", 0 ] },
                            "$datetime",
                            "$metric"
                        ]
                    }
                }
            }
        },
        "count": { "$sum": 1 }
    }}
])

Much like the approach with $unwind, you supply a two-element array as the "input" to the map operation and then replace each element with the field value you want via the $cond operation.

This removes the pipeline juggling that earlier releases required to transform the document. What remains is the aggregation itself, which accumulates values per "port_name"; building the array is no longer a problem area.
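
The snippet above leaves `array`, `ports`, `from`, and `to` undefined. One way to supply them, sketched here under the assumption that you build the pipeline in application or shell JavaScript, is a small wrapper function. The name `buildMetricsPipeline` and its parameter names are hypothetical.

```javascript
// Hypothetical wrapper: returns the two-stage pipeline from this answer
// with the filter values filled in. Names are illustrative only.
function buildMetricsPipeline(arraySerial, ports, from, to) {
  return [
    // Filter first so the server can use an index on these fields.
    { $match: {
        array_serial: arraySerial,
        port_name: { $in: ports },
        datetime: { $gte: from, $lte: to }
    } },
    // Push one [datetime, metric] pair per document, grouped by port.
    { $group: {
        _id: "$port_name",
        data: {
          $push: {
            $map: {
              input: [0, 1],
              as: "index",
              in: { $cond: [ { $eq: ["$$index", 0] }, "$datetime", "$metric" ] }
            }
          }
        },
        count: { $sum: 1 }
    } }
  ];
}
```

The result could then be passed to `db.metrics.aggregate(...)`, for example with `buildMetricsPipeline("12345", ["CL1-A"], fromDate, toDate)`, where `fromDate` and `toDate` are Date values you provide.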


Building arrays in the aggregation framework without $push and $addToSet is something that seems to be lacking. I've tried to get this to work before, and failed. It would be awesome if you could just do:

data : {$push: ["$datetime", "$metric"]}

in the $group, but that doesn't work.

Also, building "literal" objects like this doesn't work:

data : {$push: {literal: ["$datetime", "$metric"]}}
or even data : {$push: {literal: "$datetime"}}

I hope they eventually come up with some better ways of massaging this sort of data.

1 Comment

These are the exact methods I tried; I was just assuming they would work. I guess not :(

The following isn't conditional, but it is easier to understand.

{"_id":"$city","doc":{"$push":"$$ROOT"}}

1 Comment

This solution works for small data sets, but as the data grows the $push array will become too big. You may want to see this solution: stackoverflow.com/a/22935461/11322237, or use $facet (mongodb.com/docs/manual/reference/operator/aggregation/facet) or $group with $topN (jira.mongodb.org/browse/SERVER-9377).
