2

I would like to sort a nested array at query time while also projecting all fields in the document.

Example document:

{ "_id" : 0, "unknown_field" : "foo", "array_to_sort" : [ { "a" : 3, "b" : 4 }, { "a" : 3, "b" : 3 }, { "a" : 1, "b" : 0 } ] }

I can perform the sorting with an aggregation but I cannot preserve all the fields I need. The application does not know at query time what other fields may appear in each document, so I am not able to explicitly project them. If I had a wildcard to project all fields then this would work:

db.c.aggregate([
    {$unwind: "$array_to_sort"},
    {$sort: {"array_to_sort.b":1, "array_to_sort:a": 1}},
    {$group: {_id:"$_id", array_to_sort: {$push:"$array_to_sort"}}}
]);

...but unfortunately, it produces a result that does not contain the "unknown_field":

    {
        "_id" : 0,
        "array_to_sort" : [
            {
                "a" : 1,
                "b" : 0
            },
            {
                "a" : 3,
                "b" : 3
            },
            {
                "a" : 3,
                "b" : 4
            }
        ]
    }

Here is the insert command incase you would like to experiment:

db.c.insert({"unknown_field": "foo", "array_to_sort": [{"a": 3, "b": 4}, {"a": 3, "b":3}, {"a": 1, "b":0}]})

I cannot pre-sort the array because the sort criteria is dynamic. I may be sorting by any combination of a and/or b ascending/descending at query time. I realize I may need to do this in my client application, but it would be sweet if I could do it in mongo because then I could also $slice/skip/limit the results for paging instead of retrieving the entire array every time.

1 Answer 1

2

Since you are grouping on the document _id you can simply place the fields you wish to keep within the grouping _id. Then you can re-form using $project

db.c.aggregate([
    { "$unwind": "$array_to_sort"},
    { "$sort": {"array_to_sort.b":1, "array_to_sort:a": 1}},
    { "$group": { 
        "_id": {
            "_id": "$_id",
            "unknown_field": "$unknown_field"
        },
        "Oarray_to_sort": { "$push":"$array_to_sort"}
    }},
    { "$project": {
        "_id": "$_id._id",
        "unknown_field": "$_id.unknown_field",
        "array_to_sort": "$Oarray_to_sort"
    }}
]);

The other "trick" in there is using a temporary name for the array in the grouping stage. This is so when you $project and change the name, you get the fields in the order specified in the projection statement. If you did not, then the "array_to_sort" field would not be the last field in the order, as it is copied from the prior stage.

That is an intended optimization in $project, but if you want the order then you can do it as above.


For completely unknown structures there is the mapReduce way of doing things:

db.c.mapReduce(
    function () {
        this["array_to_sort"].sort(function(a,b) {
            return a.a - b.a || a.b - b.b;
        });

        emit( this._id, this );
    },
    function(){},
    { "out": { "inline": 1 } }
)

Of course that has an output format that is specific to mapReduce and therefore not exactly the document you had, but all the fields are contained under "values":

{
    "results" : [
            {
                    "_id" : 0,
                    "value" : {
                            "_id" : 0,
                            "some_field" : "a",
                            "array_to_sort" : [
                                    {
                                            "a" : 1,
                                            "b" : 0
                                    },
                                    {
                                            "a" : 3,
                                            "b" : 3
                                    },
                                    {
                                            "a" : 3,
                                            "b" : 4
                                    }
                            ]
                    }
            }
    ],
}

Future releases ( as of writing ) allow you to use a $$ROOT variable in aggregate to represent the document:

db.c.aggregate([
    { "$project": {
        "_id": "$$ROOT",
        "array_to_sort": "$array_to_sort"
    }},
    { "$unwind": "$array_to_sort"},
    { "$sort": {"array_to_sort.b":1, "array_to_sort:a": 1}},
    { "$group": { 
        "_id": "$_id",
        "array_to_sort": { "$push":"$array_to_sort"}
    }}
]);

So there is no point there using the final "project" stage as you do not actually know the other fields in the document. But they will all be contained (including the original array and order ) within the _id field of the result document.

Sign up to request clarification or add additional context in comments.

6 Comments

@Neil_Lunn Thanks. The application does not know the name of the unknown_field at query time. I'm looking for a way to catch all of them - like a wildcard or include all type thing.
@BobB From MongoDB version 2.6 (which is currently a release candidate ) there is a "$$ROOT" variable that can be used to copy the whole document in it's form at a given pipeline stage. But you will not get the document back in the same form if you do not know the fields. The other option is to do this with "mapReduce" and arbitrary JavaScript. You would actually be better off re-considering your schema design to have at least some level of "uniform" behavior.
@BobB Added the mapReduce and future aggregate methods for reference.
@BobB perhaps you missed the comment that indicated the answer had been edited.
@Neil_Lunn I am unable to test the $$ROOT feature but this sounds like it would do what I need.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.