
I have a collection with >100,000 documents that contain multiple nested arrays. I need to query based on a property located at the lowest level and return just the objects at the bottom of the arrays.

Document Structure:

    {
        _id: 12345,
        type: "employee",
        people: [
            {
                name: "Rob",
                items: [
                    {
                        itemName: "RobsItemOne",
                        value: "$10.00",
                        description: "some description about the item"
                    },
                    {
                        itemName: "RobsItemTwo",
                        value: "$15.00",
                        description: "some description about the item"
                    }
                ]
            }
        ]
    }

I have been using the Aggregation Pipeline to get the expected results which DOES WORK, however the performance is pretty terrible. Here is my query:

db.collection.aggregate([
            {
                $match: {
                    "type": "employee"
                }
            },

            {$unwind: "$people"},
            {$unwind: "$people.items"},
            {$match: {$or: [ //There could be dozens of items included in this $match
                             {"people.items.itemName": "RobsItemOne"},
                             {"people.items.itemName": "RobsItemTwo"}
                           ]
                     }
            },
            {
                $project: {
                    _id: 0, // This is because of the $out
                    systemID: "$_id",
                    type: "$type",
                    item: "$people.items.itemName",
                    value: "$people.items.value"
                }
            },
            {$out: "tempCollection"} // Would like to avoid this, but was exceeding max document size
        ])

The Results are :

[ 
    {
        "type" : "employee",
        "systemID" : 12345,
        "item" : "RobsItemOne",
        "value" : "$10.00"
    }, 
    {
        "type" : "employee",
        "systemID" : 12345,
        "item" : "RobsItemTwo",
        "value" : "$15.00"
    }
]

What can I do to make this query faster? I have tried using indexes, but per the Mongo docs, indexes are ignored past the initial $match stage.
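For reference, since only that initial $match stage can use an index, the main indexing lever is a compound index covering the fields it filters on. A sketch (mongo shell; "collection" is the placeholder name used throughout the question):

```javascript
// Only the first $match of a pipeline can be index-backed, so index the
// fields that stage filters on. Including people.items.itemName lets an
// initial $match that also filters on item names use the same index.
db.collection.createIndex({ type: 1, "people.items.itemName": 1 });
```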

2 Answers


What you can additionally try is to add a $match stage to your query just after the first $unwind on people.

...{$unwind: "$people"},
{$match:{"people.items.itemName":{$in:["RobsItemOne","RobsItemTwo"]}}},
{$unwind: "$people.items"}, ....

This will bring down the number of documents that the subsequent $unwind and $match stages have to process.
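To see why this helps, here is a plain-JavaScript sketch (not mongo shell; the helper and variable names are made up) of how $unwind multiplies documents and how the early $match drops non-matching parents before the more expensive second unwind:

```javascript
// $unwind duplicates the parent document once per array element.
function unwind(docs, getArray, setElement) {
  const out = [];
  for (const doc of docs) {
    for (const el of getArray(doc)) {
      out.push(setElement(doc, el));
    }
  }
  return out;
}

const docs = [{
  _id: 12345,
  type: "employee",
  people: [{
    name: "Rob",
    items: [
      { itemName: "RobsItemOne", value: "$10.00" },
      { itemName: "RobsItemTwo", value: "$15.00" },
      { itemName: "Unwanted",    value: "$99.00" }
    ]
  }]
}];

const wanted = ["RobsItemOne", "RobsItemTwo"];

// Unwind people, then match early: people whose items contain no wanted
// itemName are dropped before the second unwind fans out every item.
const afterPeople = unwind(docs, d => d.people, (d, p) => ({ ...d, people: p }));
const filtered = afterPeople.filter(d =>
  d.people.items.some(i => wanted.includes(i.itemName)));
const afterItems = unwind(filtered, d => d.people.items,
  (d, i) => ({ ...d, people: { ...d.people, items: i } }));
const results = afterItems.filter(d => wanted.includes(d.people.items.itemName));

console.log(results.length); // 2
```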

Since you have a large number of records, you can also use the {allowDiskUse: true} option, which:

Enables writing to temporary files. When set to true, aggregation stages can write data to the _tmp subdirectory in the dbPath directory.

So, your final query would look like:

db.collection.aggregate([
        {
            $match: {
                "type": "employee"
            }
        },

        {$unwind: "$people"},
        {$match:{"people.items.itemName":{$in:["RobsItemOne","RobsItemTwo"]}}},
        {$unwind: "$people.items"},
        {$match: {$or: [ //There could be dozens of items included in this $match
                         {"people.items.itemName": "RobsItemOne"},
                         {"people.items.itemName": "RobsItemTwo"}
                       ]
                 }
        },
        {
            $project: {
                _id: 0, // This is because of the $out
                systemID: "$_id",
                type: "$type",
                item: "$people.items.itemName",
                value: "$people.items.value"
            }
        }

    ], {allowDiskUse:true})

3 Comments

I will give that a shot. Is Aggregation pipeline instead of Map Reduce the proper choice here?
Please have a look at this :stackoverflow.com/questions/16310730/…
In the above example, all the documents have unique keys, so the reduce function would not be called for all the documents. Even if you emit a common key to all the documents, the reduce function would have to take a lot of documents as input, the processing would be much slower than the aggregation pipeline since the pipeline would eliminate documents in the $match stage.

There are a few more things that can be improved on top of @BatScream's answer. You can give this a try.

// if the final result set is relatively small, this index will be helpful.
db.collection.createIndex({type : 1, "people.items.itemName" : 1 }); // createIndex replaces the deprecated ensureIndex

var itemCriteria = {
    $in : [ "RobsItemOne", "RobsItemTwo" ]
};

db.collection.aggregate([ {
    $match : {
        "type" : "employee",
        "people.items.itemName" : itemCriteria      // add this criteria to narrow source range further
    }
}, {
    $unwind : "$people"
}, {
    $match : {
        "people.items.itemName" : itemCriteria      // narrow data range further
    }
}, {
    $unwind : "$people.items"
}, {
    $match : {
        "people.items.itemName" : itemCriteria      // final match, avoid to use $or operator
    }
}, {
    $project : {
        _id : 0,                                    // This is because of the $out
        systemID : "$_id",
        type : "$type",
        item : "$people.items.itemName",
        value : "$people.items.value"
    }
}, {
    $out: "tempCollection"                          // optional
} ], {
    allowDiskUse : true
});
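As the comments in the pipeline note, the single $in criteria replaces the original $or of equality checks; the two select the same items. A plain-JavaScript sketch (sample data invented) of that equivalence:

```javascript
// {field: {$in: [a, b]}} selects the same documents as
// {$or: [{field: a}, {field: b}]} — the rewrite changes the query
// shape (one reusable criteria object), not the result set.
const wanted = ["RobsItemOne", "RobsItemTwo"];

const items = [
  { itemName: "RobsItemOne", value: "$10.00" },
  { itemName: "RobsItemTwo", value: "$15.00" },
  { itemName: "Other",       value: "$99.00" }
];

const viaIn = items.filter(i => wanted.includes(i.itemName));
const viaOr = items.filter(i =>
  i.itemName === "RobsItemOne" || i.itemName === "RobsItemTwo");

console.log(viaIn.length === viaOr.length); // true
```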

