
I am using MongoDB 2.6.4 and am still getting an error:

uncaught exception: aggregate failed: {
    "errmsg" : "exception: aggregation result exceeds maximum document size (16MB)",
    "code" : 16389,
    "ok" : 0,
    "$gleStats" : {
        "lastOpTime" : Timestamp(1422033698000, 105),
        "electionId" : ObjectId("542c2900de1d817b13c8d339")
    }
}

Reading various advice, I came across the suggestion of saving the result in another collection using $out. My query looks like this now:

db.audit.aggregate([
    { $match: { "date": { $gte: ISODate("2015-01-22T00:00:00.000Z"),
                          $lt:  ISODate("2015-01-23T00:00:00.000Z") } } },
    { $unwind: "$data.items" },
    { $out: "tmp" }
])

But I am getting a different error: uncaught exception: aggregate failed:

{"errmsg" : "exception: insert for $out failed: { lastOp: Timestamp 1422034172000|25, connectionId: 625789, err: \"insertDocument :: caused by :: 11000 E11000 duplicate key error index: duties_and_taxes.tmp.agg_out.5.$_id_  dup key: { : ObjectId('54c12d784c1b2a767b...\", code: 11000, n: 0, ok: 1.0, $gleStats: { lastOpTime: Timestamp 1422034172000|25, electionId: ObjectId('542c2900de1d817b13c8d339') } }",
    "code" : 16996,
    "ok" : 0,
    "$gleStats" : {
        "lastOpTime" : Timestamp(1422034172000, 26),
        "electionId" : ObjectId("542c2900de1d817b13c8d339")
    }
}

Does someone have a solution?

1 Answer

The error is due to the $unwind step in your pipeline.

When you unwind on a field containing n elements, n copies of the same document are produced, all with the same _id. Each copy holds one of the elements from the array that was unwound. See the demonstration below of the records after an unwind operation.

Sample demo:

> db.t.insert({"a":[1,2,3,4]})

WriteResult({ "nInserted" : 1 })

> db.t.aggregate([{$unwind:"$a"}])

{ "_id" : ObjectId("54c28dbe8bc2dadf41e56011"), "a" : 1 }
{ "_id" : ObjectId("54c28dbe8bc2dadf41e56011"), "a" : 2 }
{ "_id" : ObjectId("54c28dbe8bc2dadf41e56011"), "a" : 3 }
{ "_id" : ObjectId("54c28dbe8bc2dadf41e56011"), "a" : 4 }
>

Since all these documents have the same _id, you get a duplicate key exception (due to the same value in the _id field for all the unwound documents) on insert into the new collection named tmp.

The pipeline will fail to complete if the documents produced by the pipeline would violate any unique indexes, including the index on the _id field of the original output collection.
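If you do want to keep the $out stage, one possible workaround (a sketch that is not part of the original answer; it assumes you only need the unwound items plus whatever other fields you choose to project) is to exclude _id before writing, so each document inserted into tmp receives a fresh ObjectId:

    db.audit.aggregate([
        { $match: { "date": { $gte: ISODate("2015-01-22T00:00:00.000Z"),
                              $lt:  ISODate("2015-01-23T00:00:00.000Z") } } },
        { $unwind: "$data.items" },
        // exclude _id so the inserts into "tmp" get new, unique ObjectIds;
        // the "item" field name is only an example -- project the fields you actually need
        { $project: { _id: 0, item: "$data.items" } },
        { $out: "tmp" }
    ])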

To solve your original problem, you could set the allowDiskUse option to true, which allows the aggregation to use disk space whenever it needs to. The documentation describes it as:

Optional. Enables writing to temporary files. When set to true, aggregation operations can write data to the _tmp subdirectory in the dbPath directory. See Perform Large Sort Operation with External Sort for an example.

as in:

db.audit.aggregate([
    { $match: { "date": { $gte: ISODate("2015-01-22T00:00:00.000Z"),
                          $lt:  ISODate("2015-01-23T00:00:00.000Z") } } },
    { $unwind: "$data.items" }  // note, the pipeline ends here
], {
    allowDiskUse: true
});

5 Comments

Do you have any other suggestions? Unfortunately, with allowDiskUse: true I am getting the same original error: Error("Printing Stack Trace")@:0 ()@src/mongo/shell/utils.js:37 ([object Array],[object Object])@src/mongo/shell/collection.js:866 @(shell):10 uncaught exception: aggregate failed: { "errmsg" : "exception: aggregation result exceeds maximum document size (16MB)", "code" : 16389, "ok" : 0, "$gleStats" : { "lastOpTime" : Timestamp(1422046336000, 21), "electionId" : ObjectId("542c2900de1d817b13c8d339") } }
@NewMongoDBUser Why do you want to unwind the array items? Can you explain your use case a bit? We can then think of other possible solutions that avoid the $unwind operation.
I need to load the data into MySQL, and each array element will be a separate row in the RDBMS.
You would be better off splitting the items on the client side, since $unwind takes a toll when the number and size of the documents is quite large (see the sketch after these comments).
I am trying to use PDI Kettle for it and seem stuck with this problem.
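For reference, a rough sketch of the client-side splitting idea mentioned above (the field names and the simple print are assumptions; replace the print with your MySQL/PDI insert step):

    db.audit.find({
        date: { $gte: ISODate("2015-01-22T00:00:00.000Z"),
                $lt:  ISODate("2015-01-23T00:00:00.000Z") }
    }).forEach(function (doc) {
        // emit one record per array element, mirroring what $unwind would produce
        doc.data.items.forEach(function (item) {
            print(doc._id + "\t" + tojson(item));
        });
    });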
