A date field in a document collection is represented by an integer in the yyyymmdd format: e.g. 20160407. Is there a way to convert it to a date as part of an aggregate pipeline so that it could be used to group documents by the corresponding week number?
1 Answer
This is not directly possible within the aggregation pipeline in the MongoDB versions covered here (up to 3.2). Instead, convert the numerical representation to its string equivalent, build a date from that string, and update each document in a loop. For the loop, iterate the cursor returned by the find() method manually, using either the forEach() method or the cursor method next() to access the documents.
Within the loop, convert the field first to a string, then to a locale-insensitive date format like "2016-04-07". Then create a new ISODate object from that string and update the field using the $set operator, as in the following example, where the field is called created_at and currently holds the date in the numerical format YYYYMMDD:
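// $type 1 matches doubles, which is how the shell stores plain numbers by default
var cursor = db.collection.find({"created_at": {"$exists": true, "$type": 1 }});
while (cursor.hasNext()) {
    var doc = cursor.next(),
        dateStr = doc.created_at.toString(),
        match = dateStr.match(/(\d{4})(\d{2})(\d{2})/);
    if (!match) continue; // skip values that are not in YYYYMMDD form
    // match[1] is the year, match[2] the month, match[3] the day
    var betterDateStr = match[1] + '-' + match[2] + '-' + match[3];
    db.collection.update(
        {"_id" : doc._id},
        {"$set" : {"created_at" : new ISODate(betterDateStr)}}
    );
}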
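The conversion step in the loop above can also be isolated and checked on its own. The following is a minimal sketch in plain JavaScript, not part of any MongoDB API: the function name yyyymmddToDate is hypothetical, and it assumes the dates should be interpreted as midnight UTC. It also rejects values such as 20160231, which the regular expression alone would accept.

```javascript
// Hypothetical helper: turn a YYYYMMDD number such as 20160407 into a UTC Date.
// Returns null when the value is not a valid eight-digit calendar date.
function yyyymmddToDate(value) {
    var match = /^(\d{4})(\d{2})(\d{2})$/.exec(String(value));
    if (!match) {
        return null; // not exactly eight digits
    }
    var year = Number(match[1]),
        month = Number(match[2]),
        day = Number(match[3]),
        date = new Date(Date.UTC(year, month - 1, day));
    // Date.UTC silently rolls over impossible dates (e.g. February 31),
    // so confirm that the components survived the round trip.
    if (date.getUTCFullYear() !== year ||
        date.getUTCMonth() !== month - 1 ||
        date.getUTCDate() !== day) {
        return null;
    }
    return date;
}

var converted = yyyymmddToDate(20160407);
var rejected = yyyymmddToDate(20160231);
```

Inside the update loop, the returned Date could be passed to $set in place of new ISODate(betterDateStr), with null results skipped.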
For better performance, especially with large collections, use the Bulk API for bulk updates. Operations are sent to the server in batches of, say, 1000, so there is one round trip per 1000 updates rather than one per document.
The following examples demonstrate this approach. The first uses the Bulk API available in MongoDB versions >= 2.6 and < 3.2. It updates all the documents in the collection by changing the created_at fields to date fields:
var bulk = db.collection.initializeUnorderedBulkOp(),
    counter = 0;
db.collection.find({"created_at": {"$exists": true, "$type": 1 }}).forEach(function (doc) {
    var dateStr = doc.created_at.toString(),
        match = dateStr.match(/(\d{4})(\d{2})(\d{2})/);
    if (!match) return; // skip values that are not in YYYYMMDD form
    // match[1] is the year, match[2] the month, match[3] the day
    var betterDateStr = match[1] + '-' + match[2] + '-' + match[3],
        newDate = new ISODate(betterDateStr);
    bulk.find({ "_id": doc._id }).updateOne({
        "$set": { "created_at": newDate }
    });
    counter++;
    if (counter % 1000 == 0) {
        bulk.execute(); // Execute every 1000 operations, then re-initialize
        bulk = db.collection.initializeUnorderedBulkOp();
    }
});
// Clean up remaining operations in queue
if (counter % 1000 != 0) { bulk.execute(); }
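The counter-and-flush pattern in the example above does not depend on MongoDB itself. As a rough sketch, it can be written as a small plain JavaScript helper; the name forEachBatch and the flush callback are hypothetical stand-ins for queueing operations and calling bulk.execute().

```javascript
// Hypothetical helper: call flush(batch) once per `size` items, plus once
// more for any remainder. `flush` stands in for something like bulk.execute().
function forEachBatch(items, size, flush) {
    var batch = [];
    items.forEach(function (item) {
        batch.push(item);
        if (batch.length === size) {
            flush(batch);
            batch = []; // start a fresh batch, like re-initializing the bulk op
        }
    });
    if (batch.length > 0) {
        flush(batch); // remaining items that did not fill a whole batch
    }
}

// Usage with plain numbers instead of update operations:
var sizes = [];
forEachBatch([1, 2, 3, 4, 5, 6, 7], 3, function (batch) {
    sizes.push(batch.length);
});
// sizes now holds one entry per flushed batch: 3, 3 and 1
```

Keeping the remainder check separate from the main loop mirrors the final bulk.execute() call, which only runs when operations are still queued.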
The next example applies to MongoDB version 3.2 and later, which introduced bulkWrite() as a newer interface for bulk operations:
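var bulkOps = [];
db.collection.find({"created_at": {"$exists": true, "$type": 1 }}).forEach(function (doc) {
    var dateStr = doc.created_at.toString(),
        match = dateStr.match(/(\d{4})(\d{2})(\d{2})/);
    if (!match) return; // skip values that are not in YYYYMMDD form
    // match[1] is the year, match[2] the month, match[3] the day
    var betterDateStr = match[1] + '-' + match[2] + '-' + match[3],
        newDate = new ISODate(betterDateStr);
    bulkOps.push(
        {
            "updateOne": {
                "filter": { "_id": doc._id },
                "update": { "$set": { "created_at": newDate } }
            }
        }
    );
});
// bulkWrite() rejects an empty operations array, so only call it when there is work to do
if (bulkOps.length > 0) { db.collection.bulkWrite(bulkOps); }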
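Once created_at holds real dates, the grouping by week number asked about in the question becomes a regular aggregation. The pipeline below is a sketch under a few assumptions: the variable name weeklyPipeline and the count field are illustrative, and the $week operator is used, which numbers weeks from 0 to 53 with weeks starting on Sunday.

```javascript
// Sketch: count documents per calendar year and week of created_at.
var weeklyPipeline = [
    // Only consider documents whose created_at is already a BSON date (type 9)
    { "$match": { "created_at": { "$type": 9 } } },
    { "$group": {
        "_id": {
            "year": { "$year": "$created_at" },
            "week": { "$week": "$created_at" } // 0-53, weeks begin on Sunday
        },
        "count": { "$sum": 1 }
    } },
    { "$sort": { "_id.year": 1, "_id.week": 1 } }
];

// In the mongo shell this would be run as:
// db.collection.aggregate(weeklyPipeline);
```

Grouping on the year as well as the week keeps the same week number from different years in separate buckets.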