This is in a Node.js/Express service using MongoDB.
I have a collection of documents that each have 17 fields. I am trying to produce the set of distinct values for one particular field, along with their counts. There is probably a better way of doing this than the one I am using, and I'm interested in hearing about it, but for now my specific problem is the odd behavior I see with my current approach:
var collection = db.collection('docs');
var counts = {};
var docs = collection.find({title: {$ne: null}}, {title:1});
while (docs.hasNext()) {
    var doc = docs.next();
    console.log('Processing ' + JSON.stringify(doc));
    counts[doc.title] = (counts[doc.title]||0) + 1;
}
There should be about 6000 documents, but what I see is:
Processing {}
Processing {}
...
close to 20,000 times, after which there is a pause for a few seconds, and then finally a fatal out-of-memory error.
If I run that same find in Robomongo, I get the results I expect, namely about 6000 documents with non-null titles.
Can anyone suggest what the problem is?
Note - I'm not looking for alternative working ways to achieve the same effect - I have those - what I'm looking for is an explanation of what is going wrong when doing things this way, because AFAICT it should work, and I'd like to close the gap in my understanding. For example, using toArray, things work:
var results = collection.find({title: {$ne: null}}, {title: 1});
results.toArray(function(e, docs) {
    console.log('Got ' + docs.length + ' results');
    // Tally each title; counts is the same object declared earlier.
    for (var i = docs.length - 1; i >= 0; i--) {
        var doc = docs[i];
        counts[doc.title] = (counts[doc.title]||0) + 1;
    }
});
I have also pasted and run this almost identical code with no problem in the Mongo shell:
var collection = db.getCollection('docs');
var counts = {};
var docs = collection.find({title: {$ne: null}}, {title:1});
while (docs.hasNext()) {
    var doc = docs.next();
    printjson(doc);
    counts[doc.title] = (counts[doc.title]||0) + 1;
}
printjson(counts);
That behaves as expected, and the only difference between that and the code running under node is it uses db.getCollection() versus db.collection(), and printjson() vs console.log().
So this seems to be some weird issue with running in nodejs specifically.
Is the title field indexed? Try this and see if it works: db.collection.ensureIndex({title: 1}) or db.collection.createIndex({title: 1})
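One possible explanation for the `Processing {}` lines and the loop that never ends, assuming the installed Node.js driver returns a Promise from `hasNext()` and `next()` when no callback is passed (the mongo shell cursor, by contrast, is synchronous): the `while` condition is then a Promise object, which is always truthy, and the "document" being logged is also a Promise, which `JSON.stringify` renders as `{}`. The sketch below does not touch MongoDB at all; `makeFakeCursor` and `countTitles` are hypothetical names, and the fake cursor only imitates that assumed Promise-returning behavior.

```javascript
// Hypothetical stand-in for a driver cursor whose methods return Promises.
// This is an assumption about the driver version, not the real cursor object.
function makeFakeCursor(items) {
  let i = 0;
  return {
    hasNext: () => Promise.resolve(i < items.length),
    next: () => Promise.resolve(i < items.length ? items[i++] : null),
  };
}

// What a synchronous loop sees on each iteration.
const cursor = makeFakeCursor([{ title: 'a' }, { title: 'b' }, { title: 'a' }]);
const condition = cursor.hasNext(); // a Promise object, not a boolean
const doc = cursor.next();          // also a Promise, not a document

console.log(Boolean(condition));                  // prints: true
console.log('Processing ' + JSON.stringify(doc)); // prints: Processing {}

// Awaiting each call yields the real values, so the loop terminates.
async function countTitles(cur) {
  const counts = {};
  while (await cur.hasNext()) {
    const d = await cur.next();
    counts[d.title] = (counts[d.title] || 0) + 1;
  }
  return counts;
}
```

If that assumption holds for your driver, each pass through the original loop also stores a count under `counts[undefined]` and queues more log output without ever yielding to the event loop, which would be consistent with the eventual out-of-memory error.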