I have 100 documents in my MongoDB collection. Each of them may be a duplicate of one or more other documents under different conditions, such as firstName & lastName, email, or mobile phone.
I am trying to mapReduce these 100 documents into key-value pairs, similar to grouping.
Everything works fine until there is a 101st duplicate record in the DB.
The mapReduce output for the other documents that are duplicates of the 101st record is then corrupted.
For example:
I am working on firstName & lastName for now.
When the DB contains 100 documents, the result contains:
{
    _id: {
        firstName: "foo",
        lastName: "bar",
    },
    value: {
        count: 20,
        duplicate: [{
            id: ObjectId("/*an object id*/"),
            fullName: "foo bar",
            DOB: ISODate("2000-01-01T00:00:00.000Z")
        },{
            id: ObjectId("/*another object id*/"),
            fullName: "foo bar",
            DOB: ISODate("2000-01-02T00:00:00.000Z")
        },...]
    },
}
This is exactly what I want, but when the DB contains more than 100 possibly duplicated documents, the result changes.
Let's say the 101st document is:
{
    firstName: "foo",
    lastName: "bar",
    email: "[email protected]",
    mobile: "019894793"
}
With 101 documents in the DB, the result is:
{
    _id: {
        firstName: "foo",
        lastName: "bar",
    },
    value: {
        count: 21,
        duplicate: [{
            id: undefined,
            fullName: undefined,
            DOB: undefined
        },{
            id: ObjectId("/*another object id*/"),
            fullName: "foo bar",
            DOB: ISODate("2000-01-02T00:00:00.000Z")
        }]
    },
}
With 102 documents in the DB, the result is:
{
    _id: {
        firstName: "foo",
        lastName: "bar",
    },
    value: {
        count: 22,
        duplicate: [{
            id: undefined,
            fullName: undefined,
            DOB: undefined
        },{
            id: undefined,
            fullName: undefined,
            DOB: undefined
        }]
    },
}
I found another topic on Stack Overflow describing a similar issue, "MapReduce results seem limited to 100?", but its answer does not work for me.
Any ideas?
Edit:
Original source code:
var map = function () {
    var value = {
        count: 1,
        userId: this._id
    };
    emit({lastName: this.lastName, firstName: this.firstName}, value);
};
var reduce = function (key, values) {
    var reducedObj = {
        count: 0,
        userIds: []
    };
    values.forEach(function (value) {
        reducedObj.count += value.count;
        reducedObj.userIds.push(value.userId);
    });
    return reducedObj;
};
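A minimal sketch of how the original reduce function can be exercised outside MongoDB. It assumes, as the mapReduce documentation describes, that MongoDB may call reduce more than once for the same key and pass the output of an earlier call back in as one of the values. The string ids and the plain-JavaScript harness are hypothetical stand-ins, not data from my DB:

```javascript
// The original reduce function, unchanged.
var reduce = function (key, values) {
    var reducedObj = { count: 0, userIds: [] };
    values.forEach(function (value) {
        reducedObj.count += value.count;
        reducedObj.userIds.push(value.userId);
    });
    return reducedObj;
};

// Hypothetical emitted values; the string ids stand in for ObjectIds.
var key = { lastName: "bar", firstName: "foo" };
var emitted = [
    { count: 1, userId: "id-1" },
    { count: 1, userId: "id-2" },
    { count: 1, userId: "id-3" }
];

// Single pass: every input has the shape that map emitted.
var singlePass = reduce(key, emitted);

// Re-reduce: a partial result is fed back in together with a new emitted value.
// The partial result has a `userIds` array but no `userId` field.
var partial = reduce(key, emitted.slice(0, 2));
var reReduced = reduce(key, [partial, emitted[2]]);

console.log(singlePass.userIds); // three ids
console.log(reReduced.userIds);  // first entry is undefined, earlier ids are lost
```

In this sketch the count stays correct (3 in both cases) while the ids collected by the earlier pass are replaced by a single undefined entry, which looks similar to the corrupted output above. Whether MongoDB re-reduces exactly at the 101st document is something I have not confirmed.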
Source code now:
var map = function () {
    var value = {
        count: 1,
        users: [this]
    };
    emit({lastName: this.lastName, firstName: this.firstName}, value);
};
var reduce = function (key, values) {
    var reducedObj = {
        count: 0,
        users: []
    };
    values.forEach(function (value) {
        reducedObj.count += value.count;
        reducedObj.users = reducedObj.users.concat(value.users); // or using the forEach method
        // value.users.forEach(function (user) {
        //     reducedObj.users.push(user);
        // });
    });
    return reducedObj;
};
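A small sketch to check whether this version gives the same result when reduce is applied once or applied again to its own output. It runs in plain JavaScript rather than in MongoDB, the sample users are hypothetical, and inside the callback it reads `value.users` (the single element passed to forEach):

```javascript
// The newer reduce function: it returns the same shape that map emits.
var reduce = function (key, values) {
    var reducedObj = { count: 0, users: [] };
    values.forEach(function (value) {
        reducedObj.count += value.count;
        reducedObj.users = reducedObj.users.concat(value.users);
    });
    return reducedObj;
};

// Hypothetical emitted values; each one wraps a single document in an array.
var key = { lastName: "bar", firstName: "foo" };
var emitted = [
    { count: 1, users: [{ _id: "id-1" }] },
    { count: 1, users: [{ _id: "id-2" }] },
    { count: 1, users: [{ _id: "id-3" }] }
];

// Reduce everything in one call.
var singlePass = reduce(key, emitted);

// Reduce the first two values, then feed that result back in with the third.
var reReduced = reduce(key, [reduce(key, emitted.slice(0, 2)), emitted[2]]);

// Compare the two results structurally.
var sameResult = JSON.stringify(singlePass) === JSON.stringify(reReduced);
console.log(sameResult); // prints true
```

Because the emitted value and the returned value both have the form { count, users: [...] }, the output of one reduce call is a valid input for the next one in this sketch.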
I don't understand why the original code would fail, since it was also pushing a value (userId) to reducedObj.userIds.
Is there a problem with the value that I emit in the map function?