
I have a piece of code that simply loops through two arrays: for each element of the first array, it finds the relevant element in the second array, changes only the first occurrence, and deletes the remaining ones.

/**
 * The aggregation data structure:
 * "_id": {
 *   "geometry": geometry,
 *   "dups": [
 *     "5b3b25b4e54029249c459bfc",  // keep only the first element in allDocs
 *     "5b3b25b4e54029249c459e65",  // delete it from allDocs
 *     "5b3b25b4e54029249c459d7d"   // delete it from allDocs
 *   ],
 *   "dupsProp": [ ],               // array of all properties of duplicatePoints
 *   "count": 3
 * }
 */
var aggregationRes = [/* 46,000 objects */];
var allDocs = [/* 345,000 objects */];

aggregationRes.forEach(function (resElem, counter) {
    console.log(counter + "/" + aggregationRes.length);
    // Delete objects in allDocs based on the dups array, except the first one
    var foundIndex = allDocs.findIndex(x => x._id.toString() == resElem.dups[0]);
    // Assign the merged properties
    allDocs[foundIndex].properties = resElem.dupsProp;
    // Delete the remaining ids in allDocs from the dups array
    resElem.dups.forEach(function (dupElem, index) {
        var tmpFoundIndex = allDocs.findIndex(x => x._id.toString() == resElem.dups[index + 1]);
        if (tmpFoundIndex !== -1) {
            allDocs.splice(tmpFoundIndex, 1);
        }
    });
});

This script runs for almost 4 hours. As you can see, the calculations are really simple, but since the allDocs array is big, it takes quite a long time. It would be great if someone could give me a hint on how to decrease the computation time. Thanks in advance.

5 Comments

  • Make allDocs indexed by id, so that you don't need to findIndex every time. And avoid doing many small splices on big arrays; rather, mark them for deletion and then remove them all in one pass (see the sketch after these comments). Commented Aug 30, 2018 at 13:35
  • (Also: use a database for such large data :-) Commented Aug 30, 2018 at 13:36
  • Instead of doing operations like x._id.toString() 46K * 345K * (dupecount + 1) times, iterate the allDocs array once, creating a new property which is this x._id.toString() (345K operations only). Is x._id the entire object, or some numeric/string ID? Commented Aug 30, 2018 at 13:39
  • Would it be possible to simply do resElem.dups = [resElem.dups[0]]; instead of the resElem.dups.forEach iteration? You want to end up with a new array that only contains the first entry in the old array, right? Seems simple enough. Commented Aug 30, 2018 at 13:43
  • Your aggregation data structure example is wrong/misleading. It looks like the _id property is the entire inner object. Is this node.js, and are you using mongodb (the first search result for "javascript ObjectID")? You might want to add those tags to your question if so. I would imagine there is a much more efficient way to do this using your database. Commented Aug 30, 2018 at 14:27
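
A minimal sketch of the first two suggestions combined, roughly the approach the accepted answer below fleshes out (the array shapes are assumed from the question):

    // Sketch only: index documents by stringified _id once, mark duplicates
    // for deletion, then drop them all in a single filter pass.
    var byId = {};
    allDocs.forEach(function (doc) {
        byId[doc._id.toString()] = doc;           // 345K toString calls in total
    });

    var toDelete = {};
    aggregationRes.forEach(function (resElem) {
        byId[resElem.dups[0]].properties = resElem.dupsProp;  // keep the first one
        resElem.dups.slice(1).forEach(function (dupId) {
            toDelete[dupId] = true;                           // mark, don't splice
        });
    });

    allDocs = allDocs.filter(function (doc) {
        return !toDelete[doc._id.toString()];     // one linear pass, no splices
    });

This replaces the repeated O(n) findIndex scans with constant-time hash lookups and turns the many small splices into one linear pass.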

1 Answer


Taking the ideas from Bergi, we index the documents by id to avoid having to find the indexes, which is expensive:

var allDocs = [/* 345,000 objects */];
var aggregationRes = [/* 46,000 objects */];
var allDocsIndexed = {};

// Build the index once: a single toString per document
allDocs.forEach(function (doc) {
    allDocsIndexed[doc._id.toString()] = doc;
});

aggregationRes.forEach(function (resElem, counter) {
    // Keep the first duplicate and assign the merged properties
    allDocsIndexed[resElem.dups[0]].properties = resElem.dupsProp;
    // Remove the remaining duplicates from the index
    for (var i = 1; i < resElem.dups.length; i++) {
        delete allDocsIndexed[resElem.dups[i]];
    }
});

// Keep only the documents that were not deleted from the index
var allUndeletedDocs = allDocs.filter(doc => allDocsIndexed.hasOwnProperty(doc._id.toString()));

Note that for JavaScript this is an efficient solution, but with more details provided, a better one might exist using MongoDB features.
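
For instance, if these documents live in MongoDB (as the comments suggest), the merging and deleting could happen server-side without loading 345,000 documents into the application at all. A rough sketch, assuming the official Node.js mongodb driver; the connection string, database, and collection names are placeholders, not details from the question:

    // Sketch only: "mydb" and "points" are assumed names.
    const { MongoClient, ObjectId } = require("mongodb");

    async function dedupe(uri, aggregationRes) {
        const client = await MongoClient.connect(uri);
        const points = client.db("mydb").collection("points"); // assumed names

        const ops = [];
        for (const resElem of aggregationRes) {
            // Keep the first duplicate: write the merged properties onto it
            ops.push({
                updateOne: {
                    filter: { _id: new ObjectId(resElem.dups[0]) },
                    update: { $set: { properties: resElem.dupsProp } }
                }
            });
            // Delete the remaining duplicates in one server-side operation
            ops.push({
                deleteMany: {
                    filter: { _id: { $in: resElem.dups.slice(1).map(id => new ObjectId(id)) } }
                }
            });
        }

        // An unordered bulk write lets the server execute the batch efficiently
        await points.bulkWrite(ops, { ordered: false });
        await client.close();
    }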


2 Comments

Hey, your solution was perfect, unbelievable! It was super fast: in less than a minute it did the whole calculation that used to take me 3 to 4 hours. I did not know what people usually meant by indexing an array, but now, with your example, I understand it very well. Thanks a lot. Just a quick question: I did not use the last line (the filtering) because I already have all the results. Why did you actually filter them?
@MaryamKoulaei We only deleted the keys from allDocsIndexed; we never deleted anything from allDocs. So instead of deleting, I just filter to get the ones that were not deleted.
