I have this code snippet:

var re = new RegExp("<a href=\"(news[^?|\"]+).*?>([^<]+)</a>", "g");
var match;
while (match = re.exec(body)){
    var href = match[1];
    var title = match[2];
    console.log(href);

    db.news.findOne({ title: title }, function(err, result){
        if (err) {
            console.log(err);
        } else {
            console.log(href);
            // more codes here
        }
    });
}

Here is the sample output:

news/2015/02/20/347332.html
news/2015/02/19/347307.html
news/2015/02/19/347176.html
news/2015/02/19/347176.html
news/2015/02/19/347176.html
news/2015/02/19/347176.html

So, I have three sets of data to pass to the findOne function. However, only the last one gets passed in, three times. How can I work around this?

UPDATE: Based on the answers from jfriend00 and Neta Meta, these are two ways to make it work:

var re = new RegExp("<a href=\"(news[^?|\"]+).*?>([^<]+)</a>", "g");
function next(){
    var match = re.exec(body);
    if (match) {
        var href = match[1];
        var title = match[2];
        db.news.findOne({ title: title }, function(err, result){
            if (err) {
                console.log(err);
            } else {
                console.log(href);
                // more code here

                // launch the next iteration once this query has finished
                next();
            }
        });
    }
}
next();

Or

var asyncFunction = function(db, href, title){
    db.news.findOne({ title: title }, function(err, result){
        if (err) {
            console.log(err);
        } else {
            console.log(href);
            // more codes here
        }
    });
}

var re = new RegExp("<a href=\"(news[^?|\"]+).*?>([^<]+)</a>", "g");
var match;
var cnt = 0;
while (match = re.exec(body)) {
    asyncFunction(db, match[1], match[2]);
}
  • You can use the github.com/cowboy/javascript-sync-async-foreach library. Commented Feb 20, 2015 at 1:36
  • @Miliu the word "or" does not apply; those are 2 different use cases. The answer by jfriend00 will chain the queries, whereas mine will fire all of them at about the same time. jfriend00's answer will run a query, wait until it finishes, and then run another. It's an easier way to keep the order of the queries made, but it will run slower. Mine will not keep the order but will generally be faster, although you can add the index to "asyncFunction" and keep the order as well. Commented Feb 20, 2015 at 4:07
  • @Neta Meta I don't understand why you say the word "or" does not apply. I can use one of the two solutions. Either one works. Commented Feb 20, 2015 at 4:31
  • You can use both, but they are completely different use cases: one will do one thing and the other will do another. It depends on what you need. "Or" would imply 2 options that work for the same use case. Commented Feb 20, 2015 at 4:33

2 Answers

2

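The reason you don't get the output you expect is that all your database calls share the same href and title variables. They are declared with var, so there is one copy per function rather than one per loop iteration, and by the time the async callbacks run, the loop has already overwritten them with the values from the last match.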
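
To see the effect in isolation, here is a minimal sketch that does not touch the database. fakeFindOne is a hypothetical stand-in (not part of any driver) that only queues its callback, and the queue is drained after the loop to imitate callbacks firing once the loop has finished:

```javascript
// Hypothetical stand-in for db.news.findOne(): it only queues the callback,
// so no callback runs until the loop below has finished.
var pending = [];
function fakeFindOne(query, callback) {
    pending.push(callback);
}

var titles = ["first", "second", "third"];
var logged = [];
var i, title;

for (i = 0; i < titles.length; i++) {
    title = titles[i];                 // one shared variable, overwritten each pass
    fakeFindOne({ title: title }, function () {
        logged.push(title);            // reads the variable when the callback runs
    });
}

// Simulate the async results arriving after the loop is done.
pending.forEach(function (callback) {
    callback();
});

console.log(logged);   // every entry is "third"
```

Each callback reads the variable when it runs, not when it was created, which is the same pattern as the repeated href in the question's output.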

If you're OK with all your async functions executing at once and the data can be processed in any order, then you just need to create a closure to capture your local variables separately for each iteration of the loop:

var re = new RegExp("<a href=\"(news[^?|\"]+).*?>([^<]+)</a>", "g");
var match, cntr = 0;
while (match = re.exec(body)){
    (function(href, title, index) {
        console.log(href);
        db.news.findOne({ title: title }, function(err, result){
            if (err) {
                console.log(err);
            } else {
                console.log(href);
                // more codes here
            }
        });
    })(match[1], match[2], cntr++);
}
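
If the results are needed in their original order even though the queries run in parallel, the index can be used to store each result in its own slot. The following is only a rough sketch under assumptions: fakeFindOne is a hypothetical stand-in that completes its callbacks in reverse order, and forEach is used in place of the while loop because its callback gives each item its own title and index:

```javascript
// Hypothetical stand-in for an async lookup. Callbacks are queued and later
// run in reverse order to imitate queries finishing out of order.
var pending = [];
function fakeFindOne(query, callback) {
    pending.push(function () {
        callback(null, { title: query.title });
    });
}

var titles = ["first", "second", "third"];
var results = new Array(titles.length);
var remaining = titles.length;
var finished = null;

titles.forEach(function (title, index) {
    fakeFindOne({ title: title }, function (err, result) {
        results[index] = err ? null : result.title;  // slot fixed by index
        remaining--;
        if (remaining === 0) {
            finished = results.slice();  // all callbacks have reported back
        }
    });
});

// Complete the "queries" last-to-first.
pending.reverse().forEach(function (run) {
    run();
});

console.log(finished);   // same order as titles, despite reversed completion
```

The remaining counter is what signals that every callback has reported back; without it there is no single point at which the ordered array is known to be complete.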

If you want to issue the requests serially (only one at a time), then you can't really use the while loop to control things because it's going to launch them all at once. I tend to use this type of design pattern with a next() local function instead of the while loop for serial operation:

function someFunction() {

    var re = new RegExp("<a href=\"(news[^?|\"]+).*?>([^<]+)</a>", "g");

    function next() {
        var match = re.exec(body);
        if (match) {
            var href = match[1];
            var title = match[2];

            db.news.findOne({ title: title }, function(err, result){
                if (err) {
                    console.log(err);
                } else {
                    console.log(href);
                    // more codes here

                    // launch the next iteration
                    next();
                }
            });
        }
    }

    // run the first iteration
    next();
}

Using promises, you could promisify() the db.news.findOne() function so that it returns a promise, collect all the matches into an array, and then use .reduce() to chain the database calls, with each promise's .then() method providing the sequencing.
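
A minimal sketch of that idea follows. It assumes a hand-written wrapper around the built-in Promise constructor rather than a library's promisify(). The findOne function here is a hypothetical stand-in for db.news.findOne(), and body is made-up sample markup:

```javascript
// Hypothetical stand-in for db.news.findOne(); a real driver would do I/O here.
function findOne(query, callback) {
    setTimeout(function () {
        callback(null, { title: query.title });
    }, 0);
}

// Manual promise wrapper around the callback-style function.
function findOneAsync(query) {
    return new Promise(function (resolve, reject) {
        findOne(query, function (err, result) {
            if (err) { reject(err); } else { resolve(result); }
        });
    });
}

// Collect all matches into an array first (sample markup, not real data).
var body = '<a href="news/1.html">One</a> <a href="news/2.html">Two</a>';
var re = /<a href="(news[^?|"]+).*?>([^<]+)<\/a>/g;
var matches = [];
var match;
while ((match = re.exec(body))) {
    matches.push({ href: match[1], title: match[2] });
}

// Chain the lookups: each one starts only after the previous one resolves.
var order = [];
var done = matches.reduce(function (chain, item) {
    return chain.then(function () {
        return findOneAsync({ title: item.title });
    }).then(function (result) {
        order.push(item.href);   // item is per-call, so href is not shared
    });
}, Promise.resolve()).then(function () {
    return order;
});

done.then(function (hrefs) {
    console.log(hrefs);
});
```

Because each step is attached with .then() to the previous one, a rejected lookup skips the remaining steps unless a .catch() is added to the chain.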


5 Comments

Yours is clean, and nicer. One comment, however: even if he does care about the order, he can also pass the loop index to the function and assign the value to an array, like q.all does (sort of).
@NetaMeta - good point. I added a counter/index variable to the closure.
I think his best bet is promises instead of those solutions.
@NetaMeta - it depends upon the full requirements which aren't really spelled out here. I wouldn't necessarily go get a promise library or manually write a promise wrapper for db.news.findOne() if my first solution here worked just fine. But, if the requirements are more complicated, then you'd be using other promise features and it probably would be worth it.
Thanks very much for your quick response. I tried both suggestions. The second one works (note that the curly brace after re.exec should not be there), but I can't make the first one work. The internal function gets called once only, and strangely, the second match is passed. I'm confused.
1

The reason you only get the last href is that the while loop iterates and calls findOne, which is an async operation. The while loop won't wait until findOne finishes; it just continues running. By the time findOne finishes, the while loop has reached its end, and that's why you're getting the same href.

There are several ways you could do that:

1. Promises (preferred, in my opinion). You will have to read about promises to learn more. However, check out: https://github.com/petkaantonov/bluebird http://www.html5rocks.com/en/tutorials/es6/promises/ and http://promise-nuggets.github.io/articles/03-power-of-then-sync-processing.html

2. Wrapping your async function in another function and binding whatever you want to it (not a good option, but possible):

// Wrapping your async function: title and href become parameters,
// so each call keeps its own copies.
var asyncFunction = function(title, href, successCb, failCb){
    db.news.findOne({ title: title }, function(err, result){
        if (err) {
            failCb(err);
        } else {
            // hand href back so the callback does not read the shared loop variable
            successCb(href, result);
        }
    });
};
var re = new RegExp("<a href=\"(news[^?|\"]+).*?>([^<]+)</a>", "g");
var match;
while (match = re.exec(body)){
    var href = match[1];
    var title = match[2];

    asyncFunction.call(this, title, href, function success(href, result){}, function fail(err){} );
}

2 Comments

Thanks for your response. With some modification, I could make it work. I'm not sure what your intention is in passing two callback functions. If I put my code in the success callback, it doesn't work. However, if I remove those two callbacks and use href and title inside asyncFunction directly, it works. Also, I need to change this to db in asyncFunction.call.
The 2 callbacks are just in case you need to do something with the query result; they are not needed. However, look at @jfriend00's answer; it's more complete than mine.
