
I am using the crawler package in Node.js, and I can extract the next-page link with jQuery-style selectors, but I am stuck on automation: I want to run the same script repeatedly so I can scrape the entire website.

var Crawler = require('crawler');
var db = [];

var c = new Crawler({
    maxConnections: 10,
    callback: function (err, res, done) {
        if (err) { throw err; }
        var $ = res.$;
        $('span.next-button').each(function (index, item) {
            // target the next-page URL
            var target = $(this).find('a').attr('href');
            // stored in db...
            db.push(target);
            console.log(target);
        });
        done();
    }
});

c.queue('https://www.reddit.com/r/fullmoviesongoogle/');

// I want to use that link right here, but that is not possible because the code is asynchronous,
// and I also want to run this same callback on each new link I get.

c.queue(db[0]);

So basically I want to build a crawler that scrapes the entire website by following each next-page link.

Thanks in advance :)

1 Answer

You can just enqueue URLs directly in the callback function:

```
function parser(err, res, done) {
    if (err) { throw err; }
    var $ = res.$;
    $('span.next-button').each(function (index, item) {
        // target the next-page URL
        var target = $(this).find('a').attr('href');
        c.queue(target); // enqueue the next page on the same crawler
        console.log(target);
    });
    done();
}
```

Crawler uses the global callback function by default, so every URL you queue is parsed the same way, and each parsed page can queue the next one.
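To illustrate the recursive-enqueue pattern without hitting the network, here is a minimal sketch in which pages are simulated by a plain object mapping each URL to the "next page" URL its `span.next-button` would hold (all URLs here are made up). A `visited` set guards against re-crawling the same page, which you will also want with the real Crawler so loops on the site do not queue forever:

```javascript
// Simulated site: each URL maps to the next-page URL its
// <span class="next-button"> anchor would point to (null = last page).
const pages = {
  'https://example.com/page1': 'https://example.com/page2',
  'https://example.com/page2': 'https://example.com/page3',
  'https://example.com/page3': null,
};

const visited = new Set();
const queue = ['https://example.com/page1']; // stands in for the initial c.queue(...)

// Drain the queue; with the real package, calling c.queue(target)
// inside the callback is what keeps this loop going.
while (queue.length > 0) {
  const url = queue.shift();
  if (visited.has(url)) continue; // skip pages we have already crawled
  visited.add(url);

  const next = pages[url]; // stands in for $('span.next-button a').attr('href')
  if (next) queue.push(next); // stands in for c.queue(next)
}

console.log([...visited]);
```

With the real crawler, the same dedup check goes at the top of the callback: look the URL up in a `Set` before queuing it, and the crawl terminates once every reachable next-page link has been seen.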
