I'm trying to build a very simple scraper function for Node.js: just a function that I can pass a URL to, and that returns the scraped data as var data.

I'm completely new to Node.js and can't work out why the following isn't working:

var request = require('request');
var cheerio = require('cheerio');

function scrape(url) {
    console.log("Scraping: " + url);
    request(url, function(err, resp, body) {

            if (err) {
                throw err;
            }
            var html = cheerio.load(body);
            return html;
        });
}


var data = scrape('http://www.stackoverflow.com');

$ = data;
var logo = $('#hlogo a').text();
console.log(logo);

The above code should return "Stack Overflow" but obviously does not. When I run this in the console I get an error:

var logo = $('#hlogo a').text();
           ^
TypeError: Property '$' of object #<Object> is not a function

Any ideas why this isn't working for me?

1 Answer

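Your data will be undefined because the scrape function itself does not return a value: the return html statement sits inside the request callback, so it returns from that callback, not from scrape. Additionally, request is asynchronous, so the callback only runs after scrape has already finished.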
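
As a rough illustration of that ordering, here is a minimal sketch. It assumes a hypothetical fakeRequest helper in place of request (so it runs without the network or any packages) and a made-up response body; only the shape of the callback mirrors the code in the question.

```javascript
// Hypothetical stand-in for request(): delivers a fixed body on a later tick.
function fakeRequest(url, callback) {
    setTimeout(function () {
        callback(null, { statusCode: 200 }, '<a>Stack Overflow</a>');
    }, 0);
}

function scrape(url) {
    fakeRequest(url, function (err, resp, body) {
        // This return belongs to the inner callback, not to scrape().
        return body;
    });
    // scrape() itself ends here without a return statement.
}

var data = scrape('http://www.stackoverflow.com');
console.log(typeof data); // "undefined"
```

The value returned inside the callback is simply discarded, which is why $ = data ends up not being a function.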

You need to change the logic to something like this:

function scrape(url, oncomplete) {
    console.log("Scraping: " + url);
    request(url, function(err, resp, body) {

        if (err) {
            throw err;
        }
        var html = cheerio.load(body);
        oncomplete(html);
    });
}


scrape('http://www.stackoverflow.com', function($) {
    /* do work here, e.g. the lookup from the question */
    var logo = $('#hlogo a').text();
    console.log(logo);
});

2 Comments

Can you elaborate on why asynchronous is bad here? I'd like to be able to throw multiple URLs at it and scrape them in unison (e.g. have 100 URLs in an array, and use a for loop to iterate through them and call scrape() on each in parallel). Or am I not understanding asynchronicity?
Did I say async is bad? Since JS is single-threaded, async requests are the only way to achieve "parallel" scraping. But your code doesn't take into account that it is async. I showed you how the logic should look.
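
For the "array of URLs" case raised above, one possible sketch is below. It is only an assumption about how this could be wired up: fakeRequest is a hypothetical stand-in for request (so the example runs offline), scrapeAll and the example.com URLs are made up for illustration, and the callback uses the common error-first convention rather than the throw from the answer.

```javascript
// Hypothetical stand-in for request(): answers after a short random delay.
function fakeRequest(url, callback) {
    setTimeout(function () {
        callback(null, { statusCode: 200 }, 'body of ' + url);
    }, Math.floor(Math.random() * 20));
}

function scrape(url, oncomplete) {
    fakeRequest(url, function (err, resp, body) {
        if (err) {
            return oncomplete(err);
        }
        oncomplete(null, body);
    });
}

// Hypothetical helper: starts every request right away and calls done()
// once all of them have answered, or once with the first error.
function scrapeAll(urls, done) {
    var results = new Array(urls.length);
    var pending = urls.length;
    var failed = false;
    if (pending === 0) {
        return done(null, results);
    }
    urls.forEach(function (url, i) {
        scrape(url, function (err, body) {
            if (failed) {
                return;
            }
            if (err) {
                failed = true;
                return done(err);
            }
            results[i] = body; // keep input order, whatever the arrival order
            pending -= 1;
            if (pending === 0) {
                done(null, results);
            }
        });
    });
}

var urls = ['http://example.com/a', 'http://example.com/b', 'http://example.com/c'];
scrapeAll(urls, function (err, bodies) {
    if (err) {
        throw err;
    }
    console.log(bodies);
});
```

The loop itself finishes immediately; the requests are all in flight at once, and the results only exist inside the final callback.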