Getting index.html content while trying to scrape a react website

Question

When I try to scrape a React.js website using Node.js, I only get the content of the index.html file, not the tags that are actually rendered on the site. Here is what I have tried:

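    const request = require("request");
    const cheerio = require("cheerio");

    const URL = "https://pydata-jal.netlify.com/";

    // Fetch the raw HTML exactly as the server sends it (no JavaScript is executed)
    request(URL, (err, res, body) => {
      if (err) {
        console.error(err);
        return;
      }
      if (res.statusCode === 200) {
        // Parse the static HTML and print it back out
        const $ = cheerio.load(body);
        console.log($.html());
      }
    });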

What should I do to get all of the tags that are rendered on a React website?

Also, can I scrape the HackerNoon website (just as an example)? Is that legal?

Mosè Raguzzini · Accepted Answer · 2019-08-01 14:47:35Z


Cheerio parses only already-rendered HTML (e.g. static HTML). To get the output that React renders, you need a headless browser controlled with a tool like Puppeteer.


2 Comments

Rahul Syal Over a year ago

So does that mean we can never scrape a React website using Cheerio?

Mosè Raguzzini Over a year ago

Yes. Cheerio parses the HTML content and lets you access its nodes in a jQuery-like fashion, but it never executes JavaScript. React needs a browser engine in order to render correctly (the JavaScript has to be executed, and the DOM manipulated and reconciled with the virtual DOM).
