I'm trying to finish a web scraping task. On my web page, I take a URL and fetch whatever is located between its <body> tags. Then I want to display that content on my own page. I learned that I can use the request module for this purpose.

The problem is that I cannot show the result in my page's HTML, because I could not save the result of the request call (in the POST handler).

Here is my code:

var request = require("request");
const express = require('express');
const app = express();
const session = require('express-session');
const path = require('path');
const bodyParser = require('body-parser');
const router = express.Router();
app.use(session({secret: 'shhhhhhh', saveUninitialized: true, resave: true}));
app.use(bodyParser.urlencoded({extended: true}));
var sess;

router.get('/', (req, res) => {
    res.sendFile(path.join(__dirname + '/index.html'));
    sess = req.session;
    if (app.get('done') === true) {
        console.log(app.get('info'));                    // prints "undefined"
        app.set('done', false);

        res.end(`
            <h1>Show other sites</h1>
            <form action="/" method="POST">
                <input type="text" name="site" id="site" placeholder="url"><br>
                <button type="submit">go</button>
                <a href="/">BACK</a>
            </form><br>
            <hr>
            <p>url: ${app.get('site')}</p>
            <hr>
            <div>
            ${app.get('info')}
            </div>
        `);
    }
    else
        res.sendFile(path.join(__dirname + '/index.html'));
})
router.post('/', (req, res) => {
    sess = req.session;
    sess.site = req.body.site;
    
    app.set('done', false);
    if (sess.site) {
        app.set('done', true);
        request({
            uri: `${sess.site}`,}, function(error, response, body) {
                app.set('info', body);       // Here I'm trying to save the scraped result
                app.set('site', sess.site);
            }
        );
    }
    res.redirect('/');
})
router.get('/clear', (req, res) => {
    req.session.destroy((err) => {
        if (err)
            return console.log(err);
        res.redirect('/');
    })
})
app.use('/', router);

app.listen(3000);
console.log("Running at port 3000");

Please help me find out what I'm doing wrong and how to save the result of the request module for later use.

  • Why are you using Express? Why not just a simple command line script? Commented Dec 25, 2020 at 3:37

1 Answer

Having followed your logic, I'd suggest rethinking the overall design. Keep in mind that using global variables (here, sess and the values stored with app.set) is bad practice, because they are shared across every user of the server.

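That said, the immediate problem is that request is asynchronous: res.redirect('/') runs before the callback has stored the body, so the GET handler reads app.get('info') while it is still undefined.

You can fix this with a few minor changes.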

Install node-fetch (version 2, because version 3 is ESM-only and cannot be loaded with require):

npm i node-fetch@2

Import it:

const fetch = require('node-fetch');

Then change the POST endpoint to:

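router.post('/', async (req, res) => {
    sess = req.session;
    sess.site = req.body.site;
    app.set('done', false);
    if (sess.site) {
        try {
            // Wait for the download to finish before redirecting,
            // so the GET handler sees the stored body.
            const resp = await fetch(sess.site);
            const body = await resp.text();
            app.set('info', body);       // save the scraped result
            app.set('site', sess.site);
            app.set('done', true);       // only mark as done once the body is stored
        } catch (err) {
            console.error(err);          // e.g. invalid URL or network failure
        }
    }
    res.redirect('/');
})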
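
To see why waiting for the response matters, here is a minimal, self-contained sketch of the timing issue. It uses neither Express nor node-fetch: fakeFetch, store, postWithoutAwait and postWithAwait are hypothetical stand-ins invented for this illustration, and the 10 ms delay is an arbitrary assumption.

```javascript
// Hypothetical stand-in for a network request: resolves with a body after 10 ms.
function fakeFetch(url) {
    return new Promise(resolve =>
        setTimeout(() => resolve(`<p>body of ${url}</p>`), 10));
}

// Hypothetical stand-in for app.set / app.get.
const store = {};

// Like the POST handler in the question: starts the request but does not wait,
// so the "redirect" (the return) happens while store.info is still unset.
function postWithoutAwait(url) {
    fakeFetch(url).then(body => { store.info = body; });
    return store.info;
}

// Like the POST handler above: waits for the body before "redirecting".
async function postWithAwait(url) {
    store.info = await fakeFetch(url);
    return store.info;
}
```

Calling postWithoutAwait first returns undefined, which matches the "undefined" logged in the question, while the promise returned by postWithAwait resolves to the stored body.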

2 Comments

That's so cool! Thanks a lot! I just have one more question. The provided solution works like a charm but it shows actual visual content of the provided website. What if I need to show only html content (with tags and stuff) instead?
Change <div>${app.get('info')}</div> to <textarea>${app.get('info')}</textarea>.
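
As an alternative to the textarea approach, the markup can be escaped before it is placed in the page, so the browser shows the tags as text. The following is only a sketch: escapeHtml is a hypothetical helper written for this example, not part of Express or node-fetch.

```javascript
// Hypothetical helper: replaces the characters that are special in HTML,
// so the browser displays markup as text instead of rendering it.
function escapeHtml(text) {
    return String(text)
        .replace(/&/g, '&amp;')   // must come first, so later entities are not re-escaped
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;')
        .replace(/'/g, '&#39;');
}

const sample = '<h1 class="x">Tom & Jerry</h1>';
console.log(escapeHtml(sample));
// prints: &lt;h1 class=&quot;x&quot;&gt;Tom &amp; Jerry&lt;/h1&gt;
```

In the template this would look something like <pre>${escapeHtml(app.get('info'))}</pre>. Unlike a textarea, it still works when the scraped page itself contains a </textarea> tag.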
