0

I read html page encoded win1251. But I cant render it, because it shows me bad encoded symbols. with utf8 this code works fine. How can I read and show not utf8? Thanks

var charset = require('charset');
var iconv = require('iconv-lite');
var router = express.Router();


// accept POST request on the homepage
router.post('/', function (req, res) {

  request(req.body.url, function (error, response, body) {

    var result = [];
    if (error || response.statusCode != 200) {
        console.log(error);

    } else {
        console.log(charset(response.headers, body));
        var enc = charset(response.headers, body);
        if (enc != 'utf-8') {

            body = iconv.decode(body, 'win1251');
            console.log(body);
        }
        var $ = cheerio.load(body);

        //get title
        result.push("Title-> " + $("title").text());

2 Answers 2

2

If you set encoding: null in your request() options, body will be a Buffer instead of a UTF-8 string. This will allow you to correctly convert the encoding to UTF-8.

Example:

request({url: req.body.url, encoding: null}, function (error, response, body) {

If the encoding of the body ends up being UTF-8, you can simply just do:

body = body.toString('utf8');
Sign up to request clarification or add additional context in comments.

Comments

0

If someone gets this problem too. I did instead of (which is good too)

body = body.toString('utf8');

My code

var iconv = require('iconv-lite');
.......................

body = iconv.decode(body, 'win1251');

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.