16

I am trying to get the correct encoding with request.

request.get({
    "uri":'http://www.bold.dk/tv/',
    "encoding": "text/html;charset='charset=utf-8'"
  },
  function(err, resp, body){    
    console.log(body);
  }
);

No matter what I do, the Danish characters are not decoded correctly.

Any thoughts?

7
  • 3
    You've mixed up the encoding option with the Content-Type header. The option expects an encoding name, e.g. "encoding": "utf-8". However, the page is encoded in ISO-8859-1 rather than UTF-8. For that, see stackoverflow.com/questions/8915404/…. Commented Aug 20, 2012 at 16:48
  • @Amberlamps: I'm using Notepad++. Commented Aug 20, 2012 at 20:01
  • 1
    @hippie: This is a long shot, but I sometimes have the same issue with German letters. Every time it happens, it is because Notepad++ saved my script as ANSI rather than UTF-8. If yours is in ANSI, try switching it to UTF-8. The option is in the Notepad++ menu labelled Coding (I don't know the English term, because I am using the German version). Commented Aug 20, 2012 at 20:07
  • I mixed it up after trying a lot of things. I tried both and nothing works. Commented Aug 20, 2012 at 20:18
  • 1
    @Amberlamps: You are right that Notepad++ was the issue. I just ran it in cmd and it works fine. Thanks all for helping out. Commented Aug 20, 2012 at 20:49

3 Answers

31

You can use iconv-lite to convert this. You also need to stop request from decoding the body with its default of UTF-8 by setting the encoding option to null, so that the body arrives as a raw Buffer. Therefore you should do:

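var request = require('request');
var iconv = require('iconv-lite');

request.get({
    uri: 'http://www.bold.dk/tv/',
    encoding: null // return the body as a raw Buffer instead of a UTF-8 string
  },
  function(err, resp, body){
    if (err) { return console.error(err); }
    // Decode the raw bytes using the encoding the page is actually served in.
    var bodyWithCorrectEncoding = iconv.decode(body, 'iso-8859-1');
    console.log(bodyWithCorrectEncoding);
  }
);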
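To see why the default UTF-8 decoding garbles the output, here is a minimal sketch that needs no network access. It assumes Node's built-in `latin1` Buffer encoding as a stand-in for iconv-lite's `iso-8859-1`, and the sample string is made up for illustration:

```javascript
// Sample Danish letters (æøå), written as escapes so the file encoding does not matter.
const original = '\u00e6\u00f8\u00e5';

// These are the bytes an ISO-8859-1 server would send: one byte per character.
const latin1Bytes = Buffer.from(original, 'latin1');

// Decoding those bytes as UTF-8 produces replacement characters,
// because the byte sequence is not valid UTF-8.
const decodedAsUtf8 = latin1Bytes.toString('utf8');

// Decoding with the encoding the bytes were written in restores the text.
const decodedAsLatin1 = latin1Bytes.toString('latin1');

console.log(JSON.stringify(decodedAsUtf8));
console.log(decodedAsLatin1);
```

This is why `encoding: null` matters: once request has turned the bytes into a UTF-8 string, the original bytes are lost and no later conversion can recover them.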

3

The problem may be the 'Accept-Encoding' request header, for example if you send 'Accept-Encoding': 'gzip,deflate'. The server may then return a compressed body, which looks garbled unless it is decompressed first.

If that is the case, there are two ways to fix it:

  1. Remove this header
  2. Use the following code to unzip the data:

    const zlib = require('zlib')

    const req = request(options, res => {
        let buffers = []
        let bufferLength = 0
        let strings = []
    
        const getData = chunk => {
            if (!Buffer.isBuffer(chunk)) {
                strings.push(chunk)
            } else if (chunk.length) {
                bufferLength += chunk.length
                buffers.push(chunk)
            }
        }
    
        const endData = () => {
            let response = {code: 200, body: ''}
            if (bufferLength) {
                response.body = Buffer.concat(buffers, bufferLength)
                if (options.encoding !== null) {
                    response.body = response.body.toString(options.encoding)
                }
                buffers = []
                bufferLength = 0
            } else if (strings.length) {
                if (options.encoding === 'utf8' && strings[0].length > 0 && strings[0][0] === '\uFEFF') {
                    strings[0] = strings[0].substring(1)
                }
                response.body = strings.join('')
            }
            console.log('response', response)
        };
    
        switch (res.headers['content-encoding']) {
            // or, just use zlib.createUnzip() to handle both cases
            case 'gzip':
                res.pipe(zlib.createGunzip())
                    .on('data', getData)
                    .on('end', endData)
                break;
            case 'deflate':
                res.pipe(zlib.createInflate())
                    .on('data', getData)
                    .on('end', endData)
                break;
            default:
                // No (or unknown) Content-Encoding: read the body as-is.
                // Piping uncompressed data through Inflate would fail.
                res.on('data', getData)
                    .on('end', endData)
                break;
        }
    });
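The comment in the switch above mentions that a single unzip step can cover both cases. A small offline sketch of that idea, using Node's synchronous zlib helpers on a made-up payload (the string and variable names are only for illustration):

```javascript
const zlib = require('zlib');

// Made-up payload containing Danish characters, encoded as UTF-8 bytes.
const payload = Buffer.from('r\u00f8dgr\u00f8d med fl\u00f8de', 'utf8');

// Simulate what a server might send for each Content-Encoding value.
const gzipped = zlib.gzipSync(payload);
const deflated = zlib.deflateSync(payload);

// Unzip detects gzip vs. zlib-wrapped deflate from the header bytes,
// so the same call handles both.
const fromGzip = zlib.unzipSync(gzipped);
const fromDeflate = zlib.unzipSync(deflated);

console.log(fromGzip.toString('utf8'));
console.log(fromDeflate.equals(payload));
```

The streaming equivalent is `zlib.createUnzip()`, which would let the 'gzip' and 'deflate' cases share one branch.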
    

1 Comment

Removing 'accept-encoding' from the request header fixed the issue for me. Thanks.
0

I had the same problem with request v2.88.0.

Based on woolfi makkinan's answer, I found a simple way to solve the problem.

request.get({
    "uri": 'http://www.bold.dk/tv/',
    "encoding": "utf8", // must be an encoding name such as "utf8", not a Content-Type value
    "gzip": true // notice this config
  },
  function(err, resp, body){    
    console.log(body);
  }
);

Add gzip: true to the request options. request will then decompress the gzip response itself, so the body can be converted to a string correctly.
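Note that decompression and character decoding are two separate steps, and a response may need both. A rough sketch of handling them together by hand, assuming hypothetical helpers `charsetToBufferEncoding` and `decodeResponseBody` (neither is part of request) and supporting only UTF-8 and ISO-8859-1:

```javascript
const zlib = require('zlib');

// Hypothetical helper: map the charset in a Content-Type header to a Buffer encoding.
// Assumption: anything other than ISO-8859-1 is treated as UTF-8.
function charsetToBufferEncoding(contentType) {
  const match = /charset\s*=\s*"?([^\s;"]+)/i.exec(contentType || '');
  const label = match ? match[1].toLowerCase() : 'utf-8';
  if (label === 'iso-8859-1' || label === 'latin1') return 'latin1';
  return 'utf8';
}

// Hypothetical helper: first undo Content-Encoding, then decode the charset.
function decodeResponseBody(raw, headers) {
  const contentEncoding = (headers['content-encoding'] || '').toLowerCase();
  const bytes = contentEncoding === 'gzip' || contentEncoding === 'deflate'
    ? zlib.unzipSync(raw)
    : raw;
  return bytes.toString(charsetToBufferEncoding(headers['content-type']));
}

// Simulated response: ISO-8859-1 text (æøå) that was gzip-compressed by the server.
const simulatedBody = zlib.gzipSync(Buffer.from('\u00e6\u00f8\u00e5', 'latin1'));
const text = decodeResponseBody(simulatedBody, {
  'content-encoding': 'gzip',
  'content-type': 'text/html; charset=ISO-8859-1',
});

console.log(text);
```

With request, `gzip: true` covers the first step; for a page served as ISO-8859-1, the second step would still need `encoding: null` plus a decoder such as iconv-lite.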
