46

I'm trying to read the content from a URL with Node.js but all I seem to get are a bunch of bytes. I'm obviously doing something wrong but I'm not sure what. This is the code I currently have:

var http = require('http');

var client = http.createClient(80, "google.com");
request = client.request();
request.on('response', function( res ) {
    res.on('data', function( data ) {
        console.log( data );
    } );
} );
request.end();

Any insight would be greatly appreciated.

4 Answers

59

Try listening for the request's 'error' event to find the issue.

var http = require('http');

var options = {
    host: 'google.com',
    path: '/'
};
var request = http.request(options, function (res) {
    var data = '';
    res.on('data', function (chunk) {
        data += chunk;
    });
    res.on('end', function () {
        console.log(data);
    });
});
request.on('error', function (e) {
    console.log(e.message);
});
request.end();

3 Comments

This example works for most links. However, I've found that this URL, au.yahoo.com, gives back a buffer of data. Even when you convert the buffer to a string with different encodings, it's not readable. Any ideas?
@NickTaras see if my answer suits your needs ;-p
Looks like a good solution. In this case Yahoo was serving the website with gzip compression, so another step was needed to unzip the pages before scraping. Hope this helps anyone who has the same issue. Ttimasdf, I'll run your code example and reply more specifically soon.
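
For anyone hitting the gzip problem mentioned in the comments above, here is a minimal sketch of decompressing the body based on the Content-Encoding response header. The helper names decodeBody and fetchText are hypothetical (they are not from any answer here), and the sketch assumes the server sends one of the standard encodings that Node's built-in zlib module can undo. Some servers send raw deflate data without the zlib wrapper, which this sketch does not handle.

```javascript
const http = require('http');
const https = require('https');
const zlib = require('zlib');

// Hypothetical helper: undo the compression named in the Content-Encoding header.
function decodeBody(buffer, contentEncoding) {
  switch ((contentEncoding || '').toLowerCase()) {
    case 'gzip':
      return zlib.gunzipSync(buffer);
    case 'deflate':
      return zlib.inflateSync(buffer);
    case 'br':
      return zlib.brotliDecompressSync(buffer);
    default:
      // Not compressed, or an encoding this sketch does not handle.
      return buffer;
  }
}

// Hypothetical helper: fetch a URL and resolve with the decompressed text.
function fetchText(url) {
  return new Promise((resolve, reject) => {
    const client = url.startsWith('https') ? https : http;
    const options = { headers: { 'Accept-Encoding': 'gzip, deflate' } };
    client.get(url, options, (res) => {
      const chunks = [];
      // Collect raw Buffers; decompression needs the complete byte stream.
      res.on('data', (chunk) => chunks.push(chunk));
      res.on('end', () => {
        try {
          const raw = Buffer.concat(chunks);
          const body = decodeBody(raw, res.headers['content-encoding']);
          resolve(body.toString('utf8')); // assumes the page is UTF-8
        } catch (err) {
          reject(err);
        }
      });
    }).on('error', reject);
  });
}

// Offline check of the decoding step, no network needed.
const compressed = zlib.gzipSync('<html>hello</html>');
console.log(decodeBody(compressed, 'gzip').toString('utf8')); // prints "<html>hello</html>"
```

Usage would look like fetchText('http://au.yahoo.com/').then(console.log, console.error), though I have not verified what that particular site returns today.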
27

HTTP and HTTPS:

const getScript = (url) => {
    return new Promise((resolve, reject) => {
        const http      = require('http'),
              https     = require('https');

        let client = http;

        if (url.toString().indexOf("https") === 0) {
            client = https;
        }

        client.get(url, (resp) => {
            let data = '';

            // A chunk of data has been received.
            resp.on('data', (chunk) => {
                data += chunk;
            });

            // The whole response has been received. Resolve with the result.
            resp.on('end', () => {
                resolve(data);
            });

        }).on("error", (err) => {
            reject(err);
        });
    });
};

(async (url) => {
    console.log(await getScript(url));
})('https://sidanmor.com/');

9

The data object is a Buffer of bytes. Call .toString() on it to get human-readable text:

console.log( data.toString() );

reference: Node.js buffers
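
To make the difference concrete, here is a small offline sketch. The Buffer below is a stand-in for a chunk delivered to the 'data' handler (an assumption for illustration, nothing is fetched). As far as I know, calling res.setEncoding('utf8') on the response before attaching the handler is another way to receive strings instead of Buffers.

```javascript
const util = require('util');

// Stand-in for a chunk delivered to the 'data' event handler.
const data = Buffer.from('Hello', 'utf8');

// console.log(data) shows the inspected form of the Buffer, i.e. raw bytes in hex.
const inspected = util.inspect(data);

// toString() decodes the bytes, using UTF-8 when no encoding is given.
const text = data.toString();

console.log(inspected); // <Buffer 48 65 6c 6c 6f>
console.log(text);      // Hello
```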

1 Comment

Another option is console.log(JSON.stringify(data)). Otherwise, I've found the eyes package (npm install eyes) to be a useful tool for its inspector().
9

A slightly modified version of @sidanmor's code. The main point is that not every web page is pure ASCII, so the caller should be able to handle decoding manually (or even encode the result as base64). This version resolves with the raw Buffer instead of a string.

function httpGet(url) {
  return new Promise((resolve, reject) => {
    const http = require('http'),
      https = require('https');

    let client = http;

    if (url.toString().indexOf("https") === 0) {
      client = https;
    }

    client.get(url, (resp) => {
      let chunks = [];

      // A chunk of data has been received.
      resp.on('data', (chunk) => {
        chunks.push(chunk);
      });

      // The whole response has been received. Resolve with the raw bytes.
      resp.on('end', () => {
        resolve(Buffer.concat(chunks));
      });

    }).on("error", (err) => {
      reject(err);
    });
  });
}

(async(url) => {
  var buf = await httpGet(url);
  console.log(buf.toString('utf-8'));
})('https://httpbin.org/headers');
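
A minimal offline sketch of why collecting Buffers matters for non-ASCII pages: a multi-byte UTF-8 character can be split across two chunks, and decoding each chunk on its own (which is what data += chunk does) corrupts it. The chunk boundary below is made up for illustration. StringDecoder from Node's built-in string_decoder module is shown as an alternative if you want to decode incrementally.

```javascript
const { StringDecoder } = require('string_decoder');

// 'é' is two bytes in UTF-8 (0xC3 0xA9); split them across two chunks.
const whole = Buffer.from('café', 'utf8');
const chunks = [whole.subarray(0, 4), whole.subarray(4)];

// Naive approach: each chunk is decoded separately, so 'é' is broken.
let naive = '';
for (const chunk of chunks) {
  naive += chunk; // implicit chunk.toString()
}

// Buffer approach: join the bytes first, decode once at the end.
const joined = Buffer.concat(chunks).toString('utf8');

// Incremental approach: StringDecoder holds back incomplete characters.
const decoder = new StringDecoder('utf8');
let streamed = '';
for (const chunk of chunks) {
  streamed += decoder.write(chunk);
}
streamed += decoder.end();

// The same bytes can also be exposed as base64 if the caller prefers.
const asBase64 = Buffer.concat(chunks).toString('base64');

console.log(naive === 'café'); // false
console.log(joined);           // café
console.log(streamed);         // café
console.log(asBase64);         // Y2Fmw6k=
```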
