I'm trying to scrape a page to get a link and some text from a paragraph. But for some reason my code doesn't work. I have tried a lot, and every time it just gives me undefined.

var request = require('request');
var cheerio = require('cheerio');

request('https://bitskins.com', function (error, response, html) {
  if (!error && response.statusCode == 200) {
    var $ = cheerio.load(html);
    $('p', '.chat-box-content').each(function(i, element){
        if($(this).attr('style') == 'height: 15px;'){
            console.log($(this));
        }
    });
  }
});

https://gyazo.com/b80465474a389657c44aeeb64888a006

I only want it to return the second and third lines, that is, the link and the price. What do I have to do? I'm new and I'm lost.

1 Answer

The problem is that when you request the page, the chat box is in a collapsed/hidden state, and all the <p> elements (which are apparently placeholders) are empty. If you open the chat box, some JavaScript on the page runs and populates the list.

Fortunately, you don't need to scrape the screen at all. The page calls an API to populate the list, and you can call that API yourself.

var request = require('request');

request.post('https://bitskins.com/api/v1/get_last_chat_messages', function (error, response, data) {
  if (!error && response.statusCode == 200) {
      var dataObject = JSON.parse(data);
      dataObject.data.messages.forEach(function (message) {
          // For some reason the message is JSON encoded as a string...
          var messageObject = JSON.parse(message);
          // The message object has "message" field.
          // Just use a regex to parse out the link and the price.
          var link = messageObject.message.match(/href='([^']+)/)[1];
          var price = messageObject.message.match(/\$(\d+\.\d+)/)[1];
          console.log(link + " " + price);
      });
  }
});

You will probably want to add better error handling, convert the price into a number, etc.
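
As one possible way to do that, here is a minimal sketch of a helper that pulls the link and price out of a single message string and returns null instead of throwing when either pattern is missing. The name extractLinkAndPrice and the sample strings are made up for illustration, and the real message markup may differ.

```javascript
// Hypothetical helper, not part of the site's API: just a sketch.
function extractLinkAndPrice(messageHtml) {
  var linkMatch = messageHtml.match(/href='([^']+)/);
  var priceMatch = messageHtml.match(/\$(\d+\.\d+)/);
  // match() returns null when the pattern is not found.
  if (!linkMatch || !priceMatch) {
    return null;
  }
  return {
    link: linkMatch[1],
    price: parseFloat(priceMatch[1]) // convert the digit string to a number
  };
}

// Made-up sample input, assumed to resemble a chat message.
var sample = "<a href='https://example.com/item/1'>Some item</a> for $12.34";
console.log(extractLinkAndPrice(sample));
console.log(extractLinkAndPrice("no link or price here")); // null
```

Inside the forEach loop you could then skip any message for which the helper returns null.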

9 Comments

OMFG you are a hero, but I don't understand all the /\$(\d+\.\d+)/. How do I make it display the % off?
Mark my answer correct and I'll tell you about the slashy stuff. :-)
Thanks. You know it's all about the points. An expression of the form /whatever/ is a regular expression, which describes a text pattern. You can pass it to the match function of a string to search the string for the pattern.
The regular expression \$(\d+\.\d+) means: find a '$' (there's a backslash in front to escape it, because normally '$' in a regular expression means "end of string," but we want a literal '$'), followed by at least one digit (\d+), followed by a period (again escaped with a backslash because normally '.' means "any character"), followed by at least one digit. The digit-matching part is wrapped in parentheses, which makes it a "capture group."
Assuming the regular expression matches the string, you'll get back an array-like object. Element 0 of the array is the entire matched portion of the string. Elements 1 through N are the capture groups that you defined. In both cases I take capture group 1, which gives me just the digits from the price and just the href value from the link.
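To make that concrete, here is a small sketch of what match returns for the price pattern. The sample string is invented for the example and is not taken from the real chat data.

```javascript
// Invented sample text; the real chat messages may look different.
var text = "Sold for $12.34 today";
var result = text.match(/\$(\d+\.\d+)/);

console.log(result[0]); // "$12.34" (element 0: the entire match)
console.log(result[1]); // "12.34"  (element 1: capture group 1, just the digits)

// When the pattern is not found, match returns null rather than an array.
console.log("no price".match(/\$(\d+\.\d+)/)); // null
```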