1

I'm trying to write an XPath statement to fetch the contents of each row in a table, but only for rows whose 2nd column is not set to "TBA". The page I am working off is this page. I am new to using XPath.

I've come up with the following statement, which I've managed to test successfully (or appears successful anyway) with an online XPath tester, but have been unable to figure out how to apply it in node.js:

//*[@id="body_column_left"]/div[4]/table/tbody/tr/[not(contains(./td[2], 'TBA'))]

My attempt is below. I've tried variations, but I can't get the expression to even validate as an XPath statement, and as a result I've been lost in unhelpful stack traces:

var fs = require('fs');
var xpath = require('xpath');
var parse5 = require('parse5');
var xmlser = require('xmlserializer');
var dom = require('xmldom').DOMParser;
var request = require('request');

var getHTML = function (url, callback) {
    request(url, function (error, response, body) {
        if (!error && response.statusCode == 200) {
            return callback(body) // return the HTML
        }
    })
}

getHTML("http://au.cybergamer.com/pc/csgo/ladder/scheduled/", function (html) {
    var parser = new parse5.Parser();
    var document = parser.parse(html.toString());
    var xhtml = xmlser.serializeToString(document);
    var doc = new dom().parseFromString(xhtml);
    var select = xpath.useNamespaces({"x": "http://www.w3.org/1999/xhtml"});    
    var nodes = select("//x:*[@id=\"body_column_left\"]/div[4]/table/tbody/tr/[not(contains(./td[2], 'TBA'))]", doc);
    console.log(nodes);    
});

Any help would be appreciated!

3
  • 1
    Use cheerio, it would be a lot easier. Commented Jul 14, 2015 at 1:48
  • Thanks for your response @hassansin, I will take a look into using cheerio. Commented Jul 14, 2015 at 6:20
  • I ended up solving this with Cheerio, thanks @hassansin :) Commented Jul 14, 2015 at 15:10

2 Answers

2

I ended up solving this issue using cheerio instead of xpath. See below:

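    var cheerio = require('cheerio');

    var rows = [];    // cleaned text of each table row
    var matches = []; // match objects built from each row's link

    var $ = cheerio.load(html);           // html is the page source fetched earlier
    $('.s_grad br').replaceWith("\n");    // turn <br> into newlines so cell text stays separated
    $('.s_grad thead').remove();          // drop the table header
    $('.s_grad tr').each(function (i, elem) {
        rows[i] = $(this).text();
        rows[i] = rows[i].replace(/^\s*[\r\n]/gm, ""); // remove empty lines
        // `match` is my own constructor (not shown); strip the first 7 characters and the trailing character of the href
        matches.push(new match($(this).find('a').attr('href').substring(7).slice(0, -1))); // create matches
    });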
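
To see what the two string operations inside the loop do in isolation, here is a minimal sketch on made-up input. The helper names `stripBlankLines` and `extractMatchId` are hypothetical, and both the sample text and the href shape `/match/123456/` are assumptions for illustration, not values taken from the page.

```javascript
// Hypothetical helpers mirroring the string handling in the loop above.

// Remove lines that are empty or contain only whitespace.
function stripBlankLines(text) {
    return text.replace(/^\s*[\r\n]/gm, "");
}

// Drop the first 7 characters and the last character of an href.
// Assumes hrefs shaped like "/match/123456/" (an assumption, not confirmed by the source).
function extractMatchId(href) {
    return href.substring(7).slice(0, -1);
}

// Made-up row text containing an empty line and a whitespace-only line.
var sample = "Team A\n\n   \nTBA\nTeam B\n";

console.log(JSON.stringify(stripBlankLines(sample))); // "Team A\nTBA\nTeam B\n"
console.log(extractMatchId("/match/123456/"));        // 123456
```

If the real hrefs have a different prefix length, the `substring(7)` offset would need adjusting accordingly.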


-1

How about using xpath-html? I loved its simplicity.

const xpath = require("xpath-html");

const nodes = xpath
  .fromPageSource(html)
  .findElements("//img[starts-with(@src, 'https://cloud.shopback.com')]");

2 Comments

While using this library I'm getting this error: TypeError: this.html.charCodeAt is not a function. What's the cause of this?
Hi @AmnaArshad, can you create an issue on its GitHub repo?
