I am looking for an HTML content extractor that works with XPath. I have looked at several Node.js modules for this, such as
jsdom, htmlparser2, xpath, and cheerio.
cheerio works well for selecting data by class, id, tag name, and so on, but I have not found a way to give it an XPath expression. With the xpath Node.js module I can extract data by XPath from small HTML documents, but on longer HTML it fails with various errors, such as:
entity not found: @#[line:120,col:9]
unclosed xml attribute @#[line:1,col:877]
Note: I am not allowed to change the HTML in any way.
For example, if my HTML is:
<html>
<body>
<div>
<ul id="fruits">
<li class="apple">Apple</li>
<li class="orange">Orange</li>
<li class="pear">Pear</li>
</ul>
</div>
</body>
</html>
With this HTML and the XPath //*[@id="fruits"]/li[2], the xpath Node.js module runs without any error and returns Orange. But when I use the HTML of this page, which is much longer: http://www.infotaxi.org/india_taxi/ahmedabad_taxi.htm
and try to read part of the text with the XPath
//*[@id="navlistmeniu"]/li[3]/a/b
I get this error:
entity not found: @#[line:120,col:9]
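With cheerio I can extract data using class, id, and tag selectors, but not with XPath.
How can I extract data by XPath from HTML like this in Node.js? The closest I have with cheerio is this selector: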
$('#navlistmeniu > li').eq(2).find('a > b'); // .eq() is zero-based, so eq(2) corresponds to li[3] in the XPath
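
One possible workaround, sketched here only as an illustration: since cheerio accepts CSS selectors, simple XPath expressions like the two above could be translated into CSS before being passed to it. The function name xpathToCss is hypothetical (it is not part of cheerio or the xpath module), and the sketch assumes a very small XPath subset: paths starting with //, element names or *, [@attr="value"] predicates, and [n] position predicates. Anything else throws.

```javascript
// Hypothetical helper (not part of cheerio or the xpath module): translates a
// small XPath subset into a CSS selector string.
function xpathToCss(xpath) {
  // One step = "/" or "//", a name, then zero or more [predicates].
  // The sticky flag (y) makes each match start exactly where the last ended.
  const stepPattern = /(\/\/?)([^\/\[]+)((?:\[[^\]]*\])*)/y;
  const parts = [];
  let consumed = 0;
  let step;
  while ((step = stepPattern.exec(xpath)) !== null) {
    consumed = stepPattern.lastIndex;
    const [, axis, rawName, predicates] = step;
    const name = rawName.trim();
    if (!/^(\*|[A-Za-z][\w-]*)$/.test(name)) {
      throw new Error('Unsupported step: ' + rawName);
    }
    if (parts.length === 0 && axis !== '//') {
      throw new Error('Only paths starting with // are supported');
    }
    let selector = name === '*' ? '' : name;
    let sawAttribute = false;
    for (const [, predicate] of predicates.matchAll(/\[([^\]]*)\]/g)) {
      const text = predicate.trim();
      const attr = /^@([\w-]+)\s*=\s*(["'])(.*)\2$/.exec(text);
      if (/^\d+$/.test(text)) {
        // li[@class="x"][2] has no simple CSS equivalent, so reject it.
        if (sawAttribute) {
          throw new Error('Position after an attribute predicate is not supported');
        }
        // XPath positions are 1-based, like :nth-of-type and :nth-child.
        selector += (name === '*' ? ':nth-child(' : ':nth-of-type(') + text + ')';
      } else if (attr) {
        sawAttribute = true;
        const [, attrName, , value] = attr;
        selector += attrName === 'id' && /^[A-Za-z_][\w-]*$/.test(value)
          ? '#' + value
          : '[' + attrName + '=' + JSON.stringify(value) + ']';
      } else {
        throw new Error('Unsupported predicate: [' + predicate + ']');
      }
    }
    // "/" between steps is a child combinator, "//" a descendant combinator.
    if (parts.length > 0) parts.push(axis === '//' ? ' ' : ' > ');
    parts.push(selector || '*');
  }
  if (parts.length === 0 || consumed !== xpath.length) {
    throw new Error('Unsupported XPath: ' + xpath);
  }
  return parts.join('');
}

console.log(xpathToCss('//*[@id="fruits"]/li[2]'));
// prints: #fruits > li:nth-of-type(2)
console.log(xpathToCss('//*[@id="navlistmeniu"]/li[3]/a/b'));
// prints: #navlistmeniu > li:nth-of-type(3) > a > b
```

The resulting string could then be handed to cheerio's $() in place of a hand-written selector, assuming the selector engine in use supports :nth-of-type and :nth-child. This avoids strict XML parsing of the page, but it is not a general XPath implementation: functions such as text(), other axes, and unions are deliberately rejected.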