Parsing HTML using JavaScript

Question

I'm working on a page that needs to fetch information from some other pages and then display parts of that data on the current page.

I have the HTML source code that I need to parse in a string, and I'm looking for a library that can help me do this easily. I just need to extract specific tags and the text they contain. The HTML is well formed (all closing tags are present).

I've looked at some options, but they have all been extremely difficult to work with for various reasons.

I've tried the following solutions:

jkl-parsexml library (The library js file itself throws up HTTPError 101)
jQuery.parseXML Utility (Didn't find much documentation/many examples to figure out what to do)
XPATH (The Execute statement is not working but the JS Error Console shows no errors)

So I'm looking for a more user-friendly library, or anything (tutorials, books, references, documentation) that would let me use the aforementioned tools more easily and efficiently.

An ideal solution would be something like BeautifulSoup, which is available in Python.

You could add it to the DOM, hide it, then access your elements with plain JS or jQuery. That lets the browser parse it for you, and you use JS to traverse the DOM.
– bfavaretto, Commented Sep 11, 2012 at 22:53
The HTML I have is heavily nested (10-12 levels deep) and lacks class, name and id attributes, so getElementById and similar functions are effectively useless. Recovering the required data that way would be a real bother.
– ffledgling, Commented Sep 11, 2012 at 22:56
Hm. Take a look at jQuery selectors; they should be powerful enough. For example, "div p span" finds all spans located inside a p that is itself inside a div. "div>p>span" does the same, except that the p must be a direct child of the div and the span a direct child of that p. jQuery has a lot of other helpful selectors and functions as well.
– Viktor S., Commented Sep 11, 2012 at 23:00
@bfavaretto I can't say for sure that a custom parser will make the job easier, but this was the first approach I tried and it was extremely time-consuming. I was hoping the parser would give me nested dictionaries that I could loop through more easily.
– ffledgling, Commented Sep 11, 2012 at 23:03

Elliot Bonneville · Accepted Answer · 2012-09-11 22:56:08Z

4

Using jQuery, it would be as simple as $(HTMLstring); which creates a jQuery object holding the HTML parsed from the string (this DOM is disconnected from your document). From there it's very easy to do whatever you want with it, and traversing the loaded data is, of course, a cinch with jQuery.
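A minimal sketch of this approach, assuming jQuery is already loaded on the page as window.jQuery. The sample markup and the collectText helper are illustrative and not from the original post.

```javascript
// Illustrative helper (not part of jQuery): turn a list of elements
// into an array of their trimmed text content.
function collectText(elements) {
  return Array.from(elements, function (el) {
    return el.textContent.trim();
  });
}

// Browser-only part: runs only where jQuery has been loaded.
if (typeof window !== 'undefined' && window.jQuery) {
  const $ = window.jQuery;

  // Sample markup: nested, with no id or class attributes to hook into.
  const htmlString =
    '<div><p><span>first</span></p><p><span>second</span></p></div>';

  // Parse the string into a detached tree; nothing is added to the page.
  const $parsed = $(htmlString);

  // "p > span" matches spans that are direct children of a <p>.
  const spans = $parsed.find('p > span').get();
  console.log(collectText(spans));
}
```

Because the selector describes the nesting itself, this style of lookup does not depend on the markup having id or class attributes.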


4 Comments

ffledgling Over a year ago

I'm not sure if this is a problem with my code or the HTML itself but I get "Error: Invalid XML" when I try this. Here is the code I used ` htmlDoc = $.parseXML(pagetext);$html = $( htmldoc );$html.find("body");`

Elliot Bonneville Over a year ago

@Ayos: I would guess it's because you're passing something into .parseXML that is not valid XML. What are the contents of pagetext?

ffledgling Over a year ago

The page contains HTML with CSS in the head and JavaScript within the <script> tags. It's basically the entire source code of a website, obtained via XHR's responseText.

Elliot Bonneville Over a year ago

Try var $html = $(pagetext) directly, then.
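Since pagetext is a whole HTML page rather than XML, that is likely why $.parseXML rejects it. Note also that $(pagetext) may drop the html, head and body wrappers, because jQuery parses through the browser's innerHTML. As an alternative sketch that does not rely on jQuery, the browser's built-in DOMParser can parse a complete page in "text/html" mode. The toNested helper below is hypothetical; it converts the parsed tree into plain nested objects, roughly the "nested dictionaries" shape mentioned in the question comments. The sample page is made up.

```javascript
// Hypothetical helper: convert a DOM node into plain nested objects.
// Elements become { tag, children }; non-empty text nodes become strings.
function toNested(node) {
  const ELEMENT_NODE = 1;
  const TEXT_NODE = 3;

  if (node.nodeType === TEXT_NODE) {
    const text = node.nodeValue.trim();
    return text === '' ? null : text;
  }
  if (node.nodeType !== ELEMENT_NODE) {
    return null; // skip comments, doctype nodes, etc.
  }

  const children = [];
  for (let i = 0; i < node.childNodes.length; i++) {
    const child = toNested(node.childNodes[i]);
    if (child !== null) {
      children.push(child);
    }
  }
  return { tag: node.nodeName.toLowerCase(), children: children };
}

// Browser-only part: DOMParser is not available in plain Node.js.
if (typeof DOMParser !== 'undefined') {
  // Stand-in for the XHR responseText described above.
  const pagetext =
    '<html><head><title>Demo</title></head>' +
    '<body><div><p>Hello <b>world</b></p></div></body></html>';

  // "text/html" parses a full page and keeps <head> and <body>.
  const doc = new DOMParser().parseFromString(pagetext, 'text/html');
  console.log(JSON.stringify(toNested(doc.body), null, 2));
}
```

The resulting objects can be walked with ordinary loops, and doc.querySelectorAll accepts the same kind of CSS selectors discussed in the question comments.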

Viktor S. · Accepted Answer · 2012-09-11 22:56:52Z

0

You can do something like this:

$("string with html here").find("jquery selector")

$("string with html here") creates a document fragment and puts the HTML into it (basically, it parses your HTML). find then searches for elements inside that fragment, and only inside it. The parsed HTML is never added to the page's DOM.
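One caveat with this pattern: .find() searches only the descendants of the parsed top-level elements, so an element sitting at the top level of the string is not matched by it. A small sketch of a workaround, assuming jQuery is loaded as window.jQuery; findIncludingRoots is a hypothetical helper name, not a jQuery API, and the sample markup is made up.

```javascript
// Hypothetical helper (not a jQuery API): match `selector` against the
// parsed top-level elements and against their descendants.
function findIncludingRoots($parsed, selector) {
  // .filter() tests the top-level elements themselves,
  // .find() searches only below them, and .add() merges both sets.
  return $parsed.filter(selector).add($parsed.find(selector));
}

// Browser-only part: runs only where jQuery has been loaded.
if (typeof window !== 'undefined' && window.jQuery) {
  const $ = window.jQuery;

  // Two top-level elements: a <p> and a <div> that contains another <p>.
  const $parsed = $('<p>top level</p><div><p>nested</p></div>');

  // Descendants only: the top-level <p> is not included.
  console.log($parsed.find('p').length);

  // Top-level elements and descendants together.
  console.log(findIncludingRoots($parsed, 'p').length);
}
```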
