0

I'm trying to parse HTML in the browser. The browser receives two HTML files as strings, e.g. HTML1 and HTML2.

I now need to parse these "documents" just as one would parse the current document. Is it possible to create custom documents from these HTML strings? (The strings are provided by the server or by the user.)

For example, I would like something like the following to be valid: $(html1Document).$("#someDivID")...

If anything is unclear, please ask me to clarify more.

Thanks.

3 Answers

3
// Parse the string into detached DOM nodes wrapped in a jQuery object
var $docFragment = $(htmlString);

$docFragment.find("a"); // all anchors in the HTML string

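Note that this discards the document structure tags (<html>, <head> and <body>), but the elements they contained remain available as top-level nodes. Because .find() only searches descendants, use .filter() to match those top-level nodes themselves.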
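
If you do need the <html>, <head> and <body> structure, one option that is not part of the answer above is the browser's built-in DOMParser, which returns a separate Document whose scripts are not executed. A minimal sketch, assuming a browser environment; parseHtmlDocument and html1 are hypothetical names used only for illustration:

```javascript
// Sketch: parse an HTML string into a separate, inert Document.
// parseHtmlDocument is a hypothetical helper name, not an existing API.
function parseHtmlDocument(htmlString, ParserCtor = globalThis.DOMParser) {
  // In a browser, ParserCtor defaults to the built-in DOMParser.
  const parser = new ParserCtor();
  // "text/html" selects the HTML parser; scripts in the result do not run.
  return parser.parseFromString(htmlString, "text/html");
}

// Browser usage (html1 is assumed to hold one of the HTML strings):
// const html1Document = parseHtmlDocument(html1);
// html1Document.title;                        // text of the <title> element
// html1Document.querySelector("#someDivID");  // plain DOM lookup
// $(html1Document).find("#someDivID");        // jQuery lookup, if jQuery is loaded
```

The parser constructor is passed as a parameter only so the helper can be exercised outside a browser; in a page you would call it with the string alone.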


5 Comments

Probably what I need. However, how would this handle script, html, head tags etc.? Would it not modify these?
@Tom: Define "handle". (Script tags will be returned but not evaluated or executed.)
Basically the side effects mentioned by Nikita Rybak. But I guess these won't apply here because I'm merely searching a string, not appending it to the document.
@Tom: Yes. You could, though, append a returned script tag to your document and have it executed, but it will execute in your document's context of course. As long as you don't append it, it is just there as a detached DOM node.
I wonder why I didn't think of this when solving a similar problem. I should try it, thanks.
1

With jQuery you can do this:

$(your_document_string).someParsingMethod().another();
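
As a more concrete version of that chain, here is a minimal sketch that wraps the parsed nodes in a detached container so a single .find() matches both top-level and nested elements. It assumes jQuery 1.8 or later (for $.parseHTML); findInHtmlString is a hypothetical helper name:

```javascript
// Sketch: search an HTML string with jQuery without attaching it to the page.
// findInHtmlString is a hypothetical helper; jq defaults to the global jQuery.
function findInHtmlString(htmlString, selector, jq = globalThis.jQuery) {
  // $.parseHTML returns an array of detached nodes; scripts are dropped
  // unless its third argument (keepScripts) is true.
  const nodes = jq.parseHTML(htmlString);
  // Wrapping the nodes in a detached <div> lets .find() match top-level
  // elements too, which $(htmlString).find() alone would miss.
  return jq("<div>").append(nodes).find(selector);
}

// Browser usage, with jQuery loaded:
// findInHtmlString(your_document_string, "#someDivID").text();
```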


1

You can always append your HTML to a hidden div (through innerHTML or jQuery's .html(..)). It won't be treated exactly as a new document, but you will still be able to search its contents.

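It has a few side effects, though. For example, if your HTML contains script tags, jQuery's .html(..) will load and execute them (plain innerHTML does not run them, but images and similar resources are still fetched). Also, the browser may (and probably will) remove html, body and similar tags.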

edit
If you specifically need title and similar tags, you could try an iframe that loads the content from your server.
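
A rough sketch of the iframe idea follows. Since the HTML is already available as a string in the browser, it swaps the server round trip for the iframe's srcdoc attribute, and it adds a sandbox attribute so scripts in the loaded HTML stay disabled; both are my assumptions, not something this answer prescribes. loadIntoHiddenIframe is a hypothetical helper name:

```javascript
// Sketch: load an HTML string into a hidden, sandboxed iframe.
// loadIntoHiddenIframe is a hypothetical helper; doc defaults to the page's document.
function loadIntoHiddenIframe(htmlString, doc = globalThis.document) {
  return new Promise((resolve) => {
    const iframe = doc.createElement("iframe");
    iframe.style.display = "none";
    // allow-same-origin without allow-scripts: the parent page can read the
    // iframe's DOM, but scripts inside the loaded HTML are not executed.
    iframe.setAttribute("sandbox", "allow-same-origin");
    // Resolve with the iframe's own Document once its content has loaded.
    iframe.addEventListener("load", () => resolve(iframe.contentDocument));
    iframe.srcdoc = htmlString;
    doc.body.appendChild(iframe);
  });
}

// Browser usage (html1 is assumed to hold one of the HTML strings):
// loadIntoHiddenIframe(html1).then((frameDoc) => {
//   frameDoc.title;                        // <title> is preserved
//   frameDoc.querySelector("#someDivID");  // query the separate document
// });
```

Unlike a detached parse, the iframe is a live document, so images and stylesheets referenced by the HTML will be fetched.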

2 Comments

How would this work with non-body elements such as titles and javascript? Would this not cause problems/conflicts?
@Tom I updated my answer to cover that. In particular, script tags are annoying: if you scraped the content from another server, their relative URLs will now point to yours.
