String extraction with javascript

Question

I'm using jQuery and have an entire HTML page stored in the variable page:

var page = '<html>...<div id="start">......</div><!-- start -->....</html>';

How can I extract only the section that starts with <div id="start"> and runs through the closing </div> and the comment after it, so that my output is

<div id="start">......</div><!-- start -->

You should definitely use regex for this.

– Nick Craver, Commented Nov 10, 2010 at 21:55

AndreKR · Accepted Answer · 2010-11-10 21:56:28Z

2

$(page).find('#start').html();


3 Comments

Nick Craver Over a year ago

This would get the inner HTML, and not necessarily matching the original string.

AndreKR Over a year ago

Correct. I wondered whether to write this, too, but he wrote that he is using jQuery (wants to use?) and so this is the only way. If he ever needs the whole element (the outerHTML), he will find a way to wrap it. But what for?

gnarf Over a year ago

Be VERY careful throwing $(page) around like that... You'll want to save a "copy" of that if you plan on searching within it very often. var $page = $(page); Otherwise you're going to create/destroy multiple copies of the DOM representation... Also, if you want to work with it in jQuery, you won't need the .html() -- i.e. $page.find("#start").appendTo("#target");

Lee · Accepted Answer · 2010-11-10 22:21:10Z

2

If it's valid HTML, it would be easiest to just let the browser do it for you. Something like this would do the trick:

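// Sample page; the target div is nested inside a wrapper div.
var page = '<html><head><title>foo</title></head><body><div id="stuff"><div id="start">blah<span>fff</span></div></div></body></html>';

// Find #start in the parsed string, then step up to its parent so that
// .html() returns the markup of #start itself rather than its contents.
var start_div = $('#start', page).parent();
alert(start_div.html());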

You can see this example in action at jsFiddle.

[edit] as @Nick pointed out above, this would probably not include the html comment at the end of the div. It also might not work in all browsers -- I don't know -- you should test it. Post back and let us know.


Tatu Ulmanen · Accepted Answer · 2010-11-10 21:57:34Z

1

This should do it:

var result = $(page).find('#start')[0].outerHTML;


1 Comment

Nick Craver Over a year ago

This isn't available in all browsers, and even where it is, it wouldn't include the comment.

FatherStorm · Accepted Answer · 2010-11-10 21:58:10Z

1

Regex. Or the lazy way (which I don't recommend, but it's quick) would be to create a hidden div, put the page HTML into it, and run a selector inside it:

$('#myNewDiv').find('#start').html();


Ben Lee · Accepted Answer · 2010-11-10 22:04:09Z

1

var start = page.match(/(<div id="start">.*?<!-- start -->)/m)[1];


6 Comments

Ender Over a year ago

This won't work if there is a newline between the opening and closing tags.

Ben Lee Over a year ago

@Ender: Of course, forgot to add the "m" for multi-line mode.

Ben Lee Over a year ago

In any case, chances are the OP did not want what he asked for. A regex will return exactly what he asked for, but if what he really wants is an html component rather than that exact string, he should be using jQuery html parsing like many others here have suggested.

Ender Over a year ago

Unfortunately, JS doesn't have that feature (or rather, it does, but not in the way you're thinking). Multi-line mode applies only to the start and end of string anchors. The /s flag (which is what allows the . to also match newlines in languages like Perl) isn't supported by javascript.

Ender Over a year ago

I had to look it up myself, to be sure :) I found this question that clears it up, if you're interested: stackoverflow.com/questions/1068280/…
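The difference discussed in these comments can be checked directly. The sketch below uses a made-up page string (an assumption, not the OP's data) with a newline inside the target div, and compares the "." pattern under the m flag against a [\s\S] pattern. Engines that implement ES2018 also accept an s (dotAll) flag, which did not exist when this thread was written, so the sketch sticks to [\s\S].

```javascript
// Hypothetical sample page with a newline inside the target div.
var page = '<html><body><div id="start">line one\nline two</div><!-- start --></body></html>';

// "." does not match "\n" here; the m flag only changes how ^ and $ behave.
var dotPattern = /<div id="start">.*?<!-- start -->/m;

// [\s\S] matches any character, including newlines.
var anyCharPattern = /<div id="start">[\s\S]*?<!-- start -->/;

var dotResult = page.match(dotPattern);         // no match, so null
var anyCharResult = page.match(anyCharPattern); // array with the full match at index 0

console.log(dotResult);
console.log(anyCharResult[0]);
```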


Ender · Accepted Answer · 2010-11-10 22:12:05Z

1

An appropriate regular expression will get you what you are looking for. Try using a line like this:

var start = page.match(/(<div id="start">[\s\S]*?<!-- start -->)/)[1];

This uses JavaScript's match method to return an array of matches from your page string and puts the first parenthesized sub-match (in this case, your #start tag and the following comment) into start.

Here's a demo that shows this method working: http://jsfiddle.net/Ender/mphUj/
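One thing to watch for with the one-liner: match returns null when nothing matches, so indexing the result directly would throw. A minimal guarded sketch follows; the helper name extractStartSection and the sample page are hypothetical, and it still assumes the trailing <!-- start --> comment is present to mark the end of the section.

```javascript
// Hypothetical helper: returns the matched section, or null when the markers are absent.
function extractStartSection(page) {
  var match = page.match(/<div id="start">[\s\S]*?<!-- start -->/);
  return match ? match[0] : null;
}

// Made-up sample input with newlines and nested markup inside the div.
var page = '<html><body><p>intro</p><div id="start">\n  <span>content</span>\n</div><!-- start --><p>outro</p></body></html>';

console.log(extractStartSection(page));
console.log(extractStartSection('<html><body>no marker here</body></html>')); // null
```

Because the lazy quantifier stops at the first <!-- start --> comment, nested divs inside #start are included as long as the comment follows the closing tag.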
