Regular expression to extract text from a string in html format

Question

I am currently getting an error response in HTML format. It is a string.

"<!DOCTYPE html>\r\n
<html>
  <head>
    <title>Data already exists</title>
  </head>
</html>"

I want to retrieve the content inside the <title> element; for the example above, that is "Data already exists". Can anybody suggest an appropriate regular expression to capture that text?

Any help is appreciated!

I really appreciate everyone's suggestion and thanks for taking time to share the knowledge. You guys are awesome.
– inspiringmyself, Commented Aug 29, 2012 at 14:07

João Silva · Accepted Answer · 2012-08-29 01:25:33Z

5

First, you can do it without regex, by creating a dummy element to inject the HTML:

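var s = "your_html_string";
// Create a detached element and let the browser parse the HTML into it.
var dummy = document.createElement("div");
dummy.innerHTML = s;
// textContent works on detached elements and in browsers that lack innerText.
var title = dummy.getElementsByTagName("title")[0].textContent;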

But if you really insist on using regex:

var s = "your_html_string";
var title = s.match(/<title>([^<]+)<\/title>/)[1];

Here's a DEMO illustrating both approaches.
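
The one-line regex above throws a TypeError when the string has no <title>, because match returns null. A slightly more defensive sketch is shown here; the helper name extractTitle is hypothetical, and the pattern assumes the title text itself contains no "<" character. It also tolerates attributes on the <title> tag, which the original pattern does not.

```javascript
// Hypothetical helper, not part of the original answer.
function extractTitle(html) {
  // <title\b[^>]*> allows optional attributes on the opening tag.
  // ([^<]*) captures everything up to the next "<", including newlines.
  var match = /<title\b[^>]*>([^<]*)<\/title>/i.exec(html);
  // exec returns null when nothing matches, so guard before indexing.
  return match ? match[1].trim() : null;
}

var response = "<!DOCTYPE html>\r\n<html>\n  <head>\n    <title>Data already exists</title>\n  </head>\n</html>";
console.log(extractTitle(response));           // "Data already exists"
console.log(extractTitle("<p>no title</p>"));  // null
```

Returning null instead of throwing lets the caller decide what to show when the response does not follow the expected format.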

edited Aug 29, 2012 at 1:25

answered Aug 29, 2012 at 1:19

João Silva


5 Comments

RobG Over a year ago

You don't need to use getElementsByTagName, there is a document.title property that is more convenient. Also, the title element can have attributes, so the regular expression needs to be more sophisticated (parsing HTML with a regular expression is generally a bad idea).

João Silva Over a year ago

@RobG: I absolutely agree that parsing HTML with a regex is generally a bad idea; however, OP explicitly said that it was an error response that follows the above format. document.title will get the current document's title. Note that OP is not trying to parse the current document but a specific response message (probably from an ajax call).

nnnnnn Over a year ago

Hmm... One line of regex, or three lines of dummy element manipulation? One or three? I know which I'd choose. (I too agree that in a general sense parsing HTML with regex is not the way to go, but as you said João, for a specific case with a known format I think it is OK.)

RobG Over a year ago

Yes, all good. The OP could use the response text to create a new document, then just use document.title.

inspiringmyself Over a year ago

I really appreciate everyone's suggestion and thanks for taking time to share the knowledge. You guys are awesome.

elclanrs · Accepted Answer · 2012-09-15 10:08:59Z

2

The very basics of parsing HTML tags with a regex look like this: http://jsbin.com/oqivup/1/edit

var text = /<(title)>(.+)<\/\1>/.exec(html).pop();

But for more complicated stuff I would consider using a proper parser.
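
One caveat with the pattern above: .+ is greedy and does not match newlines, so it runs to the last closing tag on the line and fails if the content spans several lines. A possible variant with a lazy quantifier is sketched here; extractTagText is a hypothetical helper, and it assumes tagName is a plain alphanumeric tag name.

```javascript
// Hypothetical helper; tagName is assumed to be a plain tag name such as "title".
function extractTagText(html, tagName) {
  // Reject anything that is not a simple tag name before building the pattern.
  if (!/^[a-z][a-z0-9]*$/i.test(tagName)) {
    throw new Error("Unexpected tag name: " + tagName);
  }
  // Group 1 captures the tag name so \1 requires the same name in the closing tag.
  // [\s\S]*? is lazy and matches newlines, so it stops at the first closing tag.
  var pattern = new RegExp("<(" + tagName + ")\\b[^>]*>([\\s\\S]*?)<\\/\\1>", "i");
  var match = pattern.exec(html);
  return match ? match[2] : null;
}

var html = "<head><title>Data already exists</title><title>second</title></head>";
console.log(extractTagText(html, "title")); // "Data already exists"
```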

edited Sep 15, 2012 at 10:08

answered Aug 29, 2012 at 1:25

elclanrs


2 Comments

nnnnnn Over a year ago

Given the response is already a string, can't you skip the jQuery line?

inspiringmyself Over a year ago

I really appreciate everyone's suggestion and thanks for taking time to share the knowledge. You guys are awesome.

Oriol · Accepted Answer · 2012-08-29 01:27:12Z

1

You could parse it using DOMParser():

var parser=new DOMParser(),
    doc=parser.parseFromString("<!DOCTYPE html><html><head><title>Data already exists</title></head></html>","text/html");

doc.title; /* "Data already exists" */
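
Because "text/html" support in DOMParser is not available everywhere, one option is to feature-detect it and fall back to a regex. The following is a minimal sketch under that assumption; getTitle is a hypothetical helper, and the regex fallback is an illustration rather than part of this answer.

```javascript
// Hypothetical helper: prefer DOMParser, fall back to a regex when it is unavailable.
function getTitle(html) {
  if (typeof DOMParser !== "undefined") {
    try {
      var doc = new DOMParser().parseFromString(html, "text/html");
      // Some older browsers return null (or throw) for "text/html".
      if (doc) {
        return doc.title;
      }
    } catch (e) {
      // Fall through to the regex fallback below.
    }
  }
  // Fallback: assumes the title text contains no "<" character.
  var match = /<title\b[^>]*>([^<]*)<\/title>/i.exec(html);
  // Mirror doc.title, which is an empty string when there is no title.
  return match ? match[1].trim() : "";
}

var page = "<!DOCTYPE html><html><head><title>Data already exists</title></head></html>";
console.log(getTitle(page)); // "Data already exists"
```

The same function then works in a browser and in environments without a DOM, where only the fallback branch runs.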

answered Aug 29, 2012 at 1:27

Oriol


10 Comments

João Silva Over a year ago

You probably need to use an ActiveXObject for IE < 9.

Dariush Jafari Over a year ago

And how can we use the doc variable with jQuery?

Oriol Over a year ago

@DariushJafari Do you mean $(doc)?

Fabrício Matté Over a year ago

Chrome 23 Canary doesn't parse HTML with DOMParser though. If the HTML string is XML-valid, you can always use the application/xml parsing for cross-browser parsing.

Dariush Jafari Over a year ago

@Oriol How do you select elements of doc? $('div.cc') selects elements of the current document.
