1

I have a string variable from which I would like to extract the title value of the element with id="resultcount". The output should be 2.

var str = '<table cellpadding=0 cellspacing=0 width="99%" id="addrResults"><tr></tr></table><span id="resultcount" title="2" style="display:none;">2</span><span style="font-size: 10pt">2 matching results. Please select your address to proceed, or refine your search.</span>';

I tried the following regex but it is not working:

/id=\"resultcount\" title=['\"][^'\"](+['\"][^>]*)>/

4 Answers

3

Since var str = ... is JavaScript syntax, I assume you need a JavaScript solution. As Peter Corlett said, you can't parse HTML using regular expressions, but if you are using jQuery you can take advantage of the browser's own parser with little effort:

$('#resultcount', '<div>'+str+'</div>').attr('title')

It will return undefined if resultcount is not found or if it has no title attribute.

1

To make sure it doesn't matter which attribute (id or title) comes first in the string, first grab the entire HTML element with the required id:

var tag = str.replace(/^.*(<[^<]+?id=\"resultcount\".+?\/.+?>).*$/, "$1")

Then extract the title from that string:

var res = tag.replace(/^.*title=\"(\d+)\".*$/, "$1");
// res is 2

But, as people have previously mentioned, it is unreliable to use regex for parsing HTML: something as trivial as a different quote (single instead of double) or a space in the "wrong" place will break it.
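
As a rough illustration of handling those cases, here is a sketch that tolerates either quote style, optional spaces around the equals sign, and either attribute order. The helper name titleOfElementWithId is made up for this example, and it is still a regex hack rather than a parser: an attribute value containing > would defeat it.

```javascript
// Hypothetical helper (not part of any library): returns the title attribute
// of the first opening tag whose id matches, or null if nothing is found.
function titleOfElementWithId(html, id) {
  // Escape regex metacharacters so the id is matched literally.
  var safeId = id.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

  // Match one opening tag that contains id=... in single or double quotes,
  // allowing whitespace around "=". \1 requires the same closing quote.
  var tagRe = new RegExp(
    '<[^<>]*\\sid\\s*=\\s*(["\'])' + safeId + '\\1[^<>]*>', 'i');
  var tag = html.match(tagRe);
  if (!tag) return null;

  // Within that tag only, pull out the title value (any characters, not just digits).
  var title = tag[0].match(/\stitle\s*=\s*(["'])(.*?)\1/i);
  return title ? title[2] : null;
}
```

Calling titleOfElementWithId(str, 'resultcount') on the string from the question should give the same value as the two replace calls above, but as null instead of the whole input when there is no match.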

0

Please see this earlier response, entitled "You can't parse [X]HTML with regex":

RegEx match open tags except XHTML self-contained tags

1 Comment

I see...Is there an alternative way to parse the string without using regex? Thanks.
0

Well, since no one else is jumping in on this, and I'm assuming you're just looking for a value and not trying to create a parser, I'll give you what works for me with PCRE. I'm not sure how to put it into JavaScript syntax for you, but I think you'll be able to do that.

span id="resultcount" title="(\d+)"

The part you're looking for is the capturing (non-passive) group $1, which is the (\d+) part. It matches one or more digits between the quote marks.
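
For what it's worth, the pattern above should carry over to JavaScript unchanged, since it uses no PCRE-specific syntax. A minimal sketch, assuming the attributes appear in exactly this order with double quotes (the shortened str here is only for illustration):

```javascript
// Shortened version of the string from the question, for illustration only.
var str = '<span id="resultcount" title="2" style="display:none;">2</span>';

// exec() returns an array on success: index 0 is the whole match,
// index 1 is the first capturing group (the digits). It returns null on failure.
var match = /span id="resultcount" title="(\d+)"/.exec(str);

// count is the string "2" here; use Number(count) if a numeric value is needed.
var count = match ? match[1] : null;
```

In JavaScript the captured group is read as match[1] rather than $1; the $1 form only applies inside the replacement string of replace().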

1 Comment

Thanks everyone for taking the time to answer my question and giving me tips.
