
I'm trying to match a regex (containing one variable) against a page of HTML code stored as a string.

I have split the HTML string on a certain tag, so houses is an array in which each element looks like the snippet below. Each element contains the data for one house (name, size in square meters, etc.; fictional, of course). I need to match only one of these houses, identified by the text between the first pair of TD tags, and the part I need is the VALUE (digits) of the last hidden INPUT tag in the form, houseid.

<TR BGCOLOR=#D4C0A1>
 <TD WIDTH=40%><NOBR>Luminous&#160;Arc&#160;2</NOBR></TD>
 <TD WIDTH=10%><NOBR>154&#160;sqm</NOBR></TD>
 <TD WIDTH=10%><NOBR>6460&#160;gold</NOBR></TD>
 <TD WIDTH=40%><NOBR>rented</NOBR></TD>
 <TD><TABLE BORDER=0 CELLSPACING=0 CELLPADDING=0>
 <FORM ACTION= METHOD=post><TR><TD>
  <INPUT TYPE=hidden NAME=world VALUE=Olympa>
  <INPUT TYPE=hidden NAME=town VALUE="Yalahar">
  <INPUT TYPE=hidden NAME=state VALUE=>
  <INPUT TYPE=hidden NAME=type VALUE=houses>
  <INPUT TYPE=hidden NAME=order VALUE=>
  <INPUT TYPE=hidden NAME=houseid VALUE=37010>
  <INPUT TYPE=image NAME="View" ALT="View" SRC="" BORDER=0 WIDTH=120 HEIGHT=18>
</TD></TR></FORM></TABLE></TD></TR>

I constructed the following regex:

var regex = new RegExp(house + "[\\s\\S]+name=houseid value=([0-9]+)>", "i");

where house is the name of the house (in this example, Luminous&#160;Arc&#160;2) and the part I need would be the houseid 37010.

I expected this regex to give me the match I need, but houses[i].match(regex) returns null every time; I get no match in the string.

I have tried several other approaches, including converting the string to a DOM object so that I could split it on TR tags (the conversion failed). I feel that I am close, but I am stuck.

Does anyone see why my regex might fail to work?

Kenneth

  • Don't parse HTML with regexes; HTML's too complex for that. Commented Jan 23, 2013 at 11:55
  • So how exactly would you suggest I parse it? This source code is the only string I have, so I figure I need to manipulate it somehow. Commented Jan 23, 2013 at 12:12
  • I'll write up a possibility. Commented Jan 23, 2013 at 12:17

2 Answers


You could add the string to your page's HTML (in a display:none div or something similar) and then access the DOM as you would anywhere else.

For example:

<div id="stringContainer" style="display:none"></div>

var searchstring = "Luminous&#160;Arc&#160;2";
searchstring = searchstring.replace(/&#160;/g, '&nbsp;'); // innerHTML serializes U+00A0 as &nbsp;, so compare in that form

var c = document.getElementById("stringContainer");
c.innerHTML = '<table>' + houses + '</table>'; // houses must be the whole HTML string here, not the split array
var rows = c.getElementsByTagName('tr'); // Also returns the inner TRs of the nested tables

for (var i = 0, l = rows.length; i < l; i++) { // Loop through the found rows
    var nameCell = rows[i].cells[0].getElementsByTagName('nobr')[0]; // House name (avoid "name": at global scope that is window.name, which stringifies)
    if (nameCell && nameCell.innerHTML === searchstring) { // Nested-table rows have no NOBR, so nameCell is undefined for them
        console.log(rows[i].getElementsByTagName('input')[5].value); // The 6th INPUT is houseid; log its value
    }
}
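The comparison above depends on which form the non-breaking space takes: &#160; in the source, &nbsp; from innerHTML, or the raw U+00A0 character if the name was read via textContent. As an alternative to the single replace in the answer (not part of it), here is a minimal sketch of a hypothetical normalizeSpaces helper that maps all three forms to a plain space, so both sides can be normalized before comparing:

```javascript
// Hypothetical helper: replace every spelling of a non-breaking space
// (the &#160; and &nbsp; entities, or the U+00A0 character itself)
// with an ordinary space.
function normalizeSpaces(s) {
  return s.replace(/&#160;|&nbsp;|\u00A0/g, " ");
}

console.log(normalizeSpaces("Luminous&#160;Arc&#160;2")); // Luminous Arc 2
console.log(
  normalizeSpaces("Luminous&nbsp;Arc&nbsp;2") ===
    normalizeSpaces("Luminous\u00A0Arc\u00A02")
); // true
```

With this, the loop could compare normalizeSpaces(a) === normalizeSpaces(b) instead of relying on the search string being in exactly the &nbsp; form.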


Assuming the variable houses is:

var houses = '<TR BGCOLOR=#D4C0A1>\n\
<TD WIDTH=40%><NOBR>Luminous&#160;Arc&#160;2</NOBR></TD>\n\
<TD WIDTH=10%><NOBR>154&#160;sqm</NOBR></TD>\n\
<TD WIDTH=10%><NOBR>6460&#160;gold</NOBR></TD>\n\
<TD WIDTH=40%><NOBR>rented</NOBR></TD>\n\
<TD>\n\
    <TABLE BORDER=0 CELLSPACING=0 CELLPADDING=0>\n\
        <FORM ACTION= METHOD=post>\n\
            <TR>\n\
            <TD>\n\
            <INPUT TYPE=hidden NAME=world VALUE=Olympa>\n\
            <INPUT TYPE=hidden NAME=town VALUE="Yalahar">\n\
            <INPUT TYPE=hidden NAME=state VALUE=>\n\
            <INPUT TYPE=hidden NAME=type VALUE=houses>\n\
            <INPUT TYPE=hidden NAME=order VALUE=>\n\
            <INPUT TYPE=hidden NAME=houseid VALUE=37010>\n\
            <INPUT TYPE=image NAME="View" ALT="View" SRC="" BORDER=0 WIDTH=120 HEIGHT=18>\n\
            </TD>\n\
            </TR>\n\
        </FORM>\n\
    </TABLE>\n\
</TD>\n\
</TR>\n\
<TR BGCOLOR=#D4C0A1>\n\
<TD WIDTH=40%><NOBR>Dark&#160;Arc&#160;2</NOBR></TD>\n\
<TD WIDTH=10%><NOBR>154&#160;sqm</NOBR></TD>\n\
<TD WIDTH=10%><NOBR>6460&#160;gold</NOBR></TD>\n\
<TD WIDTH=40%><NOBR>rented</NOBR></TD>\n\
<TD>\n\
    <TABLE BORDER=0 CELLSPACING=0 CELLPADDING=0>\n\
        <FORM ACTION= METHOD=post>\n\
            <TR>\n\
            <TD>\n\
            <INPUT TYPE=hidden NAME=world VALUE=Olympa>\n\
            <INPUT TYPE=hidden NAME=town VALUE="Yalahar">\n\
            <INPUT TYPE=hidden NAME=state VALUE=>\n\
            <INPUT TYPE=hidden NAME=type VALUE=houses>\n\
            <INPUT TYPE=hidden NAME=order VALUE=>\n\
            <INPUT TYPE=hidden NAME=houseid VALUE=37010>\n\
            <INPUT TYPE=image NAME="View" ALT="View" SRC="" BORDER=0 WIDTH=120 HEIGHT=18>\n\
            </TD>\n\
            </TR>\n\
        </FORM>\n\
    </TABLE>\n\
</TD>\n\
</TR>';

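I tried your regex with Cerbrus's houses variable and it works fine.
(I added the lazy quantifier ? to [\\s\\S]+. It also works without it on that data, but only because both houses there share the houseid 37010: on a string holding several houses, the greedy version captures the last houseid in the string rather than the one following the matched name. On a single houses[i] element containing one house, greedy is fine.)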

var house = "Luminous&#160;Arc&#160;2";
var regex = new RegExp( house + "[\\s\\S]+?name=houseid value=([0-9]+)>", "i" );

houses.match( regex )[1];    // "37010"
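To see when the lazy ? matters, here is a small sketch with two rows whose houseids differ (the values 111 and 222 are made up for illustration; in Cerbrus's string both houses have 37010, which hides the difference). It builds the greedy and the lazy version of the same regex and applies both to the same string:

```javascript
// Two made-up rows with different houseids (111 and 222 are illustrative only).
var html =
  "<TD><NOBR>Luminous&#160;Arc&#160;2</NOBR></TD>\n" +
  "<INPUT TYPE=hidden NAME=houseid VALUE=111>\n" +
  "<TD><NOBR>Dark&#160;Arc&#160;2</NOBR></TD>\n" +
  "<INPUT TYPE=hidden NAME=houseid VALUE=222>\n";

var house = "Luminous&#160;Arc&#160;2";

// Greedy: [\s\S]+ first runs to the end of the string, then backtracks only
// as far as the LAST "name=houseid value=..." it can find.
var greedy = new RegExp(house + "[\\s\\S]+name=houseid value=([0-9]+)>", "i");

// Lazy: [\s\S]+? grows one character at a time, so it stops at the FIRST
// houseid that follows the house name.
var lazy = new RegExp(house + "[\\s\\S]+?name=houseid value=([0-9]+)>", "i");

console.log(html.match(greedy)[1]); // 222 (belongs to Dark Arc 2)
console.log(html.match(lazy)[1]);   // 111 (belongs to Luminous Arc 2)
```

So when one string holds several houses, the lazy version is the one that returns the id belonging to the matched name.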

Since the regex itself works on that data, presumably your house variable has the wrong value or houses[i] is not accessing the right string.
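One way to narrow this down is a lookup that loops over the split array, escapes the house name before putting it into the RegExp (so names containing characters like . or ( are matched literally), and returns null instead of throwing when nothing matches. A minimal sketch; escapeRegExp and findHouseId are hypothetical helpers, and the two sample rows are made up:

```javascript
// Escape characters that have special meaning in a RegExp, so a house name
// such as "Villa (East)" or "Hut 1.5" is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: return the houseid of the first element of `houses`
// (the array produced by splitting the page) that contains `house`,
// or null when no element matches.
function findHouseId(houses, house) {
  var regex = new RegExp(
    escapeRegExp(house) + "[\\s\\S]+?name=houseid value=([0-9]+)>",
    "i"
  );
  for (var i = 0; i < houses.length; i++) {
    var m = houses[i].match(regex);
    if (m) {
      return m[1]; // the captured digits
    }
  }
  return null; // no element matched: inspect `house` and the array contents
}

// Made-up sample rows in the same shape as the question's elements.
var rows = [
  "<TD><NOBR>Dark&#160;Arc&#160;2</NOBR></TD> <INPUT TYPE=hidden NAME=houseid VALUE=222>",
  "<TD><NOBR>Luminous&#160;Arc&#160;2</NOBR></TD> <INPUT TYPE=hidden NAME=houseid VALUE=37010>"
];

console.log(findHouseId(rows, "Luminous&#160;Arc&#160;2")); // 37010
console.log(findHouseId(rows, "Luminous Arc 2"));           // null (plain spaces, not &#160;)
```

The second call shows one likely way the house variable can be "wrong": if it holds ordinary spaces (or the decoded U+00A0 character) while the HTML contains the literal text &#160;, the regex never matches.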

