
I am trying to use the DOM to get the days and times, and also the rooms, from the following batch of HTML. (I'm actually getting everything else in my script; it's just these two I'm having trouble with.)

                    </td><td class="call">
                    <span>12549<br/></span><a href="http://www.bkstr.com/webapp/wcs/stores/servlet/booklookServlet?bookstore_id-1=584&term_id-1=201190&crn-1=12549" target="_blank">View Book Info</a>
                    </td><td>
                    <span id="ctl10_gv_sectionTable_ctl03_lblDays">F:1000AM - 1125AM<br />T:230PM - 355PM</span>


                    </td><td class="room">
                    <span id="ctl10_gv_sectionTable_ctl03_lblRoom">KUPF106<br />KUPF106</span>
                    </td><td class="status"><span id="ctl10_gv_sectionTable_ctl03_lblStatus" class="red">Closed</span></td><td class="max">20</td><td class="now">49</td><td class="instructor">
                    <a href="https://directory.njit.edu/PersDetails.aspx?persid=SCHOENKA" target="_blank">Schoenebeck Kar</a>
                    </td><td class="credits">3.00</td>

        </tr><tr class="sectionRow">
            <td class="section">
                    101<br />

Here is what I have so far for finding the days:

    $tracker =0;
    // DAYS AND TIMES
    $number = 3;
    $digit = "0";
    while($tracker<$numSections){           
        $strNum = strval($number);
        $zero = strval($digit);
        $start = "ctl10_gv_sectionTable_ctl";
        $end = "_lblDays";
        $id = $start.$zero.$strNum.$end;
        //$days = $html->find('span.$id');
        $days=$html->getElementByTagName('span')->getElementById($id);
            echo "Days : ";
            echo $days[0] . '<br>';


        $tracker++;
        $number++;
        if($number >9){
            $digit = "1";
            $number=0;
        }
    }

As you can see from the HTML, the site I'm parsing has fairly unique IDs for some of its spans (e.g. ctl10_gv_sectionTable_ctl03_lblRoom). I only posted one section's HTML block. What you don't see is that the markup for the next class section is identical except for the "ctl03" part, which is what all the extra counter code takes care of, just so no one is thrown off by it.
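For reference, a minimal sketch (not the asker's code) of producing the same zero-padded counter with sprintf's %02d instead of tracking the tens digit by hand. It assumes the ids always follow the ctl10_gv_sectionTable_ctlNN_lblXxx pattern seen above; sectionSpanId is a hypothetical helper name, and $numSections is a placeholder for the value the script computes elsewhere.

```php
<?php
// Hypothetical helper: build the span id for a section counter and field suffix.
// Assumes ids look like ctl10_gv_sectionTable_ctlNN_<suffix>, NN zero-padded to two digits.
function sectionSpanId(int $n, string $suffix): string
{
    return sprintf('ctl10_gv_sectionTable_ctl%02d_%s', $n, $suffix);
}

$numSections = 3; // placeholder; the real script computes this elsewhere

// The first section in the page uses ctl03, so start counting at 3.
for ($i = 0; $i < $numSections; $i++) {
    echo sectionSpanId($i + 3, 'lblDays'), "\n";
}
```

Besides being shorter, this keeps counting correctly past ctl19, where the manual rollover in the loop above would wrap back to ctl10.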

I've tried a few different ways but cannot seem to get the days (e.g. "1000AM - 1125AM") or the rooms (e.g. "KUPF106"). The rest of the data is pretty simple to grab, but these two spans don't have classes, and the days one doesn't even have a class on its td. I think I just need to know how to use the value in $id as the id of the specific span I'm looking for. If so, can someone show me how to do that?

2 Answers


This:

    $html->getElementByTagName('span')->getElementById($id);

makes no sense. In PHP's DOM extension the method is getElementsByTagName (plural), and it returns a DOMNodeList, which does not have a getElementById method.
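A minimal sketch of the two calls that do exist in PHP's DOM extension, assuming $html has been loaded into a DOMDocument (the sample markup is trimmed from the question): DOMDocument::getElementById, or looping over the DOMNodeList from getElementsByTagName and comparing each span's id attribute.

```php
<?php
$html = '<table><tr><td>'
    . '<span id="ctl10_gv_sectionTable_ctl03_lblDays">F:1000AM - 1125AM<br />T:230PM - 355PM</span>'
    . '</td><td class="room">'
    . '<span id="ctl10_gv_sectionTable_ctl03_lblRoom">KUPF106<br />KUPF106</span>'
    . '</td></tr></table>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();

$id = 'ctl10_gv_sectionTable_ctl03_lblRoom';

// Option 1: getElementById is a method of DOMDocument, not of DOMNodeList.
$byId = $dom->getElementById($id);

// Option 2: getElementsByTagName returns a DOMNodeList; filter it by the id attribute.
$byLoop = null;
foreach ($dom->getElementsByTagName('span') as $span) {
    if ($span->getAttribute('id') === $id) {
        $byLoop = $span;
        break;
    }
}

echo $byId->textContent, "\n"; // the <br /> contributes no text, so this prints KUPF106KUPF106
```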

I think you mean $html->getElementById($id), but I can't be sure because I don't know what $html is.

Once you have the element, you can get its text with $element->textContent, as long as you don't need to walk the individual text nodes.
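The spans in the question contain a <br />, so textContent runs the two lines together. If you do need them separately, one way is to walk the element's direct text-node children, as in this sketch (textLines is a hypothetical helper name, and the id "days" is a stand-in for the real one).

```php
<?php
// Collect the non-empty direct text-node children of an element.
// The <br /> elements between them are skipped, so each text node becomes one entry.
function textLines(DOMNode $el): array
{
    $lines = [];
    foreach ($el->childNodes as $child) {
        if ($child->nodeType === XML_TEXT_NODE) {
            $text = trim($child->nodeValue);
            if ($text !== '') {
                $lines[] = $text;
            }
        }
    }
    return $lines;
}

$dom = new DOMDocument();
$dom->loadHTML('<span id="days">F:1000AM - 1125AM<br />T:230PM - 355PM</span>');
$span = $dom->getElementById('days');

echo $span->textContent, "\n";            // F:1000AM - 1125AMT:230PM - 355PM
echo implode("\n", textLines($span)), "\n"; // one meeting time per line
```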

Have you considered using DOMXPath for your parsing task? It's probably much easier and clearer.
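A minimal DOMXPath sketch under the same assumption ($html is a string loaded into a DOMDocument), showing both an exact id match and a match on the id suffix that ignores the ctlNN counter. The sample markup is trimmed from the question.

```php
<?php
$html = '<table><tr class="sectionRow"><td>'
    . '<span id="ctl10_gv_sectionTable_ctl03_lblDays">F:1000AM - 1125AM<br />T:230PM - 355PM</span>'
    . '</td><td class="room">'
    . '<span id="ctl10_gv_sectionTable_ctl03_lblRoom">KUPF106<br />KUPF106</span>'
    . '</td></tr></table>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Exact match on a computed id (assumes the id never contains a single quote).
$id = 'ctl10_gv_sectionTable_ctl03_lblDays';
$nodes = $xpath->query(sprintf("//span[@id='%s']", $id));
$days = $nodes->length > 0 ? $nodes->item(0)->textContent : null;

// Every room span, whatever its ctlNN counter is.
$rooms = [];
foreach ($xpath->query("//span[contains(@id, '_lblRoom')]") as $span) {
    $rooms[] = $span->textContent;
}

echo $days, "\n";
echo implode(', ', $rooms), "\n";
```

The second query removes the need to build ids by hand, which is the main reason XPath tends to be simpler here.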


5 Comments

I'd avoid claiming that DOMXPath is easier, let alone cleaner. It is more powerful, but easier? Huh...
Yeah, I figured that line wasn't going to do what I wanted; it was a last attempt at getting it. And $html is the HTML of whatever site I need: $html = file_get_html($fp);. I did look into XPath a little and it didn't seem easier, but I'm going to try your suggestion now, thanks.
@Tom, I think XPath is both easier and clearer. Using the DOM is a mess for anything more complex than getElementById.
@user1070764, is $html really just a string? You need to load that into a DOMDocument! How is any of your other parsing working?
@francisAvilla, about $html, I guess so. After trying DOMDocument and XPath a few different ways without getting them to work with what I was doing, I found simple_html_dom.php, which worked like a charm without needing a DOMDocument. On another note, your solution worked, thank you. I didn't even need the textContent line, so it was just that one line; I really was overthinking it. Thanks again.

Simple HTML DOM should be avoided unless you're on PHP 4 or earlier. PHP 5's built-in DOM extension uses the much more reliable libxml2 library.

The proper way to iterate over that HTML is to first identify the rows, then write XPath expressions that pull the data relative to each row.

    $dom = new DOMDocument();
    // $html must be the page source as a string; @ hides libxml warnings about messy HTML
    @$dom->loadHTML($html);
    $xpath = new DOMXPath($dom);

    // Select every section row, then query relative to it (note the leading ".")
    foreach ($xpath->query("//tr[@class='sectionRow']") as $row) {
        echo $xpath->query(".//span[contains(@id,'Days')]", $row)->item(0)->nodeValue . "\n";
        echo $xpath->query(".//span[contains(@id,'Room')]", $row)->item(0)->nodeValue . "\n";
        echo $xpath->query(".//span[contains(@id,'Status')]", $row)->item(0)->nodeValue . "\n";
    }
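A variation on the loop above, sketched here rather than taken from the answer: the same row-relative queries, wrapped so that a row missing one of the spans yields null instead of a "call to a member function on null" error. The helper names firstText and parseSections, and the two-row sample HTML (including the second row's "Open" status), are made up for illustration.

```php
<?php
// Hypothetical helper: trimmed text of the first match, or null if nothing matches.
function firstText(DOMXPath $xpath, string $expr, DOMNode $context): ?string
{
    $nodes = $xpath->query($expr, $context);
    return $nodes->length > 0 ? trim($nodes->item(0)->textContent) : null;
}

// Hypothetical helper: one associative array per tr.sectionRow.
function parseSections(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // keep parser warnings out of the output
    $dom->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($dom);

    $sections = [];
    foreach ($xpath->query("//tr[@class='sectionRow']") as $row) {
        $sections[] = [
            'days'   => firstText($xpath, ".//span[contains(@id,'lblDays')]", $row),
            'room'   => firstText($xpath, ".//span[contains(@id,'lblRoom')]", $row),
            'status' => firstText($xpath, ".//span[contains(@id,'lblStatus')]", $row),
        ];
    }
    return $sections;
}

// Made-up sample: the second row has only a status span.
$html = '<table>'
    . '<tr class="sectionRow"><td><span id="ctl10_gv_sectionTable_ctl03_lblDays">F:1000AM - 1125AM</span></td>'
    . '<td class="room"><span id="ctl10_gv_sectionTable_ctl03_lblRoom">KUPF106</span></td>'
    . '<td class="status"><span id="ctl10_gv_sectionTable_ctl03_lblStatus" class="red">Closed</span></td></tr>'
    . '<tr class="sectionRow"><td class="status"><span id="ctl10_gv_sectionTable_ctl04_lblStatus">Open</span></td></tr>'
    . '</table>';

$sections = parseSections($html);
print_r($sections);
```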

1 Comment

Thanks for that. For now I just want this to work because it's a small part of a bigger project, but I'm going to want to optimize it later, so thanks for this example.
