
I have the following HTML:

<table class="profile-stats">
  <tr>
    <td class="stat">
      <div class="statnum">8</div>
      <div class="statlabel"> Tweets </div>
    </td>
    <td class="stat">
        <a href="/THEDJMHA/following">
          <div class="statnum">13</div>
          <div class="statlabel"> Following </div>
        </a>
    </td>
    <td class="stat stat-last">
        <a href="/THEDJMHA/followers">
          <div class="statnum">22</div>
          <div class="statlabel"> Followers </div>
        </a>
    </td>
  </tr>
</table>

I want to get the value of <div class="statnum"> inside <td class="stat stat-last">, i.e. 22.

I have tried the following regex, but it does not find any match:

/<div\sclass="statnum">^(.)\?<\/div>/ig
  • Enable error_reporting. Neither the /g flag nor the ^ anchor would work there, and the escaped \? is misplaced as well. A typical placeholder is (.*?). But if you're that unversed with regexps, the off-topic answer to your question would be to use a DOM traversal frontend (such as qp($html)->find(".statnum")), or plain DOMDocument if you'd prefer tedious and brittle. Commented Aug 20, 2015 at 12:34
  • I think you shouldn't use ^ in that place. Try this: /<div\s+class="statnum">([^>]+)<\/div>/i. Commented Aug 20, 2015 at 12:43
  • Anyway, it's not a good idea to parse HTML with regexps; you will always find a new bug. Commented Aug 20, 2015 at 12:47
  • If I'm not mistaken, what you actually need here is the text content of the divs, i.e. 8, 13, 22. Commented Aug 20, 2015 at 12:50
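As the comments suggest, a DOM parser avoids the regex pitfalls entirely. Here is a minimal sketch using PHP's built-in DOMDocument that collects the text content of every statnum div (8, 13, 22), as the last comment describes. The variable name $values is just for illustration, and the sketch assumes the class attribute is exactly "statnum".

```php
<?php
$html = '<table class="profile-stats">
  <tr>
    <td class="stat">
      <div class="statnum">8</div>
      <div class="statlabel"> Tweets </div>
    </td>
    <td class="stat">
        <a href="/THEDJMHA/following">
          <div class="statnum">13</div>
          <div class="statlabel"> Following </div>
        </a>
    </td>
    <td class="stat stat-last">
        <a href="/THEDJMHA/followers">
          <div class="statnum">22</div>
          <div class="statlabel"> Followers </div>
        </a>
    </td>
  </tr>
</table>';

$doc = new DOMDocument();
$doc->loadHTML($html);

$values = [];
// Walk every <div> in document order and keep those whose class is exactly "statnum"
foreach ($doc->getElementsByTagName('div') as $div) {
    if ($div->getAttribute('class') === 'statnum') {
        $values[] = trim($div->textContent);
    }
}

echo implode(', ', $values), "\n"; // 8, 13, 22
```

The last element of $values is the stat-last value (22) for this particular markup. The answers below show how to target that cell directly.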

4 Answers


Here's a way to accomplish this using a parser.

<?php
$html = '<table class="profile-stats">
  <tr>
    <td class="stat">
      <div class="statnum">8</div>
      <div class="statlabel"> Tweets </div>
    </td>
    <td class="stat">
        <a href="/THEDJMHA/following">
          <div class="statnum">13</div>
          <div class="statlabel"> Following </div>
        </a>
    </td>
    <td class="stat stat-last">
        <a href="/THEDJMHA/followers">
          <div class="statnum">22</div>
          <div class="statlabel"> Followers </div>
        </a>
    </td>
  </tr>
</table>';
$doc = new DOMDocument(); //make a dom object
$doc->loadHTML($html);
$tds = $doc->getElementsByTagName('td');
foreach ($tds as $cell) { //loop through all Cells
    // strpos() returns 0 when the match is at the very start, so compare with !== false
    if (strpos($cell->getAttribute('class'), 'stat-last') !== false) {
        $divs = $cell->getElementsByTagName('div');
        foreach($divs as $div) { // loop through all divs of the cell
            if($div->getAttribute('class') == 'statnum'){
                echo $div->nodeValue;
            }
        }
    }
}

Output:

22

...or using XPath:

// $html is the same string as in the previous example
$doc = new DOMDocument(); // make a DOM object
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$statnums = $xpath->query("//td[@class='stat stat-last']/a/div[@class='statnum']");
foreach($statnums as $statnum) {
    echo $statnum->nodeValue;
}

Output:

22
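The XPath above only matches when the class attribute is exactly "stat stat-last" and the div sits directly under an <a>. If you want to tolerate extra classes, a different class order, or a different wrapper element, a looser query is possible. The sketch below uses the common XPath 1.0 idiom for matching a single class token. The query string and variable names are illustrative, not from the original answer.

```php
<?php
$html = '<table class="profile-stats">
  <tr>
    <td class="stat">
      <div class="statnum">8</div>
      <div class="statlabel"> Tweets </div>
    </td>
    <td class="stat">
        <a href="/THEDJMHA/following">
          <div class="statnum">13</div>
          <div class="statlabel"> Following </div>
        </a>
    </td>
    <td class="stat stat-last">
        <a href="/THEDJMHA/followers">
          <div class="statnum">22</div>
          <div class="statlabel"> Followers </div>
        </a>
    </td>
  </tr>
</table>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// concat(' ', normalize-space(@class), ' ') pads the class list with spaces, so
// contains(..., ' stat-last ') matches the whole token "stat-last" and not a substring.
// The descendant axis (//) no longer depends on the <a> wrapper.
$query = "//td[contains(concat(' ', normalize-space(@class), ' '), ' stat-last ')]"
       . "//div[contains(concat(' ', normalize-space(@class), ' '), ' statnum ')]";
$nodes = $xpath->query($query);

$value = $nodes->length > 0 ? trim($nodes->item(0)->textContent) : null;
echo $value, "\n"; // 22
```

This is more verbose than checking @class with =, but it keeps working if the markup later gains classes such as "stat stat-last highlighted".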

...or if you really want to regex it:

<?php
$html = '<table class="profile-stats">
  <tr>
    <td class="stat">
      <div class="statnum">8</div>
      <div class="statlabel"> Tweets </div>
    </td>
    <td class="stat">
        <a href="/THEDJMHA/following">
          <div class="statnum">13</div>
          <div class="statlabel"> Following </div>
        </a>
    </td>
    <td class="stat stat-last">
        <a href="/THEDJMHA/followers">
          <div class="statnum">22</div>
          <div class="statlabel"> Followers </div>
        </a>
    </td>
  </tr>
</table>';
// The s modifier lets .*? match across newlines; check for a match before using $num
if (preg_match('~td class=".*?stat-last">.*?<div class="statnum">(.*?)<~s', $html, $num)) {
    echo $num[1];
}

Output:

22

Regex demo: https://regex101.com/r/kM6kI2/1


I think it would be better to use an XML parser for this instead of a regex. SimpleXML can do the job: http://php.net/manual/en/book.simplexml.php

Comments:

  • And how does one get the value from the node with the software you suggested?
  • SimpleXML works as long as the HTML is well-formed XML, as this snippet is (arbitrary HTML often is not). The SimpleXMLElement object will contain all the data related to the node.
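To answer the question in the first comment, here is a minimal sketch. It assumes the markup is well-formed XML, which this snippet happens to be. It loads the string with simplexml_load_string() and queries it with SimpleXMLElement::xpath(). The XPath expression and variable names are illustrative.

```php
<?php
$html = '<table class="profile-stats">
  <tr>
    <td class="stat">
      <div class="statnum">8</div>
      <div class="statlabel"> Tweets </div>
    </td>
    <td class="stat">
        <a href="/THEDJMHA/following">
          <div class="statnum">13</div>
          <div class="statlabel"> Following </div>
        </a>
    </td>
    <td class="stat stat-last">
        <a href="/THEDJMHA/followers">
          <div class="statnum">22</div>
          <div class="statlabel"> Followers </div>
        </a>
    </td>
  </tr>
</table>';

// Assumption: the markup is well-formed XML; real-world HTML often is not,
// in which case DOMDocument::loadHTML() is the safer choice.
$xml = simplexml_load_string($html);
if ($xml === false) {
    throw new RuntimeException('Could not parse the markup as XML');
}

// xpath() returns an array of SimpleXMLElement objects (or null/false on error)
$nodes = $xml->xpath("//td[contains(@class, 'stat-last')]//div[@class='statnum']");

// Casting a SimpleXMLElement to string gives its text content
$value = $nodes ? trim((string) $nodes[0]) : null;
echo $value, "\n"; // 22
```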
/<td class="stat stat-last">.*?<div class="statnum">(\d+)/si

Your match is in the first capture group. Note the s modifier at the end: it makes . match newline characters too, which is needed here because the <td> and the <div> are on different lines.
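The effect of the s modifier can be checked directly. The sketch below runs the answer's pattern with and without s against the question's HTML. The variable names are illustrative.

```php
<?php
$html = '<table class="profile-stats">
  <tr>
    <td class="stat">
      <div class="statnum">8</div>
      <div class="statlabel"> Tweets </div>
    </td>
    <td class="stat">
        <a href="/THEDJMHA/following">
          <div class="statnum">13</div>
          <div class="statlabel"> Following </div>
        </a>
    </td>
    <td class="stat stat-last">
        <a href="/THEDJMHA/followers">
          <div class="statnum">22</div>
          <div class="statlabel"> Followers </div>
        </a>
    </td>
  </tr>
</table>';

$pattern = '/<td class="stat stat-last">.*?<div class="statnum">(\d+)/i';

// Without s, "." cannot cross the newline after <td ...>, so nothing matches
$withoutS = preg_match($pattern, $html, $m1);

// With s appended ("/is"), "." also matches newlines and .*? reaches the statnum div
$withS = preg_match($pattern . 's', $html, $m2);

var_dump($withoutS); // int(0)
var_dump($withS);    // int(1)
echo $m2[1], "\n";   // 22
```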


You can change your pattern like this (dropping the g modifier, which PHP's preg functions do not support):

/<div\sclass="statnum">(.*?)<\/div>/i
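That pattern matches every statnum div, not only the one in the stat-last cell. In PHP, the "find all matches" behaviour that /g gives in other languages comes from preg_match_all() instead. Here is a minimal sketch, assuming the stat-last value is always the last statnum in the markup (true for this snippet, but not guaranteed in general).

```php
<?php
$html = '<table class="profile-stats">
  <tr>
    <td class="stat">
      <div class="statnum">8</div>
      <div class="statlabel"> Tweets </div>
    </td>
    <td class="stat">
        <a href="/THEDJMHA/following">
          <div class="statnum">13</div>
          <div class="statlabel"> Following </div>
        </a>
    </td>
    <td class="stat stat-last">
        <a href="/THEDJMHA/followers">
          <div class="statnum">22</div>
          <div class="statlabel"> Followers </div>
        </a>
    </td>
  </tr>
</table>';

// preg_match_all() returns the number of matches and fills $matches
$count = preg_match_all('/<div\sclass="statnum">(.*?)<\/div>/i', $html, $matches);

// $matches[1] holds capture group 1 of each match, in document order
echo implode(', ', $matches[1]), "\n"; // 8, 13, 22

// Assumption: the stat-last cell is the last one, so its value is the last match
$last = end($matches[1]);
echo $last, "\n"; // 22
```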
