
This expression only captures the value between the angle brackets (> ... <) when that value is numeric. I want to capture it in every case.

function GetProducts($file){
    // [0-9]* only matches link text made up entirely of digits
    $regex = "|class=\"producto\"[^>]+>([0-9]*)</[^>]+>|U";
    if(!is_file($file)) return false;
    preg_match_all($regex, file_get_contents($file), $result);
    // Convert each captured value (group 1) to an integer
    return array_map('intval', $result[1]);
}

This is my HTML code:

<a class="producto" href="ver.asp?id=4013">A86028</a></span><!-- /a --></td></tr>
<a class="producto" href="ver.asp?id=4014">1027C</a></span><!-- /a --></td></tr>
<a class="producto" href="ver.asp?id=4014">5611 4020</a></span><!-- /a --></td></tr>
<a class="producto" href="ver.asp?id=4014">396-4185</a></span><!-- /a --></td></tr>
<a class="producto" href="ver.asp?id=4014">834006-5-7</a></span><!-- /a --></td></tr>
<a class="producto" href="ver.asp?id=4014">5601GR 4325GR</a></span><!-- /a --></td></tr>
<a class="producto" href="ver.asp?id=4014">2182CR(2)</a></span><!-- /a --></td></tr>
<a class="producto" href="ver.asp?id=4014">1458-54-63-55</a></span><!-- /a --></td></tr>

My desired output is:

Array ([1] => 1027 [2] => 5611 [3] => 5396 [4] => 834006 [5] => 5601 [6] => 2182 [7] => 1458) 
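For reference, PHP's integer cast (used by GetProducts) already reduces a string to its leading digits. Once the capture group accepts any text, the cast does most of the remaining work. This is a minimal PHP sketch that assumes the link texts have already been extracted into a hypothetical $texts array:

```php
<?php
// Hypothetical input: the link texts from the question's markup
$texts = ["A86028", "1027C", "5611 4020", "396-4185", "834006-5-7",
          "5601GR 4325GR", "2182CR(2)", "1458-54-63-55"];

// intval() behaves like (int): it keeps the leading run of digits
// and yields 0 when the string does not start with a number
$ints = array_map('intval', $texts);

print_r($ints);
```

A text that does not start with a digit, such as A86028, becomes 0. The value 396-4185 becomes 396 because the cast stops at the dash.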
  • Don't parse HTML with regex! Commented Sep 11, 2014 at 20:29
  • What is your desired output? Commented Sep 11, 2014 at 20:38
  • Array ( [1] => 1027 [2] => 5611 [3] => 5396 [4] => 834006 [5] => 5601 [6] => 2182 [7] => 1458 ) Commented Sep 11, 2014 at 21:01

3 Answers


This might work, but as others have said, parsing HTML with regex is problematic.

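 # class="producto"[^>]+>([^<]*)</[^>]+>

 class="producto" [^>]+ >   # the link's opening tag, up to its closing >
 ( [^<]* )                   # group 1: any text up to the next <
 </ [^>]+ >                  # the closing tag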
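The following is a minimal PHP sketch of how this pattern might be used. It assumes the markup is already in a string; the $html sample is a hypothetical excerpt of the question's HTML. Group 1 collects each link's full text. A second pattern, ^\d+, then keeps only the leading digits and skips texts such as A86028 that do not start with one:

```php
<?php
// Hypothetical sample: an excerpt of the question's markup
$html = <<<'HTML'
<a class="producto" href="ver.asp?id=4013">A86028</a></span><!-- /a --></td></tr>
<a class="producto" href="ver.asp?id=4014">1027C</a></span><!-- /a --></td></tr>
<a class="producto" href="ver.asp?id=4014">5611 4020</a></span>
<a class="producto" href="ver.asp?id=4014">396-4185</a></span>
<a class="producto" href="ver.asp?id=4014">2182CR(2)</a></span>
HTML;

// The answer's pattern: group 1 captures any text up to the next "<"
$regex = '|class="producto"[^>]+>([^<]*)</[^>]+>|';
preg_match_all($regex, $html, $result);
$texts = $result[1];

// Keep only the leading digits of each text; skip texts without them
$numbers = [];
foreach ($texts as $text) {
    if (preg_match('/^\d+/', $text, $m)) {
        $numbers[] = $m[0];
    }
}

print_r($numbers);
```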

2 Comments

To quote the bountied answer on the very post that so berates HTML regex parsing: "While it is true that asking regexes to parse arbitrary HTML is like asking Paris Hilton to write an operating system, it's sometimes appropriate to parse a limited, known set of HTML." And that is the case here.
Yeah, I could throw down a 15k regex to parse HTML and it's still problematic, especially with entities and substitutions. I'd argue this applies even to a known set of HTML.

You've asked for a pure regular expression here, but regex is not the right tool for parsing HTML.

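// Reducer: append the leading run of digits of each link text, if any
function _matcher ($m, $str) {
  if (preg_match('/^\d+/', $str, $matches))
    $m[] = $matches[0];
  return $m;
}

// $html holds the markup shown in the question
libxml_use_internal_errors(true); // silence warnings about the HTML fragment
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Collect the text of every <a class="producto"> link
$vals = array();
foreach ($xpath->query('//a[@class="producto"]') as $link) {
   $vals[] = $link->nodeValue;
}

print_r(array_reduce($vals, '_matcher', array()));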

Output:

Array
(
    [0] => 1027
    [1] => 5611
    [2] => 396
    [3] => 834006
    [4] => 5601
    [5] => 2182
    [6] => 1458
)


You can use a regex like this:

([\w\s()-]+)</


The idea is to capture alphanumeric characters, whitespace, dashes and parentheses before the closing </.
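The following is a minimal PHP sketch of this approach on a hypothetical two-link sample. The hyphen is placed last in the character class so that PCRE reads it as a literal dash rather than as a range. Each capture holds the link's full text, not just its leading number:

```php
<?php
// Hypothetical sample: two links from the question's markup
$html = '<a class="producto" href="ver.asp?id=4014">5611 4020</a></span>'
      . '<a class="producto" href="ver.asp?id=4014">2182CR(2)</a></span>';

// Word chars, whitespace, parentheses and a literal "-" (placed last),
// followed by the start of a closing tag
preg_match_all('~([\w\s()-]+)</~', $html, $result);

print_r($result[1]);
```

If only the leading number is wanted, each capture still needs a second step, such as an integer cast.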
