PHP / HTML comments tags

Question

I have several HTML pages containing comments that look like this:

<!-- ID: 123456 -->

What I need is a PHP script that can pull that ID number. I have tried the following:

if (preg_match('#^<!--(.*?)-->#i', $output)) {
    echo "A match was found.";
} else {
    echo array_flip(get_defined_constants(true)['pcre'])[preg_last_error()];
    echo "No match found.";
}

That always prints "No match found", with no error reported. I have also tried preg_match_all, with the same result. The only thing I have found to work is splitting the page into an array on spaces, but that is very time-consuming and a waste of processing power.

For reference, I have looked and tried just about every suggestion on these pages:

Explode string by one or more spaces or tabs

http://php.net/manual/en/function.preg-split.php

How to extract html comments and all html contained by node?

Maybe this is because - is a special symbol and should be escaped?
– u_mulder, Commented Sep 22, 2015 at 20:38
Remove the ^ from the pattern. Otherwise, it will match only at the start of the string.
– Wiktor Stribiżew, Commented Sep 22, 2015 at 20:40
Is $output the string containing the comment, or the ID you want captured? Works here: eval.in/437735. You might need the m modifier if you want the <! to match only at the start of each line.
– chris85, Commented Sep 22, 2015 at 20:51
@u_mulder: - is not a special symbol, except inside square brackets.
– Barmar, Commented Sep 22, 2015 at 21:06
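The point made in the comments about the ^ anchor can be checked with a short sketch. The page content in $output below is a made-up sample (an assumption, not the asker's real data) in which the comment is not the first thing in the string. It also suggests why no PCRE error is reported: a pattern that simply fails to match is not an error.

```php
<?php
// Hypothetical page content: the comment is not at the very start of the string.
$output = "<html>\n<body>\n<!-- ID: 123456 -->\n</body>\n</html>";

// Pattern from the question: ^ only matches at the very start of $output.
$anchored = preg_match('#^<!--(.*?)-->#i', $output);
var_dump($anchored);        // int(0)

// Without the anchor, the comment is found anywhere in the string.
$unanchored = preg_match('#<!--(.*?)-->#', $output, $m);
var_dump($unanchored);      // int(1)
echo trim($m[1]), "\n";     // ID: 123456

// With the m modifier, ^ also matches right after each newline.
$multiline = preg_match('#^<!--(.*?)-->#m', $output);
var_dump($multiline);       // int(1)
```

Dropping the anchor is the simpler fix when the comment may appear mid-line; the m modifier only helps when the comment starts a line.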

XOR-Manik · Accepted Answer · 2015-09-22 20:53:43Z

1

How about trying this:

<!-- ID: ([\w ]+) -->

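This matches the literal text from your example and captures the ID. You can fetch it from the first numbered group. Note that [\w ]+ also accepts letters, underscores and spaces; use \d+ if the ID is always numeric.

PS: Escape the pattern as required by your language's string syntax.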
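As a rough illustration of how this pattern might be used from PHP, here is a minimal sketch. The sample string in $output and the variable names are assumptions made for the example. In a single-quoted PHP string, \w needs no extra escaping.

```php
<?php
// Hypothetical input containing two ID comments.
$output = "<p>first</p><!-- ID: 123456 -->\n<p>second</p><!-- ID: 789 -->";

// The pattern from the answer, wrapped in / delimiters.
$pattern = '/<!-- ID: ([\w ]+) -->/';

// First match only: group 1 holds the ID.
if (preg_match($pattern, $output, $match)) {
    echo $match[1], "\n";   // 123456
}

// All matches: $matches[1] lists every captured ID.
preg_match_all($pattern, $output, $matches);
print_r($matches[1]);       // 123456 and 789
```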

edited Sep 22, 2015 at 20:53

answered Sep 22, 2015 at 20:40

XOR-Manik

503 reputation, 1 gold badge, 4 silver badges, 19 bronze badges


2 Comments

Wiktor Stribiżew Over a year ago

Here, only \w must be escaped.

XOR-Manik Over a year ago

Thanks, I have updated it. I was testing the regex in a Java environment and forgot to remove the escape characters.

Casimir et Hippolyte · Accepted Answer · 2015-09-22 21:31:29Z

1

To extract information from structured data (such as HTML, XML, JSON...), use the appropriate parser (here DOMDocument, with DOMXPath to query the DOM tree):

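// Sample input: the comments inside <script> are script text, not comment nodes.
$html = <<<'EOD'
<script>var a='<!-- ID: avoid_this --> and that <!-- ID: 666 -->';</script>
blahblah<!-- ID: 123456 -->blahblah
EOD;

// Select only comment nodes whose text starts with " ID: ".
$query = '//comment()[starts-with(., " ID: ")]';

$dom = new DOMDocument;
$dom->loadHTML($html);
$xp = new DOMXPath($dom);

$nodeList = $xp->query($query);

foreach ($nodeList as $node) {
    // Drop the leading " ID: " (5 characters) and the trailing space.
    echo substr($node->textContent, 5, -1);
}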

Feel free to validate the result afterwards with is_numeric or a regex. You can also register your own PHP function and call it from within the XPath query: http://php.net/manual/en/domxpath.registerphpfunctions.php
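The validation step mentioned above could look roughly like the sketch below. The function name extractCommentIds and the sample markup are hypothetical, and the digits-only regex is an assumption about what a valid ID looks like (it is stricter than is_numeric, which also accepts values such as "1e5").

```php
<?php
// Hypothetical helper: returns every all-digit ID found in " ID: ..." comments.
function extractCommentIds(string $html): array
{
    $dom = new DOMDocument();
    // Collect libxml warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xp = new DOMXPath($dom);
    $ids = [];
    foreach ($xp->query('//comment()[starts-with(., " ID: ")]') as $node) {
        // Drop the leading " ID: " and any surrounding whitespace.
        $candidate = trim(substr($node->textContent, 5));
        // Assumption: a valid ID consists of digits only.
        if (preg_match('/^\d+\z/', $candidate) === 1) {
            $ids[] = $candidate;
        }
    }
    return $ids;
}

$sample = '<html><body><p>a</p><!-- ID: 123456 --><!-- ID: not a number --><!-- ID: 42 --></body></html>';
print_r(extractCommentIds($sample));   // 123456 and 42
```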

edited Sep 22, 2015 at 21:31

answered Sep 22, 2015 at 21:00

Casimir et Hippolyte

90k reputation, 5 gold badges, 102 silver badges, 131 bronze badges

Aniruddha Chakraborty · Accepted Answer · 2015-09-22 20:59:41Z

-1

First, treat the HTML file as a plain text file, since you only want to read some text from it.

test.html

<!DOCTYPE html>
<html>
<head>
    <title></title>
</head>
<body>
<p>This is a test HTML page</p>
<!-- ID: 123456 -->
</body>
</html>

PHP script that fetches the ID from the HTML file

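<?php

$fileName = 'test.html';

$content = file_get_contents($fileName);
$start = '<!-- ID:';
$end   = '-->';

// Return the text between the first $start marker and the next $end marker,
// or an empty string if $start is not found.
function getBetween($content, $start, $end) {
    $r = explode($start, $content);

    if (isset($r[1])) {
        $r = explode($end, $r[1]);
        return $r[0];
    }
    return '';
}

// Remove all spaces from the extracted text before printing it.
echo str_replace(' ', '', getBetween($content, $start, $end));

?>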

answered Sep 22, 2015 at 20:59

Aniruddha Chakraborty

1,862 reputation, 1 gold badge, 22 silver badges, 33 bronze badges

1 Comment

Pekka Over a year ago

This is a very... original approach :) But it's really far better to use a proper XML/HTML parser as shown by Casimir above.
