Regular Expression in Html Dom Parser/PHP

Question

source code:

<div id="point">9</div>
<div id="point">REAL POINT: 9</div>

and parser code:

$point = $html->find('div[id=point]');

So $point[0] is the first div and $point[1] is the second.

But sometimes I need a rule like this: "find the divs with id point whose text begins with REAL POINT:".

We can write

$point = $html->find('div[id=point]')->innertext=' REAL POINT:';

But that only finds divs that contain ' REAL POINT:'.

I need to find the divs whose innertext begins with 'REAL POINT:'.

How can I find them?

You shouldn't have multiple elements with the same id; they are supposed to be unique. I suggest you use <div class="point"> instead. Even better, <div class="point"> and <div class="realpoint">.
– gen_Eric, Commented Aug 16, 2011 at 15:21

Aram Kocharyan · Accepted Answer · 2011-08-16 15:23:06Z

1

You could use strpos for a case-sensitive match (stripos is the case-insensitive version).

foreach($html->find('div[id=point]') as $element) {
    if ( strpos($element->innertext, 'REAL POINT:') !== FALSE ) {
        // something here
    }
}

You could also do a search for the string exactly at the start:

foreach($html->find('div[id=point]') as $element) {
    if ( strpos($element->innertext, 'REAL POINT:') === 0 ) {
        // something here
    }
}

But if you want to ignore whitespace before the first character in a div, trim the text first:

foreach($html->find('div[id=point]') as $element) {
    if ( strpos(trim($element->innertext), 'REAL POINT:') === 0 ) {
        // something here
    }
}
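Since the question asks about regular expressions, the same "begins with" test can also be sketched with preg_match. The helper name startsWithLabel below is hypothetical (it is not part of Simple HTML DOM); it only assumes you pass it a string such as $element->innertext.

```php
<?php
// Hypothetical helper, not part of Simple HTML DOM: returns true when $text,
// ignoring leading whitespace, begins with $label (case-sensitive).
function startsWithLabel(string $text, string $label): bool
{
    // preg_quote() escapes regex metacharacters in the label;
    // passing '/' also escapes the pattern delimiter.
    $pattern = '/^\s*' . preg_quote($label, '/') . '/';
    return preg_match($pattern, $text) === 1;
}

$samples = ['9', 'REAL POINT: 9', '  REAL POINT: 12', 'NOT A REAL POINT: 3'];
foreach ($samples as $sample) {
    echo var_export($sample, true), ' => ',
         var_export(startsWithLabel($sample, 'REAL POINT:'), true), PHP_EOL;
}
```

Inside the foreach loops above, the condition would then read startsWithLabel($element->innertext, 'REAL POINT:').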


2 Comments

Benjamin Over a year ago

This is only a partial answer, Aram. I wish there were something like ->innertext^='REAL POINT'; but it doesn't work.

Aram Kocharyan Over a year ago

Take a look at simplehtmldom.sourceforge.net and the examples that come with the download. They seem to use foreach loops a fair bit, but yeah, they should have provided something like that.

Francois Deschenes · Accepted Answer · 2011-08-16 15:23:32Z

0

Use DOMDocument and DOMXPath:

Example (http://codepad.org/pkdd3Suz):

<?php

$html = <<<END
<html>
    <head>
        <title>Sample</title>
    </head>
    <body>
        <div id="point">9</div>
        <div id="point">REAL POINT: 9</div>
    </body>
</html>
END;

$doc = new DOMDocument;
$doc->loadHTML($html);

$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//div[@id="point" and starts-with(., "REAL POINT:")]');

if ( $nodes )
    foreach ( $nodes as $node )
        echo $node->textContent . PHP_EOL;
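Because the sample HTML repeats the same id, libxml may emit warnings while loading it. A possible variation, as a sketch only, is to collect those warnings with libxml_use_internal_errors() and to wrap the node text in normalize-space() so leading whitespace does not defeat starts-with(). The shortened $html string and the $matches variable are just for illustration.

```php
<?php
$html = '<html><body>'
      . '<div id="point">9</div>'
      . '<div id="point">  REAL POINT: 9</div>'
      . '</body></html>';

// Collect libxml parse warnings (such as the duplicate id) instead of emitting them.
$previous = libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($doc);
// normalize-space(.) strips leading/trailing whitespace before starts-with() is applied.
$nodes = $xpath->query('//div[@id="point" and starts-with(normalize-space(.), "REAL POINT:")]');

$matches = [];
foreach ($nodes as $node) {
    $matches[] = trim($node->textContent);
}
print_r($matches);
```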


2 Comments

Álvaro González Over a year ago

This code works fine but not with this HTML sample, which is invalid and triggers a warning: ID point already defined

Aram Kocharyan Over a year ago

probably a better class than simple html dom parser

Marc B · Accepted Answer · 2011-08-16 15:27:05Z

0

Using XPath:

//div[@id='point' and starts-with(., 'REAL POINT:')]
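If an actual regular expression is needed rather than starts-with(), one option with PHP's DOM extension is to expose preg_match to XPath through DOMXPath::registerPhpFunctions(). The following is a rough sketch under that assumption; the sample markup, the case-insensitive pattern, and the $found variable are made up for illustration.

```php
<?php
$html = '<div id="point">9</div>'
      . '<div id="point">REAL POINT: 9</div>'
      . '<div id="point">real point: 4</div>';

libxml_use_internal_errors(true); // the repeated id would otherwise raise warnings
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Make PHP's preg_match callable from XPath under the php: prefix.
$xpath->registerNamespace('php', 'http://php.net/xpath');
$xpath->registerPhpFunctions('preg_match');

// php:functionString passes the node's string value to preg_match;
// the pattern anchors at the start and ignores case and leading whitespace.
$query = '//div[@id="point" and '
       . 'php:functionString("preg_match", "/^\s*REAL POINT:/i", .) = 1]';

$found = [];
foreach ($xpath->query($query) as $node) {
    $found[] = $node->textContent;
}
print_r($found);
```

For a plain prefix test the starts-with() expression above is simpler; the regex route is mainly useful when the match needs to be case-insensitive or more flexible.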

edited Aug 16, 2011 at 15:27

answered Aug 16, 2011 at 15:21


2 Comments

Francois Deschenes Over a year ago

This answer is incorrect. The starts-with function requires at least 2 arguments. Source: http://www.w3.org/TR/xpath-functions/#func-starts-with.

Benjamin Over a year ago

Yes and it is not simple html dom parser's element.
