I know this question is already somewhere on SO, but I can't find the right one and I'm still bad at regex :/

I have a string, and that string is valid HTML. Now I want to find all the tags with a certain name and attribute.

I tried this regex (e.g. for a div with a type): /(<div type="special_type" src="(.*?)<\/div>)/.

Example string:

<div>Do not match me</div>
<div type="special_type" src="bla"> match me</div>
<a>not me</a>
<div src="blaw" type="special_type" > match me too</div>

If I use preg_match then I only get <div type="special_type" src="bla"> match me</div>, which is logical because the other one has the attributes in a different order.

What regex do I need to get the following array when using preg_match on the example string?

array(0 => '<div type="special_type" src="bla"> match me</div>',
      1 => '<div src="blaw" type="special_type" > match me too</div>')

2 Answers

General advice: don't use regex to parse HTML. It will get messy if the HTML changes.

Use DOMDocument instead:

$str = <<<EOF
<div>Do not match me</div>
<div type="special_type" src="bla"> match me</div>
<a>not me</a>
<div src="blaw" type="special_type" > match me too</div>
EOF;

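$doc = new DOMDocument();
$doc->loadHTML($str);
$selector = new DOMXPath($doc);

// Select every <div> whose type attribute equals "special_type", regardless of attribute order
$result = $selector->query('//div[@type="special_type"]');

// Loop through all found items and print each src attribute on its own line
foreach ($result as $node) {
    echo $node->getAttribute('src'), PHP_EOL;
}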
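
The question asks for the full markup of each matching element rather than only its src attribute. A minimal sketch of one way to collect that, assuming PHP 5.3.6 or later (where DOMDocument::saveHTML() accepts a node argument); the variable names $xpath and $matches are illustrative only:

```php
<?php
// Sample input taken from the question.
$str = <<<EOF
<div>Do not match me</div>
<div type="special_type" src="bla"> match me</div>
<a>not me</a>
<div src="blaw" type="special_type" > match me too</div>
EOF;

$doc = new DOMDocument();
$doc->loadHTML($str);
$xpath = new DOMXPath($doc);

// Collect the serialized markup of every <div> with type="special_type".
$matches = array();
foreach ($xpath->query('//div[@type="special_type"]') as $node) {
    // With a node argument, saveHTML() serializes just that node.
    $matches[] = $doc->saveHTML($node);
}

print_r($matches);
```

The markup is re-serialized by the parser, so whitespace inside the tags may differ slightly from the input string.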

6 Comments

Thanks for the quick answer! I accepted the other answer because that one looped through the found nodes.
@Sven Know that looping in PHP is slow. Using XPath is definitely the way to go here, especially if the HTML is larger than shown in the question. Test it.
Can you give me an example of how to loop through the found items from $result? I can't find a method like items() or something, and I have multiple tags in the HTML that I want to find.
@Sven OHHH. This means I've ruined another post of mine O: (edited the wrong answer accidentally) ...
@Sven No problem! :) I'm happy that you've finally accepted the XPath solution. Not because of the 15 rep I got, but because XPath is the way to go here. Once you get fluent with XPath, you'll see that you can select everything that comes to mind, and more! ;) But there are also situations where XPath has disadvantages performance-wise, especially if you select by node name or id attribute. In those cases pure DOM would perform better.

As hek2mgl said, you'd better use DOMDocument:

$html = '
<div>Do not match me</div>
<div type="special_type" src="bla"> match me</div>
<a>not me</a>
<div src="blaw" type="special_type" > match me too</div>';

$matches = get_matched($html);

// Returns the src attribute of every <div> whose type is "special_type".
function get_matched($html) {
    $matched = array();

    $dom = new DOMDocument();
    @$dom->loadHTML($html); // @ suppresses parser warnings on sloppy markup

    // Fetch the node list once instead of re-querying it on every access.
    foreach ($dom->getElementsByTagName('div') as $div) {
        if ($div->getAttribute('type') != 'special_type') {
            continue;
        }

        $matched[] = $div->getAttribute('src');
        // or: $matched[] = $div->nodeValue;
    }

    return $matched;
}
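
Note that the nodeValue alternative returns only the text content of an element, so nested tags are dropped. A small sketch of the difference, using the nested-markup example raised in the comments and assuming PHP 5.3.6 or later for saveHTML() with a node argument; the names $text and $markup are illustrative only:

```php
<?php
// Input with a nested tag, as in the example from the comments.
$html = '<div>Do not <strong>match</strong> me</div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$div = $dom->getElementsByTagName('div')->item(0);

// nodeValue concatenates the text of all descendants; the <strong> tag is lost.
$text = $div->nodeValue;

// saveHTML() with a node argument keeps the element and its child markup.
$markup = $dom->saveHTML($div);

echo $text, PHP_EOL, $markup, PHP_EOL;
```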

3 Comments

That's bollocks. You really suggest iterating over nodes instead of using XPath?? It seems you didn't understand what hek2mgl said.
I never said "instead of using XPath". Of course he could use XPath.
If the HTML is like this: $html = '<div>Do not <strong>match</strong> me</div>'; then it removes the strong tag.
