How to match a specific text link with PHP regex

Question

I'm looking for a regular expression in PHP that would match an anchor with a specific target="_parent" attribute on it. I would like to get anchors with text like:

preg_match_all('<a href="http://" target="_parent">Text here</a>', subject, matches, PREG_SET_ORDER);

HTML:

<a href="http://" target="_parent">

    <FONT style="font-size:10pt" color=#000000 face="Tahoma">
        <DIV><B>Text</B> - Text </DIV>
    </FONT>

</a>

</DIV>

Tim Groeneveld · Accepted Answer · 2014-02-09 07:18:41Z

2

To be honest, the best way would be not to use a regular expression at all. Otherwise, you are going to miss all kinds of different links, especially if you can't be sure that the links are always generated the same way.
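To illustrate the kind of link a pattern can miss, here is a minimal sketch. The pattern and the sample markup are assumptions made up for illustration, not taken from the question: a regex written against one exact attribute layout matches only that layout, while equivalent anchors with reordered attributes, single quotes, or uppercase names slip through.

```php
<?php

// Hypothetical samples: four anchors a browser would treat the same way.
$samples = [
    '<a href="http://" target="_parent">Text here</a>', // layout the pattern expects
    '<a target="_parent" href="http://">Text here</a>', // attributes reordered
    "<a href='http://' target='_parent'>Text here</a>", // single quotes
    '<A HREF="http://" TARGET="_parent">Text here</A>', // uppercase tag and attribute names
];

// Naive pattern (an assumption for illustration):
// href first, then target, double quotes only, lowercase only.
$pattern = '~<a href="[^"]*" target="_parent">(.*?)</a>~s';

$matched = 0;
foreach ($samples as $sample) {
    // preg_match() returns 1 when the pattern matches and 0 when it does not.
    $matched += preg_match($pattern, $sample);
}

echo "Regex matched $matched of " . count($samples) . " equivalent anchors\n";
```

Loosening the pattern to cover each variant is possible, but every new variant needs another special case, which is the maintenance problem a parser avoids.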

The best way is to use an XML/HTML parser such as DOMDocument.

<?php

$html = '<a href="http://" target="_parent">Text here</a>';

function extractTags($html) {
    $dom = new DOMDocument;
    libxml_use_internal_errors(true); // because DOM will complain about badly formatted HTML
    $dom->loadHTML($html);
    $sxe = simplexml_import_dom($dom);
    $nodes = $sxe->xpath("//a[@target='_parent']");

    // Map each anchor's trimmed text content to its href
    $anchors = array();
    foreach ($nodes as $node) {
        $anchor = trim((string)dom_import_simplexml($node)->textContent);
        $attribs = $node->attributes();
        $anchors[$anchor] = (string)$attribs->href;
    }

    return $anchors;
}

print_r(extractTags($html));

This will output:

Array (
    [Text here] => http://
)

Even using it on your example:

$html = '<a href="http://" target="_parent">

    <FONT style="font-size:10pt" color=#000000 face="Tahoma">
        <DIV><B>Text</B> - Text </DIV>
    </FONT>

</a>

</DIV>
';
print_r(extractTags($html));

will output:

Array (
    [Text - Text] => http://
)
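As a variation on the function above, here is a sketch that runs the same XPath filter through DOMXPath directly, without the SimpleXML round trip. The function name `extractAnchorsWithXPath`, the `$targets` parameter, and the example.* URLs are hypothetical names introduced for illustration. It assumes PHP 7.4 or later (for the arrow function) and that the target values contain no single quotes, since they are placed into the XPath string as-is.

```php
<?php

// Sketch: collect anchors whose target attribute is one of $targets.
function extractAnchorsWithXPath(string $html, array $targets = ['_parent']): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $dom->loadHTML($html);
    libxml_clear_errors();

    // Build e.g. "@target='_parent' or @target='_blank'" from the allowed values.
    $conditions = array_map(
        fn (string $t): string => "@target='" . $t . "'",
        $targets
    );
    $query = '//a[' . implode(' or ', $conditions) . ']';

    $xpath = new DOMXPath($dom);
    $anchors = [];
    foreach ($xpath->query($query) as $node) {
        // One entry per anchor, in document order.
        $anchors[] = [
            'text' => trim($node->textContent),
            'href' => $node->getAttribute('href'),
        ];
    }

    return $anchors;
}

$html = '<a href="http://a.example/" target="_parent">One</a>'
      . '<a href="http://b.example/" target="_blank">Two</a>'
      . '<a href="http://c.example/">Three</a>';

print_r(extractAnchorsWithXPath($html, ['_parent']));
print_r(extractAnchorsWithXPath($html, ['_parent', '_blank']));
```

Returning a list of text/href pairs rather than an array keyed by link text is a deliberate choice in this sketch: with a text-keyed array, two links that share the same text would overwrite each other.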

If you feel that the HTML is still not clean enough to be used with DOMDocument, then I would recommend using a project such as HTMLPurifier (see http://htmlpurifier.org/) to first clean the HTML up completely (and remove unneeded HTML) and use the output from that to load into DOMDocument.

edited Feb 9, 2014 at 7:18

answered Feb 9, 2014 at 6:44


8 Comments

Ahmed iqbal Over a year ago

I already know about DOM, but I need a regex to handle these matches; the HTML markup is too broken for DOM to handle.

Tim Groeneveld Over a year ago

@user1218948 even with your example, this code still works. You are going to need to provide a larger example of this failing before regex even thinks of becoming an acceptable solution :P

Ahmed iqbal Over a year ago

It grabs not only target='_parent' but target='_blank' as well.

Tim Groeneveld Over a year ago

@user1218948 then change the xpath query to: $nodes = $sxe->xpath("//a[@target='_blank' or @target='_parent']");

Ahmed iqbal Over a year ago

$nodes = $sxe->xpath('//a[@target="_parent"]'); I used double quotes and that fixed it. So will we no longer have HTML parsing issues, however badly formatted the markup is?


Shankar Narayana Damodaran · Accepted Answer · 2014-02-09 06:40:43Z

2

You should be making use of the DOMDocument class instead of regex. You will get a lot of false positives if you handle HTML with regex.

<?php

$html = '<a href="http://" target="_parent">Text here</a>';
$dom = new DOMDocument;
$dom->loadHTML($html);

// Print the text of every anchor whose target attribute is exactly "_parent"
foreach ($dom->getElementsByTagName('a') as $tag) {
    if ($tag->getAttribute('target') === '_parent') {
        echo $tag->nodeValue;
    }
}

Output:

Text here
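To make the false-positive point concrete, here is a small sketch. The sample markup, the example.* URLs, and the regex are assumptions made up for illustration: a pattern that looks for any anchor carrying target="_parent" also picks up an anchor that has been commented out, while the DOM loop only sees real elements.

```php
<?php

// Hypothetical markup: one commented-out anchor and one live anchor.
$html = '<p>Intro</p>'
      . '<!-- <a href="http://old.example/" target="_parent">Commented out</a> -->'
      . '<a href="http://new.example/" target="_parent">Live link</a>';

// Looser pattern (an assumption): any <a> tag that carries target="_parent".
preg_match_all('~<a\b[^>]*target="_parent"[^>]*>(.*?)</a>~is', $html, $matches);
$regexTexts = $matches[1];

// DOM approach: the comment is a comment node, not an element,
// so getElementsByTagName('a') never returns the anchor inside it.
$dom = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom->loadHTML($html);
libxml_clear_errors();

$domTexts = [];
foreach ($dom->getElementsByTagName('a') as $tag) {
    if ($tag->getAttribute('target') === '_parent') {
        $domTexts[] = $tag->nodeValue;
    }
}

print_r($regexTexts); // includes the commented-out anchor
print_r($domTexts);   // only the live anchor
```

The same kind of mismatch can show up with anchors inside script strings or attribute values that merely contain the text being searched for.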

answered Feb 9, 2014 at 6:40


3 Comments

Ahmed iqbal Over a year ago

The HTML markup is too ugly, and DOM is not capable of handling the mismatched tags.

Shankar Narayana Damodaran Over a year ago

You need to post your HTML source so everyone can have a look. If you think DOM is bad for this situation, regex would be even worse!

Shankar Narayana Damodaran Over a year ago

@user1218948, what is your expected output?
