I'm trying to use a regex in PHP to replace the src attribute of any tag (an image or anything else).

I have a string like this:

$string2 = "<html><body><img src = 'images/test.jpg' /><img src = 'http://test.com/images/test3.jpg'/><video controls=\"controls\" src='../videos/movie.ogg'></video></body></html>";

And I would like to turn it into:

$string2 = "<html><body><img src = 'test.jpg' /><img src = 'test3.jpg'/><video controls=\"controls\" src='movie.ogg'></video></body></html>";

Here's what I tried:

$string2 = preg_replace("/src=["']([/])(.*)?["'] /", "'src=' . convert_url('$1') . ')'" , $string2);
echo htmlentities ($string2);

It didn't change anything, and it gave me a warning about an unescaped string.

Doesn't $1 pass the captured text to the function? What is wrong here?

The convert_url function comes from an example I posted here before:

function convert_url($url)
{
    if (preg_match('#^https?://#', $url)) {
        $url = parse_url($url, PHP_URL_PATH);
    }
    return basename($url);
}

It's supposed to strip the path from a URL and return just the filename.
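For reference, this is what convert_url() returns for each src value in the sample string. The sketch only wraps the question's own function in a loop; the $sources array is just the three paths copied from the example.

```php
<?php
// convert_url() as given in the question: for absolute http(s) URLs,
// keep only the path component, then return its last segment.
function convert_url($url)
{
    if (preg_match('#^https?://#', $url)) {
        $url = parse_url($url, PHP_URL_PATH);
    }
    return basename($url);
}

// The three src values from the sample string.
$sources = array('images/test.jpg', 'http://test.com/images/test3.jpg', '../videos/movie.ogg');

foreach ($sources as $src) {
    echo $src, ' => ', convert_url($src), "\n";
}
```

Relative paths go straight to basename(); only absolute URLs need parse_url() first, so that a host name is never mistaken for a filename.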

  • The original string and what you want to turn it into are both empty strings -- is something missing? Commented May 18, 2012 at 19:04
  • You really shouldn't parse HTML with regex; search SO and you'll find a thorough explanation of why. In the meantime, may I suggest DOM or SimpleXML? Commented May 18, 2012 at 20:34
  • I mean: in the regex, replace every " with \" except the first and the last. Commented May 19, 2012 at 4:02
  • Possible duplicate of Grabbing the href attribute of an A element Commented May 23, 2012 at 6:08
  • Also, if you want to use regex and want to use a function in the replacement, you need preg_replace_callback. You cannot do convert_url('$1') like you do because that is evaluated before $1 exists. Commented May 23, 2012 at 6:12
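Following up on that last comment, here is a sketch of the preg_replace_callback approach applied to the question's string. It reuses convert_url() unchanged; the pattern is my own and assumes every src value is quoted (single or double quotes) and contains no quote of the same kind, so like any regex over HTML it is not a general parser.

```php
<?php
// convert_url() from the question: strip any path and return the filename.
function convert_url($url)
{
    if (preg_match('#^https?://#', $url)) {
        $url = parse_url($url, PHP_URL_PATH);
    }
    return basename($url);
}

// Sample input from the question, with the inner double quotes escaped.
$string2 = "<html><body><img src = 'images/test.jpg' /><img src = 'http://test.com/images/test3.jpg'/><video controls=\"controls\" src='../videos/movie.ogg'></video></body></html>";

// Group 1: "src", optional whitespace around "=", and the opening quote.
// Group 2: the quote character alone, reused by \2 to find the closing quote.
// Group 3: the attribute value itself.
$pattern = '/(\bsrc\s*=\s*([\'"]))(.*?)\2/i';

$string2 = preg_replace_callback($pattern, function ($m) {
    // Keep the original spacing and quote style; swap in the filename only.
    return $m[1] . convert_url($m[3]) . $m[2];
}, $string2);

echo htmlentities($string2);
```

The callback runs once per match and receives the actual captured text in $m, which is what convert_url('$1') in the original attempt could not do.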

3 Answers

Don't use regular expressions on HTML - use the DOMDocument class.

$html = "<html>
           <body>
             <img src='images/test.jpg' />
             <img src='http://test.com/images/test3.jpg'/>
             <video controls='controls' src='../videos/movie.ogg'></video>
           </body>
         </html>";

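$dom = new DOMDocument;
// Collect libxml warnings about sloppy HTML instead of printing them.
libxml_use_internal_errors(true);

$dom->loadHTML( $html );
$xpath = new DOMXPath( $dom );
libxml_clear_errors();

$doc = $dom->getElementsByTagName("html")->item(0);
// Select every src attribute, whatever element it is on.
$src = $xpath->query(".//@src");

foreach ( $src as $s ) {
  // Keep only the last path segment (the filename). explode()'s result is
  // stored first because array_pop() takes its argument by reference.
  $parts = explode( "/", $s->nodeValue );
  $s->nodeValue = array_pop( $parts );
}

// saveHTML() with a node argument (PHP 5.3.6+) serializes just that node.
$output = $dom->saveHTML( $doc );

echo $output;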

Which outputs the following:

<html>
  <body>
    <img src="test.jpg">
    <img src="test3.jpg">
    <video controls="controls" src="movie.ogg"></video>
  </body>
</html>

5 Comments

The DOMDocument class isn't much help when the HTML is embedded inside another tag, e.g. <script></script>.
@Ashesh I'm not sure I follow. You showed us PHP code; I'm showing you the solution.
Sorry, I should have been clearer. Here's what I'm talking about: "<html><head><script>var html = '<img src = /images/test.jpg/>'</script></head><body></html>". In this case, DOMDocument would not pick up the image tag inside the JavaScript. That's why I need to use regex.
@Ashesh The code above will work on the PHP string you have provided here. It converts the src attributes to point only to the filename.
Sometimes it's not a good idea to load an HTML parser, especially for short, predefined text values (e.g. <img alt="smth" src="smwhr"/>) where only src="" and alt="" can vary.

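On PHP versions before 7.0 you can use the e modifier (it was deprecated in PHP 5.5.0 and removed in PHP 7.0.0; on newer versions use preg_replace_callback instead).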

$string = "<html><body><img src='images/test.jpg' /><img src='http://test.com/images/test3.jpg'/><video controls=\"controls\" src='../videos/movie.ogg'></video></body></html>";

$string2 = preg_replace("~src=[']([^']+)[']~e", '"src=\'" . convert_url("$1") . "\'"', $string);

Note that when using the e modifier, the replacement script fragment needs to be a string to prevent it from being interpreted before the call to preg_replace.

function replace_img_src($img_tag) {
    $doc = new DOMDocument();
    $doc->loadHTML($img_tag);
    $tags = $doc->getElementsByTagName('img');
    foreach ($tags as $tag) {
        $old_src = $tag->getAttribute('src');
        $new_src_url = 'website.com/assets/'.$old_src;
        $tag->setAttribute('src', $new_src_url);
    }
    return $doc->saveHTML();
}
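A commenter asks what $img_tag is. Judging from the code, it is simply an HTML string handed to DOMDocument::loadHTML(), either a single tag or a whole page. The call below is a hypothetical usage sketch with a made-up sample tag; 'website.com/assets/' is the answer's own placeholder prefix, not a real path.

```php
<?php
// replace_img_src() as posted in the answer, with comments added.
// $img_tag: an HTML string (one <img> tag or a full document).
function replace_img_src($img_tag) {
    $doc = new DOMDocument();
    $doc->loadHTML($img_tag);
    $tags = $doc->getElementsByTagName('img');
    foreach ($tags as $tag) {
        // Prefix each existing src with the (placeholder) asset location.
        $old_src = $tag->getAttribute('src');
        $new_src_url = 'website.com/assets/' . $old_src;
        $tag->setAttribute('src', $new_src_url);
    }
    // saveHTML() returns the whole document, so a bare fragment comes back
    // wrapped in <html><body> (and usually a DOCTYPE).
    return $doc->saveHTML();
}

// Hypothetical input: a single image tag.
$result = replace_img_src('<img src="test.jpg" alt="demo">');
echo $result;
```

If you need the fragment back without the added wrapper, you would have to extract the <img> element yourself before serializing.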

1 Comment

What is $img_tag?
