I keep running into regular expressions, but I have never taken the time to understand and use them. My current project now forces me to use one, so I need a regex that replaces a simple string. I'm replacing a small part of a longtext value retrieved from a database. The longtext is one or more paragraphs of text containing anchors of the form:

<a href="example.com" title="blah3x">Example</a> 

So the question is: how do I replace the value of the title attribute? Please note that the text may contain two or more anchor tags, so I'd like to be able to target each of them specifically.

EDIT: I'd like to use pure PHP on this. I think I know how to do this using js/jquery.

  • @Barmar: I didn't build the project. I know it's really badly designed, which is basically the reason I want to use regex. Commented Jun 1, 2013 at 3:31
  • See John Conde's answer: you can use functions specifically for parsing HTML rather than a regexp. It has nothing to do with the design of the system. Commented Jun 1, 2013 at 3:34

3 Answers

$doc = new DOMDocument();
$doc->loadHTML('<a href="example.com" title="blah3x">Example</a>');
$anchors = $doc->getElementsByTagName('a');
foreach ($anchors as $anchor)
{
    $anchor->setAttribute('target', '__blank');
}
$html = $doc->saveHTML();

echo $html;
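
Since the question asks about the title attribute and about targeting individual anchors, here is a minimal sketch of how the same approach might be adapted. The input string, the new titles, and the href used for matching are made up for illustration; only `getElementsByTagName()`, `item()`, `getAttribute()` and `setAttribute()` come from the DOM extension itself.

```php
<?php
// Hypothetical input: a paragraph from the database containing two anchors.
$html = '<p>See <a href="example.com" title="blah3x">Example</a> and '
      . '<a href="example.org" title="old">Other</a>.</p>';

$doc = new DOMDocument();
$doc->loadHTML($html);

$anchors = $doc->getElementsByTagName('a');

// item() returns the node at a zero-based index, or null if there is none.
$second = $anchors->item(1);
if ($second !== null) {
    $second->setAttribute('title', 'New title');
}

// Alternatively, pick an anchor by one of its other attributes.
foreach ($anchors as $anchor) {
    if ($anchor->getAttribute('href') === 'example.com') {
        $anchor->setAttribute('title', 'Example title');
    }
}

$result = $doc->saveHTML();
echo $result;
```

Note that `saveHTML()` without arguments serializes the whole document, so the output is wrapped in the doctype, html and body elements that `loadHTML()` adds.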



3 Comments

While this is certainly the preferred method in most cases, I don't think it's right to call it the correct way. For example: if I had to parse 10 billion pages I would opt to use regex or more likely even strpos.
See the question: "my current project is forcing me to use a regular expression so I need someone who can give me the correct regex to replace a simple string."
I think this might work. However, I can't access anchor objects without iterating. Why can't I use $anchors[0]->setAttribute()?

Description

You could do this with the following regex

(<a\b[^>]*?\btitle=(['"]))(.*?)\2


Summary

  • ( start capture group 1
  • <a\b consume an open angle bracket and an a followed by a word boundary
  • [^>]*? consume non close angle bracket characters, as few as possible, up to the next token; this forces the regex to stay inside the anchor tag
  • \btitle= consume a word boundary and title=; the boundary guards against matching attribute names that merely end in title
  • (['"]) capture group 2, ensure that an opening single or double quote is used
  • ) close capture group 1
  • (.*?) capture group 3, non-greedily consume all text inside the quotes
  • \2 back-reference to the string from capture group 2: if a single quote opened the value, a single quote is required to close it. The same applies if a double quote was used.

In the replace command I'm simply replacing the entire matched string, from <a to the closing quote, with capture group 1, followed by the desired text NewValue, followed by the closing quote from capture group 2.

PHP example

<?php
// Single-quoted PHP string, so the double quotes inside the HTML need no escaping.
$sourcestring = '<a href="example.com" title="blah3x">Example</a>';
echo preg_replace('/(<a\b[^>]*?\btitle=([\'"]))(.*?)\2/im', '\1NewValue\2', $sourcestring);
?>

$sourcestring after replacement:
<a href="example.com" title="NewValue">Example</a>
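
If the text holds several anchors and each needs its own title, one possible variation is `preg_replace_callback()` with a counter. This is only a sketch: the input string and the `$newTitles` map (zero-based position to new title) are hypothetical, and the position counts only anchors that actually have a title attribute, since those are the only ones the pattern matches. The `??` operator assumes PHP 7 or later.

```php
<?php
// Hypothetical input with two anchors, one using single quotes for the title.
$source = '<a href="example.com" title="blah3x">Example</a> and '
        . '<a href="example.org" title=\'old\'>Other</a>';

// Hypothetical mapping: zero-based position of the match => new title.
$newTitles = [0 => 'First title', 1 => 'Second title'];

$index = 0;
$result = preg_replace_callback(
    '/(<a\b[^>]*?\btitle=([\'"]))(.*?)\2/i',
    function (array $m) use (&$index, $newTitles) {
        // Keep the old title ($m[3]) when no replacement is defined for this position.
        $title = $newTitles[$index] ?? $m[3];
        $index++;
        // $m[1] is everything up to and including the opening quote, $m[2] is the quote.
        return $m[1] . htmlspecialchars($title, ENT_QUOTES) . $m[2];
    },
    $source
);

echo $result;
```

The callback escapes the new value with `htmlspecialchars()` so that a quote inside a title cannot break out of the attribute.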

Disclaimer

Since parsing the text with an HTML parser is not the desired solution, I'll skip the usual soapbox disclaimer about parsing HTML with regex.

$string=preg_replace(
'@<a (.*)title="(.*)"([^>]*)>(.*)</a>@iU',
'<a $1title="'.$replacement.'"$3>$4</a>',
$string);

Note that the i at the end of the expression makes it case insensitive, and the U makes it ungreedy.
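
To illustrate why the U modifier matters here, the following sketch runs the same pattern with and without it on a string containing two anchors. The sample string and the `NEW` replacement value are made up for the demonstration; the fifth argument of `preg_replace()` receives the number of replacements performed.

```php
<?php
// Hypothetical input with two anchors on one line.
$subject = '<a href="a.com" title="one">A</a> and <a href="b.com" title="two">B</a>';

$replacement = 'NEW'; // hypothetical new title
$template = '<a $1title="' . $replacement . '"$3>$4</a>';

// With U, every .* matches as little as possible, so each anchor is matched separately.
$lazy = preg_replace('@<a (.*)title="(.*)"([^>]*)>(.*)</a>@iU', $template, $subject, -1, $lazyCount);

// Without U, the first .* runs to the last title=, so one match spans both anchors.
$greedy = preg_replace('@<a (.*)title="(.*)"([^>]*)>(.*)</a>@i', $template, $subject, -1, $greedyCount);

echo $lazyCount, "\n";   // 2
echo $greedyCount, "\n"; // 1
echo $lazy, "\n";
echo $greedy, "\n";
```

In the greedy run only the last title changes, because the whole span from the first `<a ` to the last `</a>` is treated as a single match.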
