I have well-formed XML documents stored in string variables. I want to use preg_replace to add a given attribute to every XML tag.

For example replace:

<tag1>
<tag2> some text </tag2>
</tag1>

by:

<tag1 attr="myAttr">
<tag2 attr="myAttr"> some text </tag2>
</tag1>

So I basically need a regular expression that finds every start tag and adds my attribute, but I'm a complete regex noob.

3 Answers

13

Don't use regular expressions to process XML. XML is not a regular language. Use PHP's XML extensions instead:

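// SimpleXMLElement is the class provided by PHP's SimpleXML extension
$xml = new SimpleXMLElement(file_get_contents($xmlFile));

// Add the attribute to this node, then recurse into its child elements
function process_recursive($xmlNode) {
    $xmlNode->addAttribute('attr', 'myAttr');
    foreach ($xmlNode->children() as $childNode) {
        process_recursive($childNode);
    }
}

process_recursive($xml);
echo $xml->asXML();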
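
Since the question keeps its XML in string variables rather than files, the same idea can be sketched with the DOM extension working on a string. The helper name add_attribute_to_all_elements is made up for this illustration; DOMDocument::loadXML, getElementsByTagName('*') and setAttribute are standard DOM calls. Treat it as a minimal sketch, not a drop-in implementation:

```php
<?php
// Hypothetical helper (not from the original answer): adds $name="$value"
// to every element of an XML document held in a string.
function add_attribute_to_all_elements(string $xmlString, string $name, string $value): string
{
    $doc = new DOMDocument();
    // loadXML() returns false when the input is not well-formed
    if (!$doc->loadXML($xmlString)) {
        throw new InvalidArgumentException('Input is not well-formed XML');
    }
    // '*' matches every element in the document, so no recursion is needed
    foreach ($doc->getElementsByTagName('*') as $element) {
        $element->setAttribute($name, $value);
    }
    return $doc->saveXML();
}

$input = "<tag1>\n<tag2> some text </tag2>\n</tag1>";
echo add_attribute_to_all_elements($input, 'attr', 'myAttr');
```

Note that saveXML() serializes the whole document, so the output also carries an XML declaration line in front of the root element.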

Any answer based on regular expressions will break on valid XML such as this:

<?xml version="1.0" encoding='UTF-8'?>
<html>
    <head>
        <!-- <meta> ... </meta> -->
        <script>//<![CDATA[
            function load() {document.write('<tt>Test</tt>');}
        //]]></script>
        <title><![CDATA[Fancy <<SiteName>> [with Breadcrumbs] > in > title]]></title>
    </head>
    <body onload="load()">
        <input
            type="submit"
            value="multiline
                   button
                   text"
        />
    </body>
</html>
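
To make the problem concrete, here is a small sketch on a reduced version of the document above. It compares the number of elements a real parser finds with the number of start-tag lookalikes a naive pattern finds. The pattern is an illustrative assumption, not one taken from the other answers:

```php
<?php
// Reduced version of the document above: the comment and the CDATA section
// both contain text that looks like tags but is not markup.
$xml = <<<'XML'
<html>
    <head>
        <!-- <meta> ... </meta> -->
        <title><![CDATA[Fancy <<SiteName>> > in > title]]></title>
    </head>
    <body/>
</html>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);

// Elements as the parser sees them: html, head, title, body
$realElements = $doc->getElementsByTagName('*')->length;

// Start-tag lookalikes as a naive pattern sees them; this also counts
// <meta> inside the comment and <SiteName> inside the CDATA section
$lookalikes = preg_match_all('/<[A-Za-z][^>]*>/', $xml);

echo "parser: $realElements, regex: $lookalikes\n";
```

The two counts differ because the pattern has no notion of comments or CDATA sections, so a replace based on it would rewrite text that is not a tag at all.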

3 Comments

I understand that using regex on XML is dirty, but in my case I'll only be adding these attributes to 'regex-safe' XML documents. Thank you for pointing this out!
By the way, I was surprised by how little code it takes with SimpleXML. I tried your code, but it adds an <attributes attr="myAttr"/> element just before the document's end tag. Weird.
OK, I made some minor changes to get it working for me: using addAttribute($name, $value) instead of attributes[], and adding the parentheses that $xmlNode->children() needs in the foreach statement. Thanks again!
0
$xml_data = preg_replace("/<([^\/]+\w+)/", "<\\1 attr=\"myAttr\">", $xml_data);

2 Comments

Arrrg, it's almost doing the trick, except that it adds 'attr="myAttr">' in the CDATA part of each node rather than as an attribute... any idea?
Yes, this is why people recommend not mixing regexes and XML, because of the corner cases and equivalent syntaxes. But don't worry, you're only going to use it on absolutely 100% legal and consistent XML, right?
0

OK, for those reading this who still want to go the regex way for some reason, here is how to do it:

$xml_data = preg_replace('/(<[A-Za-z0-9\-\_]+[^>]*)>/u', '\1 attr="myAttr">', $xml_data);

But, as discussed earlier, use it with caution! Use it only on XML that you know it won't break, for example input with no self-closing tags, comments, or CDATA sections (see soulmerge's answer about that).
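
If you do go the regex way, one possible safeguard is to re-parse the result and reject it when it is no longer well-formed. The sketch below applies the pattern from this answer to two inputs; the helper is_well_formed is a hypothetical name introduced here, built on DOMDocument::loadXML and libxml_use_internal_errors:

```php
<?php
// Pattern and replacement as given in the answer above.
$pattern = '/(<[A-Za-z0-9\-\_]+[^>]*)>/u';
$replacement = '\1 attr="myAttr">';

$plain = '<tag1><tag2> some text </tag2></tag1>';
$selfClosing = '<tag1><tag2 /></tag1>';

// Hypothetical helper: true if $xml parses as well-formed XML.
function is_well_formed(string $xml): bool
{
    $previous = libxml_use_internal_errors(true); // collect errors instead of emitting warnings
    $doc = new DOMDocument();
    $ok = $doc->loadXML($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);        // restore the caller's setting
    return $ok;
}

foreach ([$plain, $selfClosing] as $xml) {
    $result = preg_replace($pattern, $replacement, $xml);
    echo $result, ' => ', is_well_formed($result) ? 'well-formed' : 'BROKEN', "\n";
}
```

For the second input, the slash of the self-closing tag ends up in front of the inserted attribute, so the result no longer parses. The check only catches breakage of well-formedness; it will not notice text inside comments or CDATA sections being silently rewritten.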
