I'm trying to parse a CTP file (CakePHP template with HTML and PHP tags in it) and want to match all the HTML tags with specific data-attributes (data-edit="true"). Each tag with data-edit="true" MUST have a data-type="..." and data-name="..." attribute. I would like to capture these attributes in (named) groups, so I can use them in my code. So far I have the following regex:
\<(?<tagname>\w+).*?(?>data\-edit="true").*?\>(?<content>.*?)\<\/(?&tagname)\>
Here are some samples of the tags it should match:
<h4 data-type="text" data-edit="true" data-name="SomeName">Some content, with or without newlines.</h4>
and
<span data-edit="true" data-type="wysiwyg" data-name="Beoordeling">Some text
with <strong>tags</strong> and newlines in it that
should not break the parser.</span>
From the above examples I would like the regex to return the values of the data-type and data-name attributes, and of course the content (between the tags) itself.
The data-attributes can occur in any order, and other attributes (such as classes) may be present in the tags. So far I've managed to get the content of only the tags with a data-edit="true" attribute, but when the content contains a newline, the match breaks. I also can't capture the other data-attributes.
Is what I want to achieve even possible? I know regex isn't the preferred way to parse HTML, but as this is a CTP file with all kinds of other tags in it, I can't use an XML parser.
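To make these requirements concrete, here is a minimal sketch of one pattern shape that would satisfy them. It is written with Python's `re` module purely for illustration; the names `EDITABLE_TAG` and `find_editable` are hypothetical. It assumes that attribute values are double-quoted, that no `>` appears inside the opening tag (so no embedded PHP there), and that an editable element never contains a nested element with the same tag name.

```python
import re

# Hypothetical pattern, not a general HTML parser.
# Each lookahead checks one attribute anywhere inside the opening tag,
# so the attributes may appear in any order alongside unrelated ones.
# re.DOTALL lets "." in the content group span newlines.
EDITABLE_TAG = re.compile(
    r'<(?P<tagname>\w+)\b'
    r'(?=[^>]*\sdata-edit="true")'
    r'(?=[^>]*\sdata-type="(?P<type>[^"]*)")'
    r'(?=[^>]*\sdata-name="(?P<name>[^"]*)")'
    r'[^>]*>'
    r'(?P<content>.*?)'
    r'</(?P=tagname)>',  # named backreference to the opening tag name
    re.DOTALL,
)


def find_editable(template):
    """Return one dict per element marked data-edit="true"."""
    return [
        {
            "tag": m["tagname"],
            "type": m["type"],
            "name": m["name"],
            "content": m["content"],
        }
        for m in EDITABLE_TAG.finditer(template)
    ]
```

Two details differ from the attempt above. The closing tag uses a named backreference, which in PCRE would be written `\k<tagname>`; `(?&tagname)` is a subroutine call that re-runs `\w+` and would therefore accept any closing tag name. And the newline problem goes away once the dot is allowed to match newlines, which in PCRE is the `s` modifier.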
Edit: sample code: https://regex101.com/r/nF6a96/2
<br>) (though DOMDocument can cope with that) or there's embedded PHP <?php ... ?>).