I need to make some modifications to PHP files (PHTML files to be exact, but they are still valid PHP files) from a Bash script. My original thought was to use sed or a similar utility with a regex, but after reading some of the replies here to other HTML parsing questions, it seems that there might be a better solution.
The problem I ran into with the regex approach is that it cannot tell whether the string I want to match, (src|href|action)=["']/, is inside <?php ?> tags or not. I need to know that so I can either perform string concatenation when the match is inside PHP tags, or add new PHP tags when it is not. For example:
(1) <img id="icon-loader-small" src="/css/images/loader-small.gif" style="vertical-align:middle; display:none;"/>
(2) <li><span class="name"><?php echo $this->loggedInAs()?></span> | <a href="/Login/logout">Logout</a></li>
(3) <?php echo ($watched_dir->getExistsFlag())?"":"<span class='ui-icon-alert'><img src='/css/images/warning-icon.png'></span>"?><span><?php echo $watched_dir->getDirectory();?></span></span><span class="ui-icon ui-icon-close"></span>
(EDIT: 4) <form method="post" action="/Preference/stream-setting" enctype="application/x-www-form-urlencoded" onsubmit="return confirm('<?php echo $this->confirm_pypo_restart_text ?>');">
In (1) there is a src="/css, and as it is not in PHP tags I want that to become src="<?php echo $baseUrl?>/css. In (2), there is a PHP tag but it is not around the href="/Login, so it also becomes href="<?php echo $baseUrl?>/Login.
Unfortunately, (3) has src='/css but inside the PHP tags (it is an echoed string). It is also quoted by " in the PHP code, so the modification needs to pick up on that too. The final result would look something like: src='".$baseUrl."/css.
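To pin down the behaviour I am after, here is a rough sketch in Python (chosen only for illustration; it is not one of my preferred tools). It naively splits the file on <?php ... ?> blocks and applies one rewrite outside them and another inside them. The function names are made up, and it rests on several assumptions: only full <?php tags are used (no short tags), no PHP string contains a literal ?>, and any attribute found inside PHP sits in a PHP string delimited by the opposite quote character. It is not a real parser, so I would still like to know what the proper tool is.

```python
import re

# The pattern from the question: src=, href= or action=, a quote, a leading slash.
ATTR = re.compile(r"""(src|href|action)=(["'])/""")

# Splitting on a capturing group keeps the PHP blocks in the result list,
# so HTML chunks land at even indexes and PHP chunks at odd indexes.
PHP_BLOCK = re.compile(r"(<\?php.*?\?>)", re.DOTALL)


def rewrite_html(chunk):
    """Outside PHP tags: open a new PHP tag that echoes $baseUrl."""
    return ATTR.sub(
        lambda m: f"{m.group(1)}={m.group(2)}<?php echo $baseUrl?>/", chunk
    )


def rewrite_php(chunk):
    """Inside PHP tags: break out of the PHP string and concatenate $baseUrl.

    ASSUMPTION: the enclosing PHP string uses the opposite quote character
    from the HTML attribute, as in example (3).
    """
    def repl(m):
        attr, quote = m.group(1), m.group(2)
        delim = '"' if quote == "'" else "'"
        return f"{attr}={quote}{delim}.$baseUrl.{delim}/"

    return ATTR.sub(repl, chunk)


def add_base_url(source):
    """Apply the matching rewrite to each HTML or PHP chunk and rejoin them."""
    parts = PHP_BLOCK.split(source)
    return "".join(
        rewrite_php(part) if index % 2 else rewrite_html(part)
        for index, part in enumerate(parts)
    )


if __name__ == "__main__":
    print(add_base_url('<a href="/Login/logout">Logout</a>'))
```

Because the split happens before any matching, the action attribute in (4) is treated as plain HTML even though a PHP tag appears later on the same line inside onsubmit.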
All the other modifications to my HTML and PHP files have been done using a regex (I know, I know...). If regexes could match everything except a certain multi-character pattern, along the lines of [^(<\?php)(\?>)]* (which does not work, since a character class only excludes single characters), then I would be flying through this part. Unfortunately it seems that this is Type 2 (context-free) grammar territory. So - what should I use?
Ideally it needs to be installed by default with the GNU suite, but other tools like PHP itself or other interpreters are fine too, just not preferred. Of course, if someone could structure a regex that would work on the above examples, then that would be excellent.
EDIT: (4) is the nasty match, where most regexes will fail.