
I need to modify PHP files (PHTML files, to be exact, but they are still valid PHP) from a Bash script. My original thought was to use sed or a similar utility with a regex, but from reading some of the replies here to other HTML parsing questions, it seems there might be a better solution.

The problem I faced with the regex was that it could not detect whether the string I wanted to match, (src|href|action)=["']/, was inside <?php ?> tags or not. I need to know this so that I can either perform string concatenation if the match is inside PHP tags, or add new PHP tags if it is not. For example:

(1) <img id="icon-loader-small" src="/css/images/loader-small.gif" style="vertical-align:middle; display:none;"/>
(2) <li><span class="name"><?php echo $this->loggedInAs()?></span> | <a href="/Login/logout">Logout</a></li>
(3) <?php echo ($watched_dir->getExistsFlag())?"":"<span class='ui-icon-alert'><img src='/css/images/warning-icon.png'></span>"?><span><?php echo $watched_dir->getDirectory();?></span></span><span class="ui-icon ui-icon-close"></span>
(EDIT: 4) <form method="post" action="/Preference/stream-setting" enctype="application/x-www-form-urlencoded" onsubmit="return confirm('<?php echo $this->confirm_pypo_restart_text ?>');">

In (1) there is a src="/css, and as it is not inside PHP tags I want it to become src="<?php echo $baseUrl?>/css. In (2) there is a PHP tag, but it does not enclose the href="/Login, so that also becomes href="<?php echo $baseUrl?>/Login. Unfortunately, (3) has src='/css inside the PHP tags (it is part of an echoed string). That string is delimited by " in the PHP code, so the modification needs to take the quote character into account too. The result would look something like src='".$baseUrl."/css.
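Case (3) hinges on knowing which quote character delimits the PHP string that contains the match. The following is a minimal Perl sketch of that idea, not a complete solution: enclosing_php_quote and rewrite_php_strings are hypothetical names, and the code assumes its input is the PHP code of one block with no comments, heredocs or nowdocs before the match. It scans forward to find the enclosing quote, then closes the string, concatenates $baseUrl and reopens the string with the same quote.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the quote character (" or ') of the PHP
# string literal that encloses offset $pos in $php, or undef when $pos
# is in bare code. Assumes no comments, heredocs or nowdocs before $pos.
sub enclosing_php_quote {
    my ($php, $pos) = @_;
    my $quote;
    for (my $i = 0; $i < $pos; $i++) {
        my $c = substr($php, $i, 1);
        if (defined $quote) {
            if ($c eq '\\') { $i++; next; }    # skip the escaped character
            undef $quote if $c eq $quote;      # closing quote
        } elsif ($c eq '"' || $c eq "'") {
            $quote = $c;                       # opening quote
        }
    }
    return $quote;
}

# Hypothetical helper: prefixes root-relative src/href/action values that
# sit inside PHP string literals with $baseUrl by closing the string,
# concatenating, and reopening it with the same quote character.
sub rewrite_php_strings {
    my ($php) = @_;
    my @hits;
    while ($php =~ /\b(src|href|action)=(["'])\//g) {
        push @hits, [ $-[0], $+[0] - $-[0], $1, $2 ];
    }
    # Edit from the last match backwards so earlier offsets stay valid.
    for my $hit (reverse @hits) {
        my ($start, $len, $attr, $attr_quote) = @$hit;
        my $q = enclosing_php_quote($php, $start);
        next unless defined $q;    # bare code: no safe generic rewrite
        substr($php, $start, $len) = "$attr=$attr_quote$q.\$baseUrl.$q/";
    }
    return $php;
}

print rewrite_php_strings(
    q{<?php echo ($watched_dir->getExistsFlag())?"":"<span class='ui-icon-alert'><img src='/css/images/warning-icon.png'></span>"?>}
), "\n";
```

For a single-quoted PHP string such as echo '<a href="/Login/logout">', the same function produces href="'.$baseUrl.'/Login/logout. Matches in bare PHP code, outside any string literal, are deliberately left alone.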

All the other modifications to my HTML and PHP files have been done using regexes (I know, I know...). If regexes supported matching everything except a certain pattern, like [^(<\?php)(\?>)]*, I would be flying through this part. Unfortunately, it seems that this is Type 2 (context-free) grammar territory. So what should I use? Ideally it would be a tool installed by default alongside the GNU utilities, but PHP itself or other interpreters are fine too, just not preferred. Of course, if someone could write a regex that works on the examples above, that would be excellent.

EDIT: (4) is the nasty match, where most regexes will fail.
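For what it's worth, Perl regexes can express "any run of characters that does not contain a pattern", which is what [^(<\?php)(\?>)]* was reaching for (a bracket expression only ever matches single characters). The usual idiom is a tempered token, (?:(?!<\?php).)*. Because PHP tags do not nest, the sketch below uses it to decide whether a match starts outside PHP tags, by checking that everything before the match splits into HTML runs and closed PHP blocks. outside_php is a hypothetical name, and the sketch assumes only long <?php tags are used and that ?> never appears inside a PHP string or comment, which is exactly the kind of fragility a real PHP parser avoids.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Tempered token: a run of characters that never contains "<?php".
# Each character is consumed only if "<?php" does not start there.
my $html_run = qr/(?:(?!<\?php).)*/s;

# One complete PHP block. Assumes "?>" never occurs inside a PHP
# string or comment, and that only the long "<?php" tag is used.
my $php_block = qr/<\?php.*?\?>/s;

# A position is outside PHP when everything before it splits cleanly
# into HTML runs and closed PHP blocks, ending on an HTML run.
sub outside_php {
    my ($text, $pos) = @_;
    my $prefix = substr($text, 0, $pos);
    return $prefix =~ /\A(?:$html_run$php_block)*$html_run\z/ ? 1 : 0;
}

# Examples (2), (3) and (4) from the question, (3) trimmed.
my @examples = (
    q{<li><span class="name"><?php echo $this->loggedInAs()?></span> | <a href="/Login/logout">Logout</a></li>},
    q{<?php echo ($watched_dir->getExistsFlag())?"":"<span class='ui-icon-alert'><img src='/css/images/warning-icon.png'></span>"?>},
    q{<form method="post" action="/Preference/stream-setting" onsubmit="return confirm('<?php echo $this->confirm_pypo_restart_text ?>');">},
);

for my $line (@examples) {
    while ($line =~ /\b(src|href|action)=["']\//g) {
        my ($attr, $start) = ($1, $-[0]);
        printf "%-6s %s PHP\n", $attr, outside_php($line, $start) ? 'outside' : 'inside';
    }
}
```

On these lines it reports the href and action as outside PHP and the src as inside, including case (4), where the PHP tag sits inside the onsubmit attribute after the match.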

  • This is one of the main reasons why the presentation layer should be mixed up with application logic. Commented Oct 29, 2012 at 9:47
  • I'm sure you meant: shouldn't. Commented Oct 29, 2012 at 13:29
  • It's hard to tell what you really want to do, but if it involves parsing arbitrary PHP to find what you want, regexps simply won't do it. PHP has a context-free grammar, and regexes can't parse those. You can always give up on reliable parsing and just live with a regex and the occasional breakage; that's a decision you have to make, but most of the time this path ends badly as it keeps coming back to bite you. If regexps aren't it, you need a real PHP parser. Commented Nov 2, 2012 at 21:47

1 Answer


The way I solved this problem was by separating my file into sections that were encapsulated by <?php ?> tags. The script kept track of the 'context' it was currently in: html by default, switching to php when it hit those tags. An operation (not necessarily a regex) is then performed on each section, and the result is appended to an output buffer. When the file has been completely processed, the output buffer is written back to the file.
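The sectioning step can be sketched in a few lines of Perl with split: wrapping the pattern in a capture group makes split return the PHP blocks as well as the text between them, so the pieces alternate between HTML and PHP. This is a simplified alternative to the script's regex loop that handles only the html and php contexts; split_contexts is a hypothetical name, and the sketch assumes ?> never appears inside a PHP string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Splits a template into [context, text] pairs. Because the pattern is
# wrapped in a capture group, split() also returns each matched PHP
# block, so even indices hold HTML and odd indices hold PHP.
# An unclosed block at the end runs to the end of the input.
sub split_contexts {
    my ($text) = @_;
    my @parts = split /(<\?php\s.*?(?:\?>|\z))/s, $text;
    my @segments;
    for my $i (0 .. $#parts) {
        next if $parts[$i] eq '';    # split leaves '' before a leading tag
        push @segments, [ $i % 2 ? 'php' : 'html', $parts[$i] ];
    }
    return @segments;
}

# A trimmed version of example (4) from the question.
my $form = q{<form action="/Preference/stream-setting" onsubmit="return confirm('<?php echo $this->confirm_pypo_restart_text ?>');">};
printf "%-4s | %s\n", @$_ for split_contexts($form);
```

On example (4) this yields an html section ending in confirm(', the PHP block itself, and a final html section with the remaining markup, so action="/ can be rewritten without touching the PHP code.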

I attempted to do this with sed, but I could not control where newlines would be printed. The context-based logic was also hardcoded, so adding a new context (ASP.NET support, for example) would have been tedious. My current solution is written in Perl and solves both problems, although I am having a bit of trouble getting my regex to actually do anything; this might just be me writing the regex incorrectly.

The script is as follows:

#!/usr/bin/perl

use strict;
use warnings;

#Prototypes
sub readFile(;\$);
sub writeFile($);

# Global state
my $file;
my $outputBuffer;
my $holdBuffer;
# Each 'operation' is a string of Perl code that is eval'd with the
# current section in $_; substitutions should use the s and g modifiers.
my %contexts = (
    html => {
        operation => ''
    },
    php => {
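        # "<?php" may be followed by any whitespace, including a newline;
        # "<?=" is PHP's short echo tag
        openTag => '<\?php\s|<\?=|<\?\s', closeTag => '\?>', operation => ''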
    },
    js => {
        openTag => '<script>', closeTag => '<\/script>', operation => ''
    }
);
my $currentContext = 'html';
my $combinedOpenTags;

#Initialisation
unshift(@ARGV, '-') unless @ARGV;
foreach my $key (keys %contexts) {
    if($contexts{$key}{openTag}) {
        if($combinedOpenTags) {
            $combinedOpenTags .= "|".$contexts{$key}{openTag};
        } else {
            $combinedOpenTags = $contexts{$key}{openTag};
        }
    }
}

#Main loop
while(readFile($holdBuffer)) {
    $outputBuffer = '';
    # Test length, not truth: a leftover section of just "0" is false in Perl
    while(length $holdBuffer) {
        $currentContext = "html";
        foreach my $key (keys %contexts) {
            if(!$contexts{$key}{openTag}) {
                next;
            }
            if($holdBuffer =~ /\A($contexts{$key}{openTag})/) {
                $currentContext = $key;
                last;
            }
        }
        if($currentContext eq "html") {
            $holdBuffer =~ s/\A(.*?)($combinedOpenTags|\z)/$2/s;
            $_ = $1;
        } else {
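            # Group the alternation so an unclosed block (common at the end of
            # PHP files) is consumed to the end of the buffer; ungrouped, the
            # match fails, $1 keeps its old value and the loop never ends.
            $holdBuffer =~ s/\A(.*?(?:$contexts{$currentContext}{closeTag}|\z))//s;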
            $_ = $1;
        }
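        eval($contexts{$currentContext}{operation});
        # eval swallows errors silently, so report a broken operation string
        die("$0: operation for '$currentContext' failed: $@") if $@;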
        $outputBuffer .= $_;
    }
    writeFile($outputBuffer);
}

# readFile: take the next file name from @ARGV and slurp it into the given scalar (or $_); returns 0 when no files remain
sub readFile(;\$) {
    my $argref = @_ ? shift() : \$_;
    return 0 unless @ARGV;
    $file = shift(@ARGV);
    open(WORKFILE, "<$file") || die("$0: can't open $file for reading ($!)\n");
    local $/;
    $$argref = <WORKFILE>;
    close(WORKFILE);
    return 1;
}

# writeFile: write $_[0] back to the file last read (STDOUT when that was '-')
sub writeFile($) {
    open(WORKFILE, ">$file") || die("$0: can't open $file for writing ($!)\n");
    print WORKFILE $_[0];
    close(WORKFILE);
}

I hope that this can be used and modified by others to suit their needs.
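As an example of filling in the empty operation strings, the two rewrites from the question could look like the following. This is a sketch under the script's own conventions (each operation is Perl source run through eval with the section in $_); the %operation values and apply_operation are hypothetical, and the php operation only covers a single-quoted attribute inside a double-quoted PHP string, as in example (3). Checking $@ after eval matters here: a syntax error in an operation string otherwise fails silently, which looks exactly like a regex that does nothing.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical operation strings in the shape %contexts expects.
# Inside single quotes, \' yields a literal quote and \$ is kept as-is,
# so the eval'd substitution writes a literal $baseUrl for PHP to expand.
my %operation = (
    # Outside PHP: open a new PHP tag that echoes $baseUrl.
    html => 's{\b(src|href|action)=(["\'])/}{$1=$2<?php echo \$baseUrl?>/}g',
    # Inside PHP: assumes a single-quoted attribute inside a
    # double-quoted PHP string, as in example (3).
    php  => 's{\b(src|href|action)=\'/}{$1=\'".\$baseUrl."/}g',
);

# Mirrors the script's main loop: put the section in $_, eval the
# operation, and surface errors instead of letting eval swallow them.
sub apply_operation {
    my ($context, $section) = @_;
    local $_ = $section;
    eval $operation{$context};
    die "operation for '$context' failed: $@" if $@;
    return $_;
}

print apply_operation('html', q{<a href="/Login/logout">Logout</a>}), "\n";
print apply_operation('php', q{<?php echo "<img src='/css/images/warning-icon.png'>" ?>}), "\n";
```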
