
Is it possible to exclude <pre> tags from this CodeIgniter compression hook? I don't understand regular expressions well enough to not break my page. I have tried, but it always jacks up the output.

EDIT: This CodeIgniter compression hook strips all unnecessary whitespace and formatting from the code in order to compress the output. That includes the contents of <pre> tags, which rely on that spacing and formatting to display the code right.

I'm trying to show code examples in a compressed output page.

<?php  if ( ! defined('BASEPATH')) exit('No direct script access allowed');

function compress()
{
    $CI =& get_instance();
    $buffer = $CI->output->get_output();

     $search = array(
        '/\n/',
        '/\>[^\S ]+/s',
        '/[^\S ]+\</s',
        '/(\s)+/s'
      );

     $replace = array(
        ' ',
        '>',
        '<',
        '\\1'
      );

    $buffer = preg_replace($search, $replace, $buffer);

    $CI->output->set_output($buffer);
    $CI->output->_display();
}

?>

1 Answer

Let's start by looking at the code you're using now.

 $search = array(
    '/\n/',
    '/\>[^\S ]+/s',
    '/[^\S ]+\</s',
    '/(\s)+/s'
  );

 $replace = array(
    ' ',
    '>',
    '<',
    '\\1'
  );

The intention appears to be to convert all whitespace characters to simple spaces, and to compress every run of multiple spaces down to one. Except it's possible for carriage returns, tabs, formfeeds and other whitespace characters to slip through, thanks to the \\1 in the fourth replacement string: the group in (\s)+ captures only the last whitespace character of each run, and that character, whatever it is, gets written back. I don't think that's what the author intended.
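To see the slip-through concretely, here is a small sketch that runs the original arrays over a made-up string (the input and variable names are mine, purely for illustration):

```php
<?php
// The original search/replace arrays, unchanged.
$search  = array('/\n/', '/\>[^\S ]+/s', '/[^\S ]+\</s', '/(\s)+/s');
$replace = array(' ', '>', '<', '\\1');

// Hypothetical input: a run of two tabs, then a space followed by a CR.
$input  = "a\t\tb \rc";
$output = preg_replace($search, $replace, $input);

// Each run is collapsed to its LAST character, so a tab and a CR survive.
echo json_encode($output), "\n";   // "a\tb\rc"
```

Neither run ends in a plain space, so neither run becomes one.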

If that code was working for you (aside from matching inside <pre> elements), this would probably work just as well, if not better:

$search = '/(?>[^\S ]\s*|\s{2,})/';

$replace = ' ';
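As a quick check of what that single pattern does, here is a sketch on a made-up input (the string and variable names are only for illustration):

```php
<?php
$search  = '/(?>[^\S ]\s*|\s{2,})/';
$replace = ' ';

// Hypothetical input: two tabs, a space plus CR, and a double space.
$input  = "a\t\tb \rc  d";
$output = preg_replace($search, $replace, $input);

// Every run that contains a non-space whitespace character, or that is
// two or more characters long, becomes exactly one space.
echo json_encode($output), "\n";   // "a b c d"
```

A lone space never matches either alternative, so text that is already normalized is left untouched.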

And now we can add a lookahead to prevent it from matching inside <pre> elements:

$search = 
  '#(?>[^\S ]\s*|\s{2,})(?=(?:(?:[^<]++|<(?!/?pre\b))*+)(?:<pre>|\z))#';
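Here is a rough demonstration of the lookahead at work, using a made-up HTML fragment (the sample markup and variable names are assumptions for illustration only). Note that it only recognizes a bare <pre> opening tag, with no attributes:

```php
<?php
$search  = '#(?>[^\S ]\s*|\s{2,})(?=(?:(?:[^<]++|<(?!/?pre\b))*+)(?:<pre>|\z))#';
$replace = ' ';

// Hypothetical page fragment with whitespace both outside and inside <pre>.
$html = "<p>a\n\nb</p>\n<pre>x\n  y</pre>\n<p>c</p>";
$out  = preg_replace($search, $replace, $html);

// Whitespace outside <pre> collapses; the newline and indent inside it stay.
echo json_encode($out), "\n";
```

The lookahead succeeds only if the next pre-related tag ahead is an opening <pre> (or there is none before the end of the input), which is how it tells "outside" from "inside".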

But really, this is not the right tool for the job you're doing. I mean, look at that monster! You'll never be able to maintain it, and complicated as it is, it's still nowhere near as robust as it should be.

I was going to urge you to drop this approach and use a dedicated HTML minifier instead, but that one seems to have its own problems with <pre> elements. If that problem has been fixed, or if there's another minifier out there that would meet your needs, you should definitely go that route.


EDIT: In response to a comment, here's a version that excludes <textarea> as well as <pre> elements:

$search = 
  '#(?ix)
    (?>[^\S ]\s*|\s{2,})
    (?=
      (?:(?:[^<]++|<(?!/?(?:textarea|pre)\b))*+)
      (?:<(?>textarea|pre)\b|\z)
    )
    #';
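One way this might be wired into the hook is to wrap the pattern in a small helper. This is only a sketch: compress_html() is a name I made up, and the fallback to the original buffer is my own precaution, since preg_replace() returns null when PCRE hits an error such as a backtracking limit.

```php
<?php
// Hypothetical helper, not part of CodeIgniter.
function compress_html($html)
{
    $search =
      '#(?ix)
        (?>[^\S ]\s*|\s{2,})
        (?=
          (?:(?:[^<]++|<(?!/?(?:textarea|pre)\b))*+)
          (?:<(?>textarea|pre)\b|\z)
        )
        #';

    $result = preg_replace($search, ' ', $html);

    // On a PCRE error, serve the uncompressed page rather than nothing.
    return $result === null ? $html : $result;
}

// Made-up sample with whitespace inside <textarea> and an uppercase <PRE>.
$sample = "<div>\n\t<textarea>\n  keep\n</textarea>\n\n<PRE class=\"c\">a\n\tb</PRE>\n</div>";
echo json_encode(compress_html($sample)), "\n";
```

Inside compress(), the call would then presumably replace the existing preg_replace() line, along the lines of $buffer = compress_html($buffer);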

4 Comments

that is a monster, but it works for my small application, thanks a ton you regex ninja.
@Alan, i tried to include <textarea> along with <pre> but couldn't quite figure out where to add textarea tag in this regx
@Alan, aah so tags need to be added along with OR operator, thanx a lot, worked like magic :)
@Alan a person name ridgerunner did extensive analysis of your regx and suggested some improvements, you might be interested in. stackoverflow.com/questions/5312349/…
