Intro:

I'm fairly new to RegEx, so bear with me. We have a client with an extremely large CSS file, verging on 27k lines in total: roughly 20k lines are pure CSS and the remainder is written in SCSS. I am attempting to cut this down and, despite using more than the allotted hours on it, I found the task extremely interesting, so I wrote a little PHP script to do it for me! Unfortunately it's not quite there, because the RegEx is a little troublesome.

Context

  • remove.txt - Text file containing, line by line, the selectors that are redundant on our site and can be removed.
  • main.scss - The big SASS file.
  • PHP script - Reads the remove.txt file line by line, finds each selector in the main.scss file and adds an "UNUSED" string before it, so I can go through line by line and remove the rule.

Issue

The main reason this is troublesome is that a selector can occur at the start of a CSS rule's selector list as well as towards the end, and both have to be accounted for. For example -

Example scenarios of .foo-bar (bold indicates what should match) -

.foo-bar {}

.foo-bar, .bar-foo {}

.foo-bar .bar-foo {}

.boo-far, .foo-bar {}

.foo-bar,.bar-foo {}

.bar-foo.foo-bar {}

PHP Script

<?php 

$unused = 'main.scss';
if ($file = fopen("remove.txt", "r")) {

    // Stop an endless loop if file doesn't exist
    if (!$file) {
        die('plz no loops');
    }

    // Begin looping through redundant selectors line by line 
    while(!feof($file)) {

        $line = trim(fgets($file));

        // Apply the regex to the selector
        $line = $line.'([\s\S][^}]*})';

        // Apply the global operators
        $line = '/^'.$line.'/m';

        // Echo the output for reference and debugging
        echo ('<p>'.$line.'</p>');

        // Find the rule, append it with UNUSED at the start
        $dothings = preg_replace($line,'UNUSED $0',file_get_contents($unused), 1);

    }
    fclose($file);
} else {
    echo ('<p>failed</p>');
}
?>

RegEx

From the above you can gather my RegEx will be -

/^REDUNDANTRULE([\s\S][^}]*})/m

It's currently having a hard time dealing with the indentation that typically occurs within media queries, and also with cases where other selectors are applied to the same rule.

From this I tried adding to the start (To accommodate for whitespace and when the selector is used in a longer version of the selector) -

^[0a-zA-Z\s]

And also adding this to the end (to accommodate for commas separating selectors)

\,

Could any RegEx/PHP wizards point me in the right direction? Thank you for reading regardless!

Thanks @ctwheels for the fantastically explained answer. I encountered a couple of other issues, one being that the full stops in the redundant selectors read from remove.txt were not escaped. I've now updated my script to escape them before doing the find and replace, as seen below. This is now my most up-to-date, working script -

<?php 

$unused = 'main.scss';
if ($file = fopen("remove.txt", "r")) {
    if (!$file) {
        die('plz no loops');
    }
    while(!feof($file)) {

        $line = trim(fgets($file));
        if( strpos( $line, '.' ) !== false ) {
            echo ". found in $line, escaping characters";
            $line = str_replace('.', '\.', $line);
        }
        $line = '/(?:^|,\s*)\K('.$line.')(?=\s*(?:,|{))/m';
        echo ('<p>'.$line.'</p>');
        var_dump(preg_match_all($line, file_get_contents($unused)));
        $dothings = preg_replace($line,'UNUSED $0',file_get_contents($unused), 1);
        var_dump(
            file_put_contents($unused,
                $dothings
            )
        );
    }
    fclose($file);
} else {
    echo ('<p>failed</p>');
}
?>
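
A possible hardening of the script above, as a minimal sketch rather than a definitive version: it swaps the manual str_replace on dots for preg_quote() (which escapes every regex metacharacter), works on a string held in memory instead of re-reading main.scss on each pass, and uses the pattern from the answer's edit. The function name markUnused is hypothetical, and unlike the script above it marks every occurrence rather than only the first.

```php
<?php
// Hypothetical helper: prefix each listed selector with "UNUSED " wherever it
// appears at the start of a line or after a comma, and is followed by a comma
// or an opening brace.
function markUnused(string $scss, array $selectors): string
{
    foreach ($selectors as $selector) {
        $selector = trim($selector);
        if ($selector === '') {
            continue; // skip blank lines coming from remove.txt
        }
        // preg_quote() escapes all regex metacharacters, not only dots.
        $quoted  = preg_quote($selector, '/');
        $pattern = '/(?:^|,)\s*\K(' . $quoted . ')(?=\s*(?:,|{))/m';
        $result  = preg_replace($pattern, 'UNUSED $1', $scss);
        if ($result !== null) { // null signals a regex error
            $scss = $result;
        }
    }
    return $scss;
}

$scss = ".foo-bar {}\n.boo-far, .foo-bar {}\n.bar-foo.foo-bar {}\n";
echo markUnused($scss, ['.foo-bar']);
```

In a real run the selectors could come from file('remove.txt', FILE_IGNORE_NEW_LINES) and the result be written back once with file_put_contents, assuming the same file names as above.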

1 Answer

Answer

Brief

Based on the examples you provided, the following regex will work; however, it will not work for all CSS rules. If you add more cases, I can update the regex to accommodate those other situations.


Code

See regex in use here

Regex

(?:^|,\s*)\K(\.foo-bar)(?=\s*(?:,|{))

Replacement

UNUSED $1

Note: The multiline m flag is used.


Usage

The following script was generated by regex101 (by clicking on the code generator in regex101):

$re = '/(?:^|,\s*)\K(\.foo-bar)(?=\s*(?:,|{))/m';
$str = '.foo-bar {}

.foo-bar, .bar-foo {}

.foo-bar .bar-foo {}

.boo-far, .foo-bar {}

.foo-bar,.bar-foo {}

.bar-foo.foo-bar {}';
$subst = 'UNUSED $1';

$result = preg_replace($re, $subst, $str);

echo "The result of the substitution is ".$result;

Results

Input

.foo-bar {}

.foo-bar, .bar-foo {}

.foo-bar .bar-foo {}

.boo-far, .foo-bar {}

.foo-bar,.bar-foo {}

.bar-foo.foo-bar {}

Output

UNUSED .foo-bar {}

UNUSED .foo-bar, .bar-foo {}

.foo-bar .bar-foo {}

.boo-far, UNUSED .foo-bar {}

UNUSED .foo-bar,.bar-foo {}

.bar-foo.foo-bar {}

Explanation

  • (?:^|,\s*) Match either of the following
    • ^ Assert position at the start of the line
    • ,\s* Comma character , literally, followed by any number of whitespace characters
  • \K Resets starting point of the reported match (any previously consumed characters are no longer included in the final match)
  • (\.foo-bar) Capture into group 1: The dot character . literally, followed by foo-bar literally
  • (?=\s*(?:,|{)) Positive lookahead ensuring what follows matches the following
    • \s* Any whitespace character any number of times
    • (?:,|{) Match either of the following
      • , Comma character , literally
      • { Left curly bracket { literally
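
To illustrate the \K step from the breakdown above, here is a small sketch comparing the same pattern with and without it. The variable names are only for illustration, and the replacement uses $0 (the whole match) to make the difference visible.

```php
<?php
$line = '.boo-far, .foo-bar {}';

$withoutK = '/(?:^|,\s*)(\.foo-bar)(?=\s*(?:,|{))/m';
$withK    = '/(?:^|,\s*)\K(\.foo-bar)(?=\s*(?:,|{))/m';

// Without \K, the whole match ($0) still includes the leading ", ",
// so the prefix is inserted before the comma.
$a = preg_replace($withoutK, 'UNUSED $0', $line);

// With \K, the reported match starts at the selector itself,
// so the prefix lands directly in front of it.
$b = preg_replace($withK, 'UNUSED $0', $line);

echo $a, "\n"; // .boo-farUNUSED , .foo-bar {}
echo $b, "\n"; // .boo-far, UNUSED .foo-bar {}
```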


Edit

The following regex is an update from the previous one and moves \s* outside the first group to match the possibility of whitespace after the caret ^ as well.

(?:^|,)\s*\K(\.foo-bar)(?=\s*(?:,|{))
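
As a rough check of what the change does, the sketch below runs both versions against an indented rule. The media query is a made-up example, not taken from the original SCSS file.

```php
<?php
// Hypothetical SCSS fragment: the selector is indented inside a media query.
$scss = "@media (min-width: 768px) {\n    .foo-bar {\n        color: red;\n    }\n}\n";

$original = '/(?:^|,\s*)\K(\.foo-bar)(?=\s*(?:,|{))/m';
$updated  = '/(?:^|,)\s*\K(\.foo-bar)(?=\s*(?:,|{))/m';

// The original pattern requires the selector right at the start of the line.
$countOriginal = preg_match_all($original, $scss);

// The updated pattern allows whitespace between the line start and the selector.
$countUpdated = preg_match_all($updated, $scss);

// Because \K comes after \s*, the indentation is kept in front of "UNUSED".
$marked = preg_replace($updated, 'UNUSED $1', $scss);

echo $countOriginal, "\n"; // 0
echo $countUpdated, "\n";  // 1
echo $marked;
```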

3 Comments

This is a fantastic answer, thank you very much for taking the time to do this. I have encountered a situation where this doesn't work, however. I've updated regex101 here to demonstrate - regex101.com/r/gw46rn/2. The first example string should also be caught, but the indentation is throwing it off (within the SCSS file it's in a media query, hence the indentation).
@JoeCorby You're very welcome, I've added a change to the above answer (see edit). Updated the regex101 link: regex101.com/r/gw46rn/3
Thank you again! I made one final amend to check a redundant rule where there are multiple applicable selectors on one line for white-space between a comma and the next selector and everything is now working perfectly!
