0

I want to remove the comments from scripts like this one:

var stName = "MyName"; //I WANT THIS COMMENT TO BE REMOVED
var stLink = "http://domain.com/mydomain";
var stCountry = "United State of America";

What is the best way to accomplish this using PHP?

  • It depends. But if you are using Laravel, for instance, you can configure a filter to perform these actions (and also to minify the JS, uglify it, etc.). Commented May 1, 2015 at 12:08
  • Not sure why you'd want to do this instead of using something like uglifycss (npmjs.com/package/uglifycss). Commented May 1, 2015 at 12:18
  • The best way is subjective. PHP, in my opinion, is not the best way to remove the comment from those three specific lines, just use a text editor. I know, you're going to say that these just represent your code and you want to solve this the world-over but that's not what you asked. However, to remove comments from every JavaScript block on the planet, use someone else's framework that does minification. Commented May 1, 2015 at 13:43
  • Updated the script to be more descriptive, updated the question comments. Commented May 4, 2015 at 7:07
  • How about a regex like preg_replace( '/\/\/[^\n]?/', '\n', $phpcode ) or preg_replace( '/\/\*(.*)\*\//', '', $phpcode )? Notice: I never tried this and this is just an idea how it could be done. Commented May 4, 2015 at 7:24

2 Answers

2

The best way is to use an actual parser, or at least to write a lexer yourself.
The problem with regex is that it gets enormously complex once you take into account everything you have to.
For example, Cagatay Ulubay's suggested regexes /\/\/[^\n]?/ and /\/\*(.*)\*\// will match comments, but they will also match a lot more, like

var a = '/* the contents of this string will be matched */';
var b = '// and here you will even get a syntax error, because the entire rest of the line is removed';
var c = 'and actually, the regex that matches multiline comments is greedy, removing everything between the first "/*" on a line and the last one of these on it: */';
/*
   this comment, however, will not be matched.
*/
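To make the first two cases concrete, here is a minimal sketch that runs comparable patterns over two one-line inputs. It rests on an assumption: the single-line pattern uses * instead of ? so that it consumes the rest of the line, which is presumably what was intended. The variable names are made up for the illustration.

```php
<?php
// Patterns in the spirit of the suggestion above. Assumption: "*" instead of
// "?" in the single-line pattern, so that it eats the whole rest of the line.
$singleLine = '/\/\/[^\n]*/';
$multiLine  = '/\/\*(.*)\*\//';

// Neither input contains a comment, so both should come back unchanged.
$a = "var a = '/* not a comment */';";
$b = 'var stLink = "http://domain.com/mydomain";';

$resultA = preg_replace($multiLine, '', $a);
$resultB = preg_replace($singleLine, '', $b);

echo $resultA, "\n"; // var a = '';
echo $resultB, "\n"; // var stLink = "http:
```

Note that the second input is a line from the question itself: any URL inside a string is enough to trip the single-line pattern.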

While it is rather unlikely that strings contain such sequences, the problem is very real with inline regexes:

var regex = /^something.*/; // You see the fake "*/" here?

The current scope matters a lot, and you can't possibly know the current scope unless you parse the script from the beginning, character by character.
So you essentially need to build a lexer.
You need to split the code into three different sections:

  • Normal code, which you need to output again, and where the start of a comment could be just one character away.
  • Comments, which you discard.
  • Literals, which you also need to output, but where a comment cannot start.

Now the only literals I can think of are strings (single- and double-quoted), inline regexes and template strings (backticks), but there may be others.
And of course you also have to take escape sequences inside those literals into account, because you might encounter an inline regex like

/^file:\/\/\/*.+/

in which a lexer that looks at only one character at a time would see only the regex /^file:\/ and incorrectly parse the following /*.+ as the start of a multiline comment.
Therefore, upon encountering the second /, you have to look back and check whether the last character you passed was a \ (and make sure that \ was not itself escaped). The same goes for all kinds of quotes in strings.
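As a rough illustration of the three sections described above, here is a minimal lexer sketch in PHP. It is not a complete JavaScript tokenizer: the function name stripJsComments is made up for this example, the check that decides whether a / starts a regex or is a division is only a heuristic (it fails after keywords such as return), and ${...} expressions inside template strings are not analysed. Instead of looking back for a \, it skips each backslash together with the character it escapes, which also covers an escaped backslash.

```php
<?php
// Sketch only: strips // and /* */ comments from JavaScript source while
// copying string, template and (heuristically detected) regex literals as-is.
function stripJsComments(string $src): string
{
    $out = '';
    $len = strlen($src);
    $i = 0;
    $last = ''; // last non-whitespace character copied to the output

    while ($i < $len) {
        $c = $src[$i];
        $next = $i + 1 < $len ? $src[$i + 1] : '';

        if ($c === '/' && $next === '/') {
            // Line comment: skip up to, but not including, the newline.
            while ($i < $len && $src[$i] !== "\n") {
                $i++;
            }
            continue;
        }
        if ($c === '/' && $next === '*') {
            // Block comment: skip past the closing "*/" (or to end of input).
            $end = strpos($src, '*/', $i + 2);
            $i = $end === false ? $len : $end + 2;
            continue;
        }

        $isQuote = $c === '"' || $c === "'" || $c === '`';
        // Assumption: "/" starts a regex only at the start of input or after
        // one of these characters; anywhere else it is treated as division.
        $isRegex = $c === '/'
            && ($last === '' || strpos('(,=:[!&|?{};', $last) !== false);

        if ($isQuote || $isRegex) {
            // Literal: copy everything through the closing delimiter.
            $start = $i;
            $i++;
            $inClass = false; // inside [...] a "/" does not end a regex
            while ($i < $len) {
                $d = $src[$i];
                if ($d === '\\') {
                    $i += 2; // skip the backslash and the escaped character
                    continue;
                }
                if ($isRegex) {
                    if ($d === '[') {
                        $inClass = true;
                    } elseif ($d === ']') {
                        $inClass = false;
                    } elseif ($d === "\n") {
                        break; // unterminated regex: stop at the line end
                    }
                }
                $i++;
                if ($d === $c && !$inClass) {
                    break; // reached the unescaped closing delimiter
                }
            }
            $i = min($i, $len);
            $out .= substr($src, $start, $i - $start);
            $last = $c; // a literal ends an operand, so a following "/" divides
            continue;
        }

        // Normal code: copy it and remember the last significant character.
        $out .= $c;
        if (trim($c) !== '') {
            $last = $c;
        }
        $i++;
    }

    return $out;
}

echo stripJsComments('var stLink = "http://domain.com/mydomain"; // Comment'), "\n";
echo stripJsComments('var regex = /^file:\/\/\/*.+/; // comment'), "\n";
```

Both calls keep the literal intact and drop only the trailing comment; the whitespace in front of the removed comment is left in place. For anything beyond simple scripts, a real parser or an existing minifier is still the safer choice.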


0

I would go with preg_replace(). Assuming all comments are single-line comments (// Comment here), you can start with this:

$JsCode = 'var stName = "MyName isn\'t \"Foobar\""; //I WANT THIS COMMENT TO BE REMOVED
var stLink = "http://domain.com/mydomain"; // Comment
var stLink2 = \'http://domain.com/mydomain\'; // This comment goes as well
var stCountry = "United State of America"; // Comment here';

$RegEx = '/(["\']((?>[^"\']+)|(?R))*?(?<!\\\\)["\'])(.*?)\/\/.*$/m';
echo preg_replace($RegEx, '$1$3', $JsCode);

Output:

var stName = "MyName isn't \"Foobar\""; 
var stLink = "http://domain.com/mydomain"; 
var stLink2 = 'http://domain.com/mydomain'; 
var stCountry = "United State of America"; 

This solution is far from perfect and might have issues with strings containing "//" in them.
