7

I am having difficulty trying to get this regex to work. All I am trying to do is remove block comments. This is what I have so far but I can't get rid of the final */.

$string = 'this is a test /*asdfa  */ ok then';

$pattern = '/\/\*([^\*\/]*)/i';

$replacement = '';

echo preg_replace($pattern, $replacement, $string);

//this is a test */ ok then

Any help will be appreciated.

23
  • 2
    PHP isn't a regular language, so it's impossible to parse it or to remove all valid block comments with a regex. Commented Nov 17, 2010 at 17:12
  • 2
    @Paul: You cannot parse the PHP with regex, but you can lexically analyze it just fine. One doesn't need a full blown parser to get rid of comments (indeed, usually the comments are thrown out in lexical analysis, not parsing) Commented Nov 17, 2010 at 17:15
  • 1
    @Billy No you can't. en.wikipedia.org/wiki/Chomsky_hierarchy Commented Nov 17, 2010 at 17:16
  • 1
    I'm the first to discourage regexes for tasks they can't handle. But C-style comments can be recognized (and e.g. stripped) by regular expressions, because they cannot nest (this is the same in PHP and quite a few other languages). /* a /* b */ echo 'see?'; */ will output "see?" (or rather, the parser rejects it because of the final */, which still proves the point). The SO syntax highlighter gets this right, by the way. Commented Nov 17, 2010 at 17:19
  • 2
    See this example " this is a comment /* or is it? */". Should the comment inside the string be removed? Want to make it more complicated? Bring in some heredoc. Commented Nov 17, 2010 at 17:21

7 Answers

6

Use a delimiter other than / -- having to escape every slash in the pattern makes it confusing.

How about '#/\*.+?\*/#s';
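For illustration, here is a minimal sketch of that pattern applied to the string from the question. The function name stripBlockComments is made up for this example and is not part of the answer.

```php
<?php
// Hypothetical helper name, used only for this sketch.
function stripBlockComments(string $code): string
{
    // "#" is the delimiter, so the slashes need no escaping.
    // ".+?" is lazy and stops at the nearest "*/"; the s modifier lets "." match newlines.
    return preg_replace('#/\*.+?\*/#s', '', $code);
}

$string = 'this is a test /*asdfa  */ ok then';
echo stripBlockComments($string), PHP_EOL;
// this is a test  ok then

$multiline = "a /*** first\n line */ b /* second */ c";
echo stripBlockComments($multiline), PHP_EOL;
// a  b  c
```

One caveat: because .+? needs at least one character, an empty comment such as /**/ is not matched on its own and the match runs on to the next */. Using .*? instead avoids that. Like any plain regex, this also strips comment-like text inside string literals.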


4 Comments

Should there be an m modifier in there though? (disclaimer I'm a little rusty on multiline regexes here.)
You actually need s (DOT_ALL), not m.
@Billy - thank you. This is one I can understand, and it actually works for cases when you want to get rid of /*** a */ - some of the other answers didn't work for this case. In addition, I think it's much simpler to use this than the tokenizer.
@Abs: 1. Thank you, and 2. To be fair, the tokenizer is going to be more accurate. (But then, you already knew that ;) )
6

token_get_all and build it back without T_COMMENTs. I don't think anything more should be said.
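A rough sketch of what that could look like, assuming the input is PHP source that starts with an opening <?php tag (token_get_all treats anything outside PHP tags as inline HTML). The function name stripCommentsWithTokenizer is invented for this example. T_COMMENT also covers // and # comments, so the sketch checks for a leading /* to drop only block comments, which is what the question asks for.

```php
<?php
// Hypothetical helper: rebuild the source from its tokens, skipping block comments.
function stripCommentsWithTokenizer(string $source): string
{
    $output = '';
    foreach (token_get_all($source) as $token) {
        if (!is_array($token)) {
            // Single-character tokens such as ";" or "=" come back as plain strings.
            $output .= $token;
            continue;
        }
        // Array tokens are [id, text, line number].
        [$id, $text] = $token;
        $isComment = ($id === T_COMMENT || $id === T_DOC_COMMENT);
        if ($isComment && strncmp($text, '/*', 2) === 0) {
            continue; // drop /* ... */ and /** ... */ only
        }
        $output .= $text;
    }
    return $output;
}

$source = '<?php $s = "not /* a comment */"; /* real comment */ echo $s;';
echo stripCommentsWithTokenizer($source), PHP_EOL;
// <?php $s = "not /* a comment */";  echo $s;
```

The string literal survives untouched because the tokenizer reports it as a single string token, which is the edge case the regex answers cannot handle.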

8 Comments

Everybody loves tokenizers. +1 also for the second sentence.
Overkill solution is overkill. Regular expressions are sufficient and more appropriate for this use case. (not that the tokenizer wouldn't use them internally as well)
@mario Do you know how the tokenizer works? RegEx is not magic... so don't assume you can use them anywhere.
@Alin Purcaru: I made no assertions to anywhere, but specifically said "this use case". And yes, I do know a few things about them, and I've also written a few tokenizers.
@mario Using some regular expressions under certain conditions to find the tokens is different from using a regular expression.
6

Try this as your pattern:

/\*([^*]|[\r\n]|(\*+([^*/]|[\r\n])))*\*+/
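The pattern above is shown without delimiters. As a sketch, this is how it might be used with preg_replace, assuming ~ as the delimiter (any character that does not appear in the pattern would do). The variable names are only for illustration.

```php
<?php
// Same pattern as above, wrapped in "~" delimiters so the slashes need no escaping.
$pattern = '~/\*([^*]|[\r\n]|(\*+([^*/]|[\r\n])))*\*+/~';

$input = "x /*** a */ y /* b\n c */ z /**/ end";
$result = preg_replace($pattern, '', $input);
echo $result, PHP_EOL;
// x  y  z  end
```

No s modifier is needed here because the pattern never uses "."; the negated class [^*] already matches newlines, which also makes the [\r\n] alternatives redundant.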

6 Comments

I don't understand... what's \r\n doing in there?
It removes new lines. May not be needed for the example he gave, but that's what I use to remove multi-line comments.
@wajiw: Why not just make it match any character? I don't see how newlines are special.
I've run into the problem where .* doesn't match new lines
Yeah that's a smarter way of doing it :-) Thanks
1

I'm using this (note that you only need the first line for /*...*/ comments):

  #-- extract /* ... */ comment block
  #  or lines of #... #... and //... //...
  if (preg_match("_^\s*/\*+(.+?)\*+/_s", $src, $uu)
  or (preg_match("_^\s*((^\s*(#+|//+)\s*.+?$\n)+)_ms", $src, $uu))) {
     $src = $uu[1];
  }
  // Public Domain, not CC

Works quite well. But like all regex solutions, it would fail on the $PHP = "st/*rings" edge case.

2 Comments

Do not say "like all regex solutions", because you are wrong. Say only that those presented here are too primitive for the feat.
@tchrist: My bad. In trying to righten the regex slander, I've overgeneralized it myself.
0

Running preg_replace twice with pattern /\*|\*/ should work.

1 Comment

That only gets rid of the comment delimiters, not the comment text itself.
0

To just fix your main pattern: you're not matching the final "*/" because it is missing from your pattern.

Following your own pattern, try this little modification:

'/\/\*([^\*\/]*)\*\//i'

I also suggest using a different delimiter to make the pattern more readable.

#/\*([^\*/]*)\*/#i
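A quick sketch of the # delimited version against the string from the question (variable names are only for illustration). Note that the character class [^\*/] excludes every * and / inside the comment body, so a comment that contains either character is left alone:

```php
<?php
$pattern = '#/\*([^\*/]*)\*/#i';

$string = 'this is a test /*asdfa  */ ok then';
$fixed = preg_replace($pattern, '', $string);
echo $fixed, PHP_EOL;
// this is a test  ok then

// A slash inside the comment body prevents any match, so nothing is removed.
$withSlash = '/* a/b */ x';
$unchanged = preg_replace($pattern, '', $withSlash);
echo $unchanged, PHP_EOL;
// /* a/b */ x
```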


0

Maybe:

$pattern = '/\/\*([.]*)\*\//i';

Please don't down-rate as this is a quick guess trying to help... :)

