I found this regex for matching comments on w3.org's CSS grammar page.

\/\*[^*]*\*+([^/*][^*]*\*+)*\/

It is quite long and a bit difficult to understand. I would have just used

\/\*.*\*\/

to find comments, but when I tested it in RegexPal it matched only single-line comments, not multi-line ones, whereas the original regex matches both.

I don't understand what the

+([^/*][^*]*\*+)*

part inside the original regex does. Can anyone explain this to me?

  • Regular Expression Analyzer Commented Feb 17, 2012 at 14:08
  • @kev That link will be very useful for me in future.. Thanks a lot. :) Commented Feb 20, 2012 at 9:49
  • @Vigneshwaran Since answering, I've found Regexper which I often turn to when I need to get insight into a regexp. See my answer below for a link with a breakdown of the regex in your question. Commented Feb 12, 2016 at 11:42

2 Answers


Token by token explanation:

\/    <- an escaped '/', matches '/'
\*    <- an escaped '*', matches '*'
[^*]* <- a negated character class with quantifier, matches anything but '*' zero or more times
\*+   <- an escaped '*' with quantifier, matches '*' once or more
(     <- beginning of group 
[^/*] <- negated character class, matches anything but '/' or '*' once
[^*]* <- negated character class with quantifier, matches anything but '*' zero or more times
\*+   <- escaped '*' with quantifier, matches '*' once or more
)*    <- end of group with quantifier, matches group zero or more times
\/    <- an escaped '/', matches '/'

Regex Reference

Analysis on Regexper.com
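
The breakdown above can also be tried out directly. This is a minimal Python sketch, not part of the W3C grammar: the name CSS_COMMENT and the sample string are made up for illustration, and the group is written as non-capturing so that findall returns whole matches.

```python
import re

# The pattern from the question, spread out with re.VERBOSE so that each
# token can carry its own comment. CSS_COMMENT is a hypothetical name.
CSS_COMMENT = re.compile(r"""
    /\*         # an escaped '*' after '/', together matching '/*'
    [^*]*       # anything but '*', zero or more times (newlines included)
    \*+         # '*' once or more
    (?:         # beginning of group (non-capturing in this sketch)
        [^/*]   #   one character that is neither '/' nor '*'
        [^*]*   #   anything but '*', zero or more times
        \*+     #   '*' once or more
    )*          # end of group, repeated zero or more times
    /           # the closing '/'
""", re.VERBOSE)

# Made-up sample: one single-line comment and one multi-line comment.
sample = "a { color: red; } /* one */ b {}\n/* two\n * lines **/ c {}"

comments = CSS_COMMENT.findall(sample)
for comment in comments:
    print(repr(comment))
```

Because every piece of the pattern is a literal or a negated character class, no "dot matches newline" option is needed for the multi-line comment to match.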


1 Comment

love the break down of what each part means, you don't get this that often

The reason yours finds only single line comments is that, in typical regular expressions, . matches anything except newlines; whereas the other one uses a negated character class which matches anything but the specified characters, and so can match newlines.

However, if you were to fix that (there's usually an option, often called "dotall" or "single-line" mode, that makes . match newlines as well), you would find that it matches from the /* of the first comment to the */ of the last comment. You would have to use a non-greedy quantifier, .*?, to match no more than one comment.
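
The difference between the three variants can be seen in a short Python sketch. The sample text is made up for illustration, and re.DOTALL is Python's name for the option that lets . match newlines; other engines spell it differently.

```python
import re

# Made-up input: a single-line comment, some CSS, then a multi-line comment.
text = "/* first */ keep { } /* second\nline */"

# Without DOTALL, '.' stops at the newline, so only the first comment is found.
no_dotall = re.findall(r"/\*.*\*/", text)

# With DOTALL and a greedy '.*', the match runs from the first '/*'
# all the way to the last '*/'.
greedy = re.findall(r"/\*.*\*/", text, re.DOTALL)

# With DOTALL and a non-greedy '.*?', each comment is matched on its own.
lazy = re.findall(r"/\*.*?\*/", text, re.DOTALL)

print(no_dotall)
print(greedy)
print(lazy)
```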

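However, the more complex regular expression you give gets the same result without a non-greedy quantifier. As far as I can tell, that is because the W3C grammar is written in Flex notation, which has no lazy quantifiers, so the pattern has to stop at the first */ using negated character classes alone. After each run of asterisks (\*+), either a / follows and the comment ends, or the group ([^/*][^*]*\*+)* consumes one character that is neither / nor * and scans on to the next run of asterisks. This is also how the rule that “comments may not be nested” plays out: an inner /* is just comment text and the comment ends at the first */, so /* like /* this */ is one complete comment. In an engine that supports non-greedy quantifiers, \/\*.*?\*\/ (with . allowed to match newlines) matches the same comments.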
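
To check how both patterns treat an inner /*, runs of asterisks, and newlines, they can be run side by side. This is only a sketch in Python: the names W3C and LAZY and the sample strings are made up for illustration, and a handful of inputs is a spot check rather than a proof.

```python
import re

# The pattern from the question and the non-greedy alternative.
W3C = re.compile(r"/\*[^*]*\*+([^/*][^*]*\*+)*/")
LAZY = re.compile(r"/\*.*?\*/", re.DOTALL)

# Made-up inputs covering an inner '/*', a run of '*', an empty comment
# and a multi-line comment.
samples = [
    "/* like /* this */ tail */",
    "/* stars ***/ x",
    "/**/",
    "/* multi\nline */",
]

# For each input, record the text matched by each pattern.
pairs = [(W3C.search(s).group(0), LAZY.search(s).group(0)) for s in samples]

for w3c_match, lazy_match in pairs:
    print(repr(w3c_match), repr(lazy_match), w3c_match == lazy_match)
```

On these inputs both patterns stop at the first */, including the case where the comment text contains /*.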
