
I'm trying to write a regular expression that matches the generated (non-javadoc) comments, which look like this:

/*
 * (non-javadoc)
 *
 * some other comment here
 *
 */

So far I have (?s)/\*\R.*?non-Javadoc.*?\*/, but it actually matches too much. The top of my file has a header that looks something like this:

/*
 * header text
 */
 public class MyClass {

 }

and the match starts at the /* at the top of the file, but I only want to match the generated (non-javadoc) comment. Can anyone help me fix this regex?
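Here is a minimal Java sketch of the problem. It assumes Eclipse's Find/Replace behaves like java.util.regex (which it is built on) with "Case sensitive" unchecked, mirrored here with Pattern.CASE_INSENSITIVE. The sample file text, the class name OverMatchDemo and the method firstMatch are hypothetical, made up for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OverMatchDemo {
    // Hypothetical file: a header comment, then a class with a generated (non-javadoc) comment.
    static final String SOURCE =
            "/*\n"
          + " * header text\n"
          + " */\n"
          + "public class MyClass {\n"
          + "    /*\n"
          + "     * (non-javadoc)\n"
          + "     *\n"
          + "     * some other comment here\n"
          + "     *\n"
          + "     */\n"
          + "    public void run() {}\n"
          + "}\n";

    // The regex from the question. CASE_INSENSITIVE is an assumption that mimics
    // Eclipse's default, so "non-Javadoc" also matches "non-javadoc".
    static final Pattern QUESTION_REGEX =
            Pattern.compile("(?s)/\\*\\R.*?non-Javadoc.*?\\*/", Pattern.CASE_INSENSITIVE);

    // Returns the first match, or null if there is none.
    static String firstMatch(String text) {
        Matcher m = QUESTION_REGEX.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Prints everything from the header's "/*" to the end of the generated comment.
        System.out.println(firstMatch(SOURCE));
    }
}
```

The match begins at the header's /* because .*? is free to run past the header's */ while it looks for non-Javadoc. A lazy quantifier only makes a match end as early as possible; it does not make it start any later.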

EDIT: I'm trying to use the Eclipse Find/Replace dialog, but I am open to using external tools if needed.

  • I don't know which language this is, but if it's PHP, use the Tokenizer. Commented May 31, 2011 at 23:11
  • @KingCrunch, it is using the Eclipse find/replace Commented May 31, 2011 at 23:18
  • (+1) for the question, as I'm also hunting down this ugly, zero-information code rubbish in Eclipse... :( Commented Mar 19, 2015 at 11:27

2 Answers


This should do it:

(?s)/\*[^*](?:(?!\*/).)*\(non-javadoc\)(?:(?!\*/).)*\*/

/\*[^*] matches the beginning of a C-style comment (/* */) but not a Javadoc comment (/** */).
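To check that /\*[^*] really tells the two comment styles apart, here is a small Java sketch (the class and method names are hypothetical). It uses Matcher.lookingAt(), which anchors the match at the start of the input without requiring the whole input to match.

```java
import java.util.regex.Pattern;

public class CommentStartDemo {
    // A slash, a literal star, then any single character that is NOT a star.
    static final Pattern C_STYLE_START = Pattern.compile("/\\*[^*]");

    // True if the text begins with a plain C-style comment opener.
    static boolean startsCStyleComment(String text) {
        return C_STYLE_START.matcher(text).lookingAt();
    }

    public static void main(String[] args) {
        System.out.println(startsCStyleComment("/*\n * (non-javadoc)\n */"));  // true
        System.out.println(startsCStyleComment("/**\n * Real Javadoc.\n */")); // false
    }
}
```

Note that [^*] matches a newline even without the s flag, because negated character classes are not affected by DOTALL.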

(?!\*/). matches any single character unless it's the beginning of a */ sequence. Searching for (?:(?!\*/).)* instead of .*? makes it impossible for a match to start in one comment and end in another.
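Here is a sketch that runs the answer's regex over a hypothetical file containing a header comment, a real Javadoc comment and a generated (non-javadoc) comment. It assumes java.util.regex semantics, which Eclipse's search is based on; the class name, method name and sample text are made up. Only the generated comment should be reported.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TemperedDotDemo {
    // Hypothetical file with three kinds of comment.
    static final String SOURCE =
            "/*\n * header text\n */\n"
          + "public class MyClass {\n"
          + "    /**\n     * Real Javadoc, kept.\n     */\n"
          + "    public void a() {}\n"
          + "    /*\n     * (non-javadoc)\n     *\n     * some other comment here\n     *\n     */\n"
          + "    public void b() {}\n"
          + "}\n";

    // The answer's regex. (?:(?!\*/).)* consumes one character at a time but refuses
    // to step onto a "*/", so a match cannot run past the end of the comment it began in.
    static final Pattern ANSWER_REGEX =
            Pattern.compile("(?s)/\\*[^*](?:(?!\\*/).)*\\(non-javadoc\\)(?:(?!\\*/).)*\\*/");

    // Collects every non-overlapping match, left to right.
    static List<String> allMatches(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = ANSWER_REGEX.matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        for (String match : allMatches(SOURCE)) {
            System.out.println("MATCH:\n" + match);
        }
    }
}
```

The header is rejected because the tempered dot stops at its */ before any (non-javadoc) is seen, and the Javadoc comment is rejected by [^*].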

UPDATE: In (belated) response to the comment by Jacek: yes, you'll probably want to add something to the end of the regex so that replacing each match with an empty string doesn't leave a lot of blank lines in your code. But Jacek's solution is more complicated than it needs to be. All you need to add is \s*.

The \R escape sequence matches many kinds of newline, including the Unicode Line Separator (\u2028) and Paragraph Separator (\u2029) and the DOS/network carriage-return+linefeed sequence (\r\n). But those are all whitespace characters, so \s matches them (in Eclipse, at least; according to the docs, it's equivalent to [\t\n\f\r\p{Z}]).

The \s* groups in Jacek's addition were only meant to match whatever horizontal whitespace (spaces or tabs) might exist before the newline, plus the indentation following it. (You have to remove that following indentation because the indentation before the first line of the comment is not removed; it stays behind and becomes the indentation of the next line.) But it turns out \s* can do the whole job:

(?s)/\*[^*](?:(?!\*/).)*\(non-javadoc\)(?:(?!\*/).)*\*/\s*
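The replacement step can be sketched in Java as well, with Matcher.replaceAll("") standing in for Eclipse's Replace All with an empty replacement string. The sample uses \r\n line endings to show that \s* consumes the whole CRLF pair plus the next line's indentation; the class, field and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class StripNonJavadocDemo {
    // Final regex from the answer; the trailing \s* also eats the line break(s)
    // and the indentation of the line that follows the comment.
    static final Pattern FINAL_REGEX =
            Pattern.compile("(?s)/\\*[^*](?:(?!\\*/).)*\\(non-javadoc\\)(?:(?!\\*/).)*\\*/\\s*");

    // Hypothetical input with Windows-style line endings.
    static final String SAMPLE =
            "public class MyClass {\r\n"
          + "    /*\r\n"
          + "     * (non-javadoc)\r\n"
          + "     */\r\n"
          + "    public void run() {}\r\n"
          + "}\r\n";

    // Equivalent of "Replace All" with an empty replacement.
    static String strip(String source) {
        return FINAL_REGEX.matcher(source).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.print(strip(SAMPLE));
    }
}
```

The four spaces in front of the removed /* end up as the indentation of public void run(), so no blank line is left behind. Be aware that \s* will also swallow any blank lines that directly follow the comment.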

2 Comments

It would also be good to add (\s*)\R(\s*) at the end of the above regular expression to include the following line break and indentation characters. Otherwise, replacing each match with an empty string will leave an empty line in place of each comment.
Someone felt strongly enough about this to edit Jacek's addition into my regex. My thanks to the reviewers who rejected that edit, and thank you, @Tom, for bringing this incomplete answer back to my attention.

In Perl, it would look like this:

/
   \/\*
   (?: (?! \*\/ ) . )*
   non-javadoc
   (?: (?! \*\/ ) . )*
   \*\/
/sx
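For comparison, the same extended-mode pattern can be written in Java: Pattern.COMMENTS corresponds to Perl's /x and Pattern.DOTALL to /s. This is a hedged translation rather than something the answer's author wrote, and the class name, method name and sample text are hypothetical. Like the Perl original, it omits the [^*] check, so it would also match a /** */ Javadoc comment that happens to contain non-javadoc.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PerlStyleDemo {
    // COMMENTS: whitespace in the pattern is ignored (like Perl's /x).
    // DOTALL: "." also matches line terminators (like Perl's /s).
    // "/" needs no escaping in a Java pattern, so Perl's \/ becomes a plain /.
    static final Pattern PERL_STYLE = Pattern.compile(
            "  /\\*                  \n"
          + "  (?: (?! \\*/ ) . )*   \n"
          + "  non-javadoc           \n"
          + "  (?: (?! \\*/ ) . )*   \n"
          + "  \\*/                  \n",
            Pattern.COMMENTS | Pattern.DOTALL);

    // Hypothetical file: header comment plus one generated comment.
    static final String SAMPLE =
            "/*\n * header text\n */\n"
          + "public class MyClass {\n"
          + "    /*\n     * (non-javadoc)\n     */\n"
          + "    public void run() {}\n"
          + "}\n";

    // Returns the first match, or null if there is none.
    static String firstMatch(String text) {
        Matcher m = PERL_STYLE.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch(SAMPLE));
    }
}
```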

1 Comment

I should have specified that I'm using Eclipse's Find/Replace dialog.
