I'm having trouble coming up with a regex that will match the javadoc comment contents for a specific java method. Example:

/**
 * Do not match this.
 */

/**
 * Do match this.
 */
@SomeAnnotation
public boolean methodX() { }
/**
 * Do not match this.
 */

I already know the method signature so I can use that in the regex.

I can match all of the javadoc comments using:

/\*\*(.*?)\*/

I'm also specifying re.DOTALL. I tried expanding the regex to use a negative lookahead that says I only want a javadoc comment if it's the comment immediately preceding the method:

/\*\*(.*?)\*/(?!.*?/\*\*.*?public boolean methodX\(\))

But that's causing the (.*?) to match the contents from the start of the first javadoc comment to the end of the javadoc comment immediately preceding methodX.
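A minimal reproduction of that behavior in Python, as a sketch only: the SOURCE string simply re-creates the example above, and the names SOURCE, ATTEMPT and captured are made up for illustration.

```python
import re

# The example text from the question, rebuilt line by line.
SOURCE = "\n".join([
    "/**",
    " * Do not match this.",
    " */",
    "",
    "/**",
    " * Do match this.",
    " */",
    "@SomeAnnotation",
    "public boolean methodX() { }",
    "/**",
    " * Do not match this.",
    " */",
])

# The attempted pattern with the negative lookahead, compiled with re.DOTALL.
ATTEMPT = re.compile(
    r"/\*\*(.*?)\*/(?!.*?/\*\*.*?public boolean methodX\(\))",
    re.DOTALL,
)

match = ATTEMPT.search(SOURCE)
captured = match.group(1)

# Group 1 spans both the first and the second comment instead of only the second.
print(repr(captured))
```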

I keep trying various ways of constructing positive and negative lookaheads but nothing is working. What am I missing?

  • Are you looking to capture the annotation as well? And are you trying to capture and use the text portion of the comment that you match? Commented Jul 10, 2014 at 18:50
  • I just need the text portion of the comment, but I can strip the *'s and extra newlines as a post-processing step. The text is fairly structured so I'm not too worried about weird edge cases. Most of the methods have several annotations so I just wanted to make sure that was communicated in the example. Commented Jul 10, 2014 at 19:54

2 Answers

This matches the comment (from /** to */) preceding the function in the given example text and captures it in a named group called comment:

(?P<comment>/\*\*(?:(?!/\*\*).)*?\*/)(?:(?:(?!\*/).)*?)(?=public boolean methodX)

See a test at regex101.com.

  • The key here is to keep any extra /** and */ out of the wanted text by using (?:(?!/\*\*).)*? and (?:(?!\*/).)*?

  • The ?: markers make the uninteresting groups non-capturing, so they are left out of the result
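As a rough sketch of how this pattern might be used from Python, assuming the source is already in a string: the helper name find_javadoc, the SOURCE sample, and building the lookahead from a signature argument with re.escape are illustrative assumptions, not part of the original answer.

```python
import re

# The example text from the question, rebuilt line by line.
SOURCE = "\n".join([
    "/**",
    " * Do not match this.",
    " */",
    "",
    "/**",
    " * Do match this.",
    " */",
    "@SomeAnnotation",
    "public boolean methodX() { }",
    "/**",
    " * Do not match this.",
    " */",
])


def find_javadoc(source, signature):
    """Return the /** ... */ block right before `signature`, or None."""
    pattern = re.compile(
        r"(?P<comment>/\*\*(?:(?!/\*\*).)*?\*/)"  # one comment, never crossing another /**
        r"(?:(?!\*/).)*?"                          # annotations etc., never crossing a */
        r"(?=" + re.escape(signature) + r")",      # the known method signature must follow
        re.DOTALL,
    )
    match = pattern.search(source)
    return match.group("comment") if match else None


result = find_javadoc(SOURCE, "public boolean methodX")
print(result)
```

The lookahead keeps the signature itself out of the match, so only the comment and whatever sits between it and the method are consumed.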


3 Comments

This behaves oddly in one fringe case: when an annotation contains */, the regex grabs the entire annotation up to that point as part of the comment, which it should not.
@wolffer-east I see. What does such an annotation look like? Could you provide an example, please?
The following would have '@SomeAnnotation that contains ' as part of the comment capture: @SomeAnnotation that contains */ is a troublesome annotation. If you follow the link in famousgarkin's post and add */ to the end of the annotation you will see the capture change.

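Your expression can run past the */ that ends the first comment, because . also matches the characters of */. When the lookahead rejects the shortest candidate, the lazy .*? backtracks and simply extends to the next */. Try using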

/\*\*((?:[^*]+|\*[^/])*)\*/

This is meant to ensure that the group never runs past the ending */ by accident, so you don't end up with two comments matched at the same time.

EDIT: This version avoids the issue of annotations that contain */. I'm not sure why they would, but here goes:

/\*\*((?:(?!\*/).)*)\*/(?:(?!/\*\*).)*(?=public boolean methodX)

Check out this example for confirmation that it works: http://regex101.com/r/yV9oK2/2

I switched from my original match to a negative lookahead to avoid a 'catastrophic backtrack', as the test program put it :)
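A small Python sketch of the edited pattern, run against the troublesome annotation described in the comments on the other answer. The TRICKY sample and the javadoc_text helper are illustrative assumptions; the cleanup of the leading *'s is the kind of simple post-processing step the asker mentioned, not something the regex does.

```python
import re

# The question's example, with an annotation line that contains */.
TRICKY = "\n".join([
    "/**",
    " * Do not match this.",
    " */",
    "",
    "/**",
    " * Do match this.",
    " */",
    "@SomeAnnotation that contains */ is a troublesome annotation",
    "public boolean methodX() { }",
    "/**",
    " * Do not match this.",
    " */",
])

PATTERN = re.compile(
    r"/\*\*((?:(?!\*/).)*)\*/"      # comment body: stops at the first */
    r"(?:(?!/\*\*).)*"              # anything up to the method, but no new /**
    r"(?=public boolean methodX)",  # the known signature must follow
    re.DOTALL,
)


def javadoc_text(source):
    """Return the comment text before methodX without the leading *'s, or None."""
    match = PATTERN.search(source)
    if match is None:
        return None
    # Naive cleanup: drop surrounding whitespace and leading *'s, skip empty lines.
    lines = (line.strip().lstrip("*").strip() for line in match.group(1).splitlines())
    return "\n".join(line for line in lines if line)


print(javadoc_text(TRICKY))
```

Because the body group can never cross a */, the stray */ in the annotation is only ever consumed by the second, non-capturing part of the pattern.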
