I am working on a personal project where I need to extract the actual comment text from input strings like these.

Case 1: /* Some useful text */

Output: Some useful text

Case 2: /*** This is formatted obnoxiously**/

Output: This is formatted obnoxiously

Case 3:

    /**

    More useful
information

    */

Output: More useful information

Case 4:

/**
Prompt the user to type in 
the number. Assign the number to v
*/

Output: Prompt the user to type in the number. Assign the number to v

I am working in Java and have tried to strip /* and */ with a naive method such as String.replace, but since a comment can be formatted in different ways, as shown above, that does not seem to be a viable approach. How can I produce the outputs above using a regex?

Here is the test comment file that I am using.

2 Answers

Try something like:

"/\\*+\\s*(.*?)\\*+/"

The dot should also match newlines, so compile the pattern with Pattern.DOTALL:

Pattern p = Pattern.compile("/\\*+\\s*(.*?)\\*+/", Pattern.DOTALL);

EDIT

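 import java.util.regex.Matcher;
 import java.util.regex.Pattern;

 Pattern p = Pattern.compile("/\\*+\\s*(.*?)\\*+/", Pattern.DOTALL);
 Matcher m = p.matcher("/*** This is formatted obnoxiously**/");
 // group(1) is only valid after a successful find(); without it an IllegalStateException is thrown
 if (m.find()) {
     String sanitizedComment = m.group(1);
     System.out.println(sanitizedComment);
 }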
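To get the single-line outputs asked for in the question, the captured group still needs its internal line breaks collapsed. The following is a minimal sketch, not a definitive implementation: the class name `CommentText` and the method `extract` are hypothetical, and the pattern is a small variation on the one above (an extra `\\s*` before the closing stars so trailing whitespace stays out of the group). It assumes the input holds one block comment and returns null when none is found.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommentText {
    // "/*" plus any extra stars, a lazily captured body, then the closing stars and "/".
    // DOTALL lets "." cross line breaks inside multi-line comments.
    private static final Pattern BLOCK_COMMENT =
            Pattern.compile("/\\*+\\s*(.*?)\\s*\\*+/", Pattern.DOTALL);

    /** Returns the text of the first block comment, or null if there is none. */
    public static String extract(String input) {
        Matcher m = BLOCK_COMMENT.matcher(input);
        if (!m.find()) {
            return null; // no comment present, so there is no group to read
        }
        // Collapse every run of whitespace (including line breaks) into one space.
        return m.group(1).replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        System.out.println(extract("/*** This is formatted obnoxiously**/"));
        System.out.println(extract(
                "/**\nPrompt the user to type in \nthe number. Assign the number to v\n*/"));
    }
}
```

Replacing whitespace runs with a single space, rather than deleting line breaks, keeps words on adjacent lines from being joined together.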

10 Comments

. doesn't match new-lines in Java (not by default anyway, not sure if there's a way to set that). You need (.|\n)
@Dukeling: There is a way to set it in Java (DOTALL option). It is not a good idea to write (.|\n), since you might miss out some characters. . excludes more than just \n in Java.
@Dukeling: nhahtdh is right. I've updated my answer to show how you can make the dot match newlines.
@Stephan that didn't work. I got IllegalStateException because there were no matches. Pattern p = Pattern.compile("/\\*+\\s*(.*?)\\*+/", Pattern.DOTALL); Matcher m = p.matcher(matchedComment); String sanitizedComment = m.group(); System.out.println(sanitizedComment);
@Stephan, I did as you said, and everything works before the highlighted file in the code that I have uploaded here

You can use the following regex:

String newString = oldString.replaceAll("/\\*+\\s*|\\s*\\*+/", "");

EDIT

To also get rid of newlines you could do something like:

String regex = "/\\*+\\s*|\\s*\\*+/|[\r\n]+";
String newString = oldString.replaceAll(regex, "");
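Deleting line breaks outright can join words across lines: with the regex above, Case 3 from the question would come out as "More usefulinformation". A possible variation, sketched below under the assumption that the input is a single block comment, strips the delimiters first and then collapses the remaining whitespace into single spaces. The class name `StripDelimiters` and the method `strip` are made up for illustration.

```java
public class StripDelimiters {
    /** Removes the comment delimiters, then normalizes the whitespace that is left. */
    public static String strip(String comment) {
        return comment
                // First pass: drop "/*", "/**", ... and "*/", "**/", ... with adjacent whitespace.
                .replaceAll("/\\*+\\s*|\\s*\\*+/", "")
                // Second pass: turn each run of spaces and line breaks into one space.
                .replaceAll("\\s+", " ")
                .trim();
    }

    public static void main(String[] args) {
        System.out.println(strip("    /**\n\n    More useful\ninformation\n\n    */"));
    }
}
```

Because `\\s` already covers `\r` and `\n`, this version does not depend on the platform's line separator.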

7 Comments

Awesome, it worked. Thanks! Now I have one more question. I am using the following escaped string for finding comments in the file: //.*|(\"(?:\\\\[^\"]|\\\\\"|.)*?\")|(?s)/\\*.*?\\*/ How can I make it find only /* ... */ comments and not single-line comments ( // ... )?
Hmm, looks like it does not work for cases like this: /** Prompt the user to type in the number. Assign the number to v */ The comment box is not letting me add blank lines, so I will update the question.
@NullGeo: To get rid of the newlines I would just add a .replaceAll(System.getProperty("line.separator"), "")
@Keppil: You should make a second pass to remove the line separators. And don't just remove them; you may end up running words together. What you want to do is normalize the remaining whitespace (e.g. .replaceAll("\\s+", " ");). As for the line.separator property, see this answer for a discussion of its disutility.
@AlanMoore: Sure, if line feeds need to be replaced by a space, then a second pass is needed. There might be other small adjustments to make depending on the OP's use case, but I think it is fairly trivial to tweak the code above to fit such extra demands.
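On the follow-up question about finding only /* ... */ comments: one common approach, sketched here as an assumption rather than a tested replacement for the regex quoted in the comments, is to keep the alternatives for line comments and string literals so that they consume their text, but capture only the block comment and keep a match only when that group is set. The class name `BlockCommentFinder` and the method `findBlockComments` are hypothetical, and the string-literal branch is deliberately simplified (it does not handle character literals such as '"').

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BlockCommentFinder {
    private static final Pattern TOKEN = Pattern.compile(
            "//[^\\r\\n]*"                    // line comment: consumed, then ignored
            + "|\"(?:\\\\.|[^\"\\\\])*\""     // string literal: consumed, then ignored
            + "|(/\\*[\\s\\S]*?\\*/)");       // block comment: captured in group 1

    /** Returns every block comment in source, skipping line comments and strings. */
    public static List<String> findBlockComments(String source) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        while (m.find()) {
            // Group 1 is null when the match was a line comment or a string literal.
            if (m.group(1) != null) {
                result.add(m.group(1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String source = "int a; // not /* a block */\n"
                + "String s = \"/* in string */\";\n"
                + "/** real\n comment */\nint b;";
        System.out.println(findBlockComments(source).size());
    }
}
```

Letting the other alternatives match first is what prevents a "/*" inside a line comment or a string from being mistaken for the start of a block comment.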