I have the following regular expression to remove multi-line comments, but I am having a hard time figuring out how to remove comments starting with //.

When I add (//.*) to the regular expression, it never seems to work.

 pattern = r"""
                        ##  --------- COMMENT ---------
       /\*              ##  Start of /* ... */ comment
       [^*]*\*+         ##  Non-* followed by 1-or-more *'s
       (                ##
         [^/*][^*]*\*+  ##
       )*               ##  0-or-more things which don't start with /
                        ##    but do end with '*'
       /                ##  End of /* ... */ comment
                        ##
        |               ## --------- COMMENT ---------
         (//.*)         ## Start of // comment
                        ##
     |                  ##  -OR-  various things which aren't comments:
       (                ##
                        ##  ------ " ... " STRING ------
         "              ##  Start of " ... " string
         (              ##
           \\.          ##  Escaped char
         |              ##  -OR-
           [^"\\]       ##  Non "\ characters
         )*             ##
         "              ##  End of " ... " string
       |                ##  -OR-
                        ##
                        ##  ------ ' ... ' STRING ------
         '              ##  Start of ' ... ' string
         (              ##
           \\.          ##  Escaped char
         |              ##  -OR-
           [^'\\]       ##  Non '\ characters
         )*             ##
         '              ##  End of ' ... ' string
       |                ##  -OR-
                        ##
                        ##  ------ ANYTHING ELSE -------
         .              ##  Any other char
         [^/"'\\]*      ##  Chars which don't start a comment, string
       )                ##    or escape

"""

Could someone please tell me where I am going wrong? I even tried the following regular expression:

//[^\r\n]*$

but that doesn't work either.
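
One possible reason the second attempt appears to do nothing, sketched here under the assumption that the pattern is used with Python's `re` module and no flags: without `re.MULTILINE`, `$` only matches at the very end of the input (or just before a final newline), so only a comment on the last line is matched. The sample text below is made up for illustration.

```python
import re

# Hypothetical two-line input, each line ending in a // comment.
src = "int a; // first\nint b; // second\n"

# Without MULTILINE, $ anchors only at the end of the whole string,
# so only the comment on the last line is removed.
without_flag = re.sub(r"//[^\r\n]*$", "", src)

# With MULTILINE, $ also matches before every newline.
with_flag = re.sub(r"//[^\r\n]*$", "", src, flags=re.MULTILINE)

print(repr(without_flag))  # 'int a; // first\nint b; \n'
print(repr(with_flag))     # 'int a; \nint b; \n'
```

Even with the flag set, this pattern still cannot tell a real comment from `//` inside a string literal.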

  • As I remember, there is a little trick to this. Commented Aug 20, 2014 at 21:12
  • In C/C++, string literals can hide anything, including comments. That's why the quote-style regex segments are there. They have to be matched first to move the match position past them and get at a possible comment. So, these regex segments can't be excluded. Example code: char *p = "Hello /* World */ // EOS"; Commented Aug 22, 2014 at 17:51
  • I tried your code, but I don't know where I am going wrong :( It just returns the whole thing as is. You are right that the above regex fixes the basic things but not the string literals or anything else. Could you please let me know where I am going wrong? I used your bigger code, the one which starts with (?: #Comments... Commented Aug 22, 2014 at 19:03
  • I'll fix up my sample below. Are you just trying to remove ALL comments? Commented Aug 22, 2014 at 19:26
  • YES!!! I have been trying every possible thing, but somehow one thing or another goes wrong! All I am trying to do is take a C file, scan it, and put all its comments in one file and the normal code in another file. Commented Aug 22, 2014 at 19:58
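
To illustrate the point made in the comments about string literals, here is a small sketch, assuming Python's `re` module, of what happens when a comment-only pattern is applied to the example line quoted above. The pattern in `naive` is a deliberately simplistic one written for this illustration, not one taken from the question or the answer.

```python
import re

# The example line from the comment above: the comment markers are
# inside a string literal, so nothing here is a real comment.
line = 'char *p = "Hello /* World */ // EOS";'

# A naive pattern that only knows about comments and ignores strings.
naive = re.sub(r"/\*.*?\*/|//.*", "", line)

# The string literal is cut apart and the closing quote and
# semicolon are lost.
print(repr(naive))  # 'char *p = "Hello  '
```

This is why the string alternatives have to stay in the regex: they consume a whole literal so the comment alternatives never get a chance to match inside it.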

1 Answer

Try one of these.

Both capture comments and non-comments.


This one does not preserve formatting and uses no modifiers.
In a find loop, store Group 1 (comments) in a new file,
and replace the match with Group 2 (non-comments) in the original file.
Adjust the regex line break as necessary, i.e. change \n to \r\n, etc.

   # (/\*[^*]*\*+(?:[^/*][^*]*\*+)*/|//(?:[^\\]|\\\n?)*?\n)|("(?:\\[\S\s]|[^"\\])*"|'(?:\\[\S\s]|[^'\\])*'|[\S\s][^/"'\\]*)


   (                                # (1 start), Comments 
        /\*                              # Start /* .. */ comment
        [^*]* \*+
        (?: [^/*] [^*]* \*+ )*
        /                                # End /* .. */ comment
     |  
        //                               # Start // comment
        (?: [^\\] | \\ \n? )*?           # Possible line-continuation
        \n                               # End // comment
   )                                # (1 end)
|  
   (                                # (2 start), Non - comments 
        "
        (?: \\ [\S\s] | [^"\\] )*        # Double quoted text
        "
     |  '
        (?: \\ [\S\s] | [^'\\] )*        # Single quoted text
        ' 
     |  [\S\s]                           # Any other char
        [^/"'\\]*                        # Chars which don't start a comment, string, escape,
                                         # or line continuation (escape + newline)
   )                                # (2 end)
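
The find loop described above could look roughly like this in Python. It is only a sketch: the names `COMMENT_OR_CODE` and `split_comments` and the sample input are invented for illustration, and the results are kept in memory rather than written to files.

```python
import re

# Compact form of the regex above:
# group 1 = comments, group 2 = non-comments.
COMMENT_OR_CODE = re.compile(
    r"""(/\*[^*]*\*+(?:[^/*][^*]*\*+)*/|//(?:[^\\]|\\\n?)*?\n)"""
    r"""|("(?:\\[\S\s]|[^"\\])*"|'(?:\\[\S\s]|[^'\\])*'|[\S\s][^/"'\\]*)"""
)

def split_comments(source):
    """Return (comments, code) as two strings (hypothetical helper)."""
    comments = []
    code = []
    for match in COMMENT_OR_CODE.finditer(source):
        if match.group(1) is not None:
            comments.append(match.group(1))   # a /* */ or // comment
        else:
            code.append(match.group(2))       # string literal or other code
    return "".join(comments), "".join(code)

# Made-up sample containing both comment styles and a string literal
# that merely looks like it contains comments.
sample = (
    'int x = 1; /* first */\n'
    'char *p = "Hello /* World */ // EOS"; // tail\n'
    'int y;\n'
)
comments, code = split_comments(sample)
print(comments)
print(code)
```

Note that the newline ending the `// tail` comment is part of the comment match, so `int y;` ends up on the same line as the preceding statement. That is the formatting loss mentioned above.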

Last rework:
This does a much better job of preserving formatting.
The formatting problem with newlines is handled at the comment tail.
While this fixes the problem of string concatenation, it occasionally leaves a blank
line where the comment was. For 98% of comments, this won't be an issue.
But it's time to leave this dead dog alone.

This one preserves formatting. It uses the Multi-Line regex modifier (be sure to set it).
Use it the same way as above.
This assumes your engine supports \h (horizontal whitespace). If not, let me know.
Adjust the regex line break as necessary, i.e. change \n to \r\n, etc.

   #  ((?:(?:^\h*)?(?:/\*[^*]*\*+(?:[^/*][^*]*\*+)*/(?:\h*\n(?=\h*(?:\n|/\*|//)))?|//(?:[^\\]|\\\n?)*?(?:\n(?=\h*(?:\n|/\*|//))|(?=\n))))+)|("(?:\\[\S\s]|[^"\\])*"|'(?:\\[\S\s]|[^'\\])*'|[\S\s][^/"'\\\s]*)

   (                                # (1 start), Comments 
        (?:
             (?: ^ \h* )?                     # <- To preserve formatting
             (?:
                  /\*                              # Start /* .. */ comment
                  [^*]* \*+
                  (?: [^/*] [^*]* \*+ )*
                  /                                # End /* .. */ comment
                  (?:
                       \h* \n                                      
                       (?=                              # <- To preserve formatting 
                            \h*                              # <- To preserve formatting
                            (?: \n | /\* | // )              # <- To preserve formatting
                       )
                  )?                               # <- To preserve formatting
               |  
                  //                               # Start // comment
                  (?: [^\\] | \\ \n? )*?           # Possible line-continuation
                  (?:                              # End // comment
                       \n                               
                       (?=                              # <- To preserve formatting
                            \h*                              # <- To preserve formatting
                            (?: \n | /\* | // )              # <- To preserve formatting
                       )
                    |  (?= \n )
                  )
             )
        )+                               # Grab multiple comment blocks if need be
   )                                # (1 end)

|                                 ## OR

   (                                # (2 start), Non - comments 
        "
        (?: \\ [\S\s] | [^"\\] )*        # Double quoted text
        "
     |  '
        (?: \\ [\S\s] | [^'\\] )*        # Single quoted text
        ' 
     |  [\S\s]                           # Any other char
        [^/"'\\\s]*                      # Chars which don't start a comment, string, escape,
                                         # or line continuation (escape + newline)
   )                                # (2 end)
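
Python's `re` module does not support `\h`, so the following is an adaptation rather than the regex exactly as written: it assumes `[ \t]` is an acceptable stand-in for `\h` and sets `re.MULTILINE` so that `^` matches at the start of each line. The name `PRESERVING` and the sample input are made up for illustration.

```python
import re

# The formatting-preserving regex with \h replaced by [ \t]
# (an adaptation for Python, not the original pattern).
PRESERVING = re.compile(
    r"""((?:(?:^[ \t]*)?(?:/\*[^*]*\*+(?:[^/*][^*]*\*+)*/"""
    r"""(?:[ \t]*\n(?=[ \t]*(?:\n|/\*|//)))?"""
    r"""|//(?:[^\\]|\\\n?)*?(?:\n(?=[ \t]*(?:\n|/\*|//))|(?=\n))))+)"""
    r"""|("(?:\\[\S\s]|[^"\\])*"|'(?:\\[\S\s]|[^'\\])*'|[\S\s][^/"'\\\s]*)""",
    re.MULTILINE,
)

# Made-up sample: a trailing comment followed by a comment-only line.
src = "int a; // one\n    // two\nint b;\n"

# Group 1 holds runs of adjacent comments.
comments = [m.group(1) for m in PRESERVING.finditer(src) if m.group(1)]

# Replace every match with group 2; comment matches leave group 2
# unset, so they are replaced with an empty string.
code = PRESERVING.sub(lambda m: m.group(2) or "", src)

print(comments)
print(repr(code))
```

Here the two adjacent comments are captured as a single Group 1 match, and the newline before `int b;` is left in place, so the remaining code keeps its line structure.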

2 Comments

THANK YOU!!! Oh my god, I cannot thank you enough for this! I tried the first one and it preserves the formatting too! I don't know how to express my gratitude, but you are my hero! Seriously, thank you! I was busting myself on this question for a week! Thank you so very much!
Oh, no problem, glad you got something going. Thank you!
