I currently go through all my source files, read their text with File.ReadAllLines, and want to filter out all comments with a single regex that covers basically every comment form. I tried several regex solutions I found on the internet, such as this one:

@"(@(?:""[^""]*"")+|""(?:[^""\n\\]+|\\.)*""|'(?:[^'\n\\]+|\\.)*')|//.*|/\*(?s:.*?)\*/"

And the top result when I google:

string blockComments = @"/\*(.*?)\*/";
string lineComments = @"//(.*?)\r?\n";
string strings = @"""((\\[^\n]|[^""\n])*)""";
string verbatimStrings = @"@(""[^""]*"")+";

See: Regex to strip line comments from C#

The second solution won't recognize any comments.

This is what I currently do:

public static List<string> FormatList(List<string> unformattedList, string dataType)
{
    List<string> formattedList = unformattedList;

    string blockComments = @"/\*(.*?)\*/";
    string lineComments = @"//(.*?)\r?\n";
    string strings = @"""((\\[^\n]|[^""\n])*)""";
    string verbatimStrings = @"@(""[^""]*"")+";

    string regexCS = blockComments + "|" + lineComments + "|" + strings + "|" + verbatimStrings;
    //regexCS = @"(@(?:""[^""]*"")+|""(?:[^""\n\\]+|\\.)*""|'(?:[^'\n\\]+|\\.)*')|//.*|/\*(?s:.*?)\*/";
    string regexSQL = "";

    if (dataType.Equals("cs"))
    {
        for(int i = 0; i < formattedList.Count;i++)
        {
            string line = formattedList[i];
            line = line.Trim(' ');

            if(Regex.IsMatch(line, regexCS))
            {
                line = "";
            }

            formattedList[i] = line;
        }
    }
    else if(dataType.Equals("sql"))
    {

    }
    else
    {
        throw new Exception("Unknown DataType");
    }

    return formattedList;
}

The first pattern recognizes the comments, but it also matches things like

string[] bla = text.Split('\\\\');

Is there a solution to this problem, so that the regex excludes matches that sit inside a string or char literal? If you have any other links I should check out, please let me know!

I tried a lot and can't figure out why this won't work for me.

[I also tried these links]

https://blog.ostermiller.org/find-comment

https://codereview.stackexchange.com/questions/167582/regular-expression-to-remove-comments

Regex to find comment in c# source file

  • Using Roslyn would be a better choice than regex. Commented Jun 24, 2019 at 11:30
  • Regexes are poor at analysing code. In the code of the question: (1) Multi-line comments will not be handled by blockComments because it is called from code that is effectively foreach (individualLine in formattedList) .... (2) Double forward-slashes in strings will break lineComments, e.g. the line stringVar = "abc//def";. (3) There is nothing to stop verbatimStrings being processed as strings. (4, 5, 6, and more) There are several other problems. Perhaps you need to rethink the whole problem and its possible solutions. Commented Jun 24, 2019 at 11:54

1 Answer

Doing this with regexes is very difficult, as the comments point out. A more reliable way to find and eliminate comments is a CSharpSyntaxWalker. The syntax walker knows about all language constructs, so it won't make the hard-to-diagnose mistakes that regexes do.

Add a reference to the Microsoft.CodeAnalysis.CSharp NuGet package and inherit from CSharpSyntaxWalker.

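using System.Collections.Generic;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

class CommentWalker : CSharpSyntaxWalker
{
    // Comments found during the walk, in source order.
    public List<SyntaxTrivia> Comments { get; } = new List<SyntaxTrivia>();

    // The depth must be at least Trivia: with SyntaxWalkerDepth.Node the walker
    // stops at nodes and VisitTrivia is never called.
    public CommentWalker(SyntaxWalkerDepth depth = SyntaxWalkerDepth.Trivia) : base(depth)
    {
    }

    public override void VisitTrivia(SyntaxTrivia trivia)
    {
        if (trivia.IsKind(SyntaxKind.MultiLineCommentTrivia)
            || trivia.IsKind(SyntaxKind.SingleLineCommentTrivia))
        {
            // Record the comment. trivia.Span gives its location in the file,
            // so you can replace or remove it later.
            Comments.Add(trivia);
        }

        base.VisitTrivia(trivia);
    }
}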

Then you can use it like so:

// Get the program text from your .cs file
SyntaxTree tree = CSharpSyntaxTree.ParseText(programText);
CompilationUnitSyntax root = tree.GetCompilationUnitRoot();

var walker = new CommentWalker();
walker.Visit(root);

// Now iterate your list of comments (probably backwards) and remove them.
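
To make the "iterate backwards and remove" step concrete, here is a minimal sketch, not a definitive implementation. It assumes you only care about // and /* */ comments (not /// documentation comments), and for brevity it collects the trivia with root.DescendantTrivia() instead of the walker above. CommentStripper and StripComments are hypothetical names chosen for this example.

```csharp
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

static class CommentStripper
{
    // Hypothetical helper: returns the source text with // and /* */ comments cut out.
    public static string StripComments(string programText)
    {
        SyntaxTree tree = CSharpSyntaxTree.ParseText(programText);
        SyntaxNode root = tree.GetRoot();

        // DescendantTrivia enumerates the leading and trailing trivia of every token.
        // Sorting by descending start position lets us remove from the end of the
        // text first, so the spans of earlier comments stay valid.
        var comments = root.DescendantTrivia()
            .Where(t => t.IsKind(SyntaxKind.SingleLineCommentTrivia)
                     || t.IsKind(SyntaxKind.MultiLineCommentTrivia))
            .OrderByDescending(t => t.SpanStart);

        string result = programText;
        foreach (SyntaxTrivia comment in comments)
        {
            result = result.Remove(comment.Span.Start, comment.Span.Length);
        }

        return result;
    }
}
```

Because the parser turns a literal such as "abc//def" into a string token rather than trivia, the slashes inside it are left alone. Note that cutting a block comment out entirely can glue two tokens together (for example int/**/x), so you may prefer to replace block comments with a single space instead.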
