
I am trying to detect SQL in C# code using Regex, but it is not working.

This is my regex string:

(?i)(?s)\b(select)\b(.*?)\b(from)\b|\b(insert)\b(.*?)\b(into)\b|\b(update)\b(.*?)\b(set)\b|\b(delete)(.*?)\b(from)\b

I want to match SQL keywords like the following:

... SELECT ... FROM ...
... INSERT ... INTO ...
... UPDATE ... SET ...
... DELETE ... FROM ...

I am using the following website to test my regular expressions:

http://regexstorm.net/tester

It works just as I expect on the website, but the same regex does not work when my code runs.

"Regex.Matches" does not find any matches for some reason.


The SQL can all be in one single line, or the SQL can stretch over multiple lines.

So, basically, I am giving my application a directory. This directory contains ".cs" files.

I then read the text in the files, and try to match the above Regex string.

Here is the code I have so far:

private string _strRegex = "(?i)(?s)\b(select)\b(.*?)\b(from)\b|\b(insert)\b(.*?)\b(into)\b|\b(update)\b(.*?)\b(set)\b|\b(delete)(.*?)\b(from)\b";

string lines = string.Empty;
bool foundMatch = false;
//Match the text to our regular expression, and see if we find a match anywhere.
foreach (Match match in Regex.Matches(strFile, _strRegex))
{
    //Get the line number this match occurred at in the file.
    var lineNumber = strFile.Take(match.Index).Count(c => c == '\n') + 1;
    //Get the actual line.
    int lineNum = 0;
    using (StringReader reader = new StringReader(strFile))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            lineNum++; //First, increment!

            if (lineNum == lineNumber)
            {
                line = line.Trim();
                if (!line.StartsWith("//"))
                {
                    foundMatch = true;
                    lines += $@"    [Line {lineNumber}] - {line}{Environment.NewLine}";
                    break;
                }
            }
        }
    }
}

if (foundMatch)
{
    File.AppendAllText(_itemsLogPath, $@"{file}{Environment.NewLine}{lines}{Environment.NewLine}");
}

  • I assume your CS files do not contain any LINQ expressions, or you are going to get a whole load of false positives. Commented Jan 11, 2017 at 13:09
  • "detect SQL in C# code" - why? Anyway, read How to Ask and provide a minimal reproducible example. Commented Jan 11, 2017 at 13:10
  • @Phylogenesis - Yes, I do have LINQ in the code, but not much. Also, it won't match because the LINQ expressions are not in the same order as the SQL keywords, e.g. "from ... select" for LINQ. Commented Jan 11, 2017 at 13:12
  • @FrederikMoller Unless there are two LINQ expressions, and the select from the first is matched with the from in the second. Commented Jan 11, 2017 at 13:13
  • What happens if you put _strRegex = @".... e.g. a literal string? Commented Jan 11, 2017 at 13:19

2 Answers


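The problem is that in a regular C# string literal, \b is interpreted as a C# escape sequence (the backspace character) and never reaches the regex engine as a word-boundary anchor. You need to make the string a verbatim literal by prefixing it with @: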

private string _strRegex = @"(?i)(?s)\b(select)\b(.*?)\b(from)\b|\b(insert)\b(.*?)\b(into)\b|\b(update)\b(.*?)\b(set)\b|\b(delete)(.*?)\b(from)\b";

That poor little @ makes all the difference.
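
To see the effect in isolation, here is a minimal sketch. The shortened pattern and the sample string are made up for illustration and are not from the question; the class name is likewise hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

class VerbatimDemo
{
    static void Main()
    {
        // Regular literal: the C# compiler turns each \b into a backspace character (U+0008).
        string regular = "\bselect\b";
        // Verbatim literal: the backslashes are kept, so the regex engine sees \b (word boundary).
        string verbatim = @"\bselect\b";

        Console.WriteLine(regular.Length);   // 8  (backspace + "select" + backspace)
        Console.WriteLine(verbatim.Length);  // 10 (each \b is two characters)

        // Hypothetical sample line, standing in for the contents of a .cs file.
        string sample = "var sql = \"SELECT id FROM users\";";

        // The first pattern looks for literal backspace characters, which the sample does not contain.
        Console.WriteLine(Regex.IsMatch(sample, regular, RegexOptions.IgnoreCase));  // False
        Console.WriteLine(Regex.IsMatch(sample, verbatim, RegexOptions.IgnoreCase)); // True
    }
}
```

This would also explain why the pattern works in an online tester: there the text is typed directly into the tool and never passes through the C# compiler's escape handling.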


2 Comments

Almost a whole day trying to figure out why it does not work, and here you come and tell me to add @. Thanx :)!
I fell for it in the past :p

"The SQL can all be in one single line, or the SQL can stretch over multiple lines." - I assume that the problem might be at this point.

Try using RegexOptions to specify this. It would look like this:

var regex = new Regex(@"...pattern", RegexOptions.IgnoreCase);

Edit 1

As mentioned by BugFinder, don't forget the @ in front of your pattern string, which declares it as a verbatim string literal.
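
As a rough sketch of how the options relate to the multi-line case: the inline flags (?i) and (?s) from the question correspond to RegexOptions.IgnoreCase and RegexOptions.Singleline. The shortened pattern and the sample SQL below are assumptions made for illustration only.

```csharp
using System;
using System.Text.RegularExpressions;

class OptionsDemo
{
    static void Main()
    {
        // Hypothetical, shortened pattern; note the @ prefix.
        const string pattern = @"\bselect\b.*?\bfrom\b";

        // Hypothetical SQL that stretches over several lines.
        string multiLineSql = "SELECT id,\n       name\nFROM users";

        // Singleline lets . match newline characters, so the match can span lines.
        var withSingleline = new Regex(pattern, RegexOptions.IgnoreCase | RegexOptions.Singleline);
        Console.WriteLine(withSingleline.IsMatch(multiLineSql)); // True

        // Without Singleline, . stops at the first newline and FROM is never reached.
        var withoutSingleline = new Regex(pattern, RegexOptions.IgnoreCase);
        Console.WriteLine(withoutSingleline.IsMatch(multiLineSql)); // False
    }
}
```

Passing the options to the constructor or keeping them inline in the pattern should behave the same way; choosing one form consistently mostly helps readability.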

3 Comments

It is still not finding any matches using this :(. However, you did give me the option to specify "ignore case" as a regex option; I was putting it in the regex string :)
RegexOptions.Multiline is the wrong flag to use in this situation. It only changes where ^ and $ match; it is RegexOptions.Singleline that lets . match a newline character.
@Phylogenesis oh yes you're right, my bad. It's the other way around. I updated the answer accordingly
