1

I have a bunch of C# source files, which I need to analyze outside of VS, and I am struggling with a particular case:

    public static bool InsertNote(string TableName, string TableKey, string DocType, string InsuredKey, string SubmissionKey,
                                    string Staff, string DefaultAction, string DisplayKey, FileType FileType, string IFSFileName,
                                    string IFSFolder, string IFSTimeStamp, string Subject, string Notation, NoteType NoteType,
                                    //string Company, string NoteCategory, ref OracleConnection Connection)
                                    string Company, string NoteCategory, string DocumentName, ref SqlConnection Connection)
    {

I thought that this regex should be able to find it:

    private static readonly Regex MethodNamesExtractor = new Regex(@"^.*(\S*)\({1}.*ref\s*SqlConnection", RegexOptions.Multiline | RegexOptions.Compiled);

But it does not. What am I missing?

9
  • Is the method stub actually on multiple lines? Commented May 7, 2014 at 16:04
  • Yes, it is. Just like the sample above. If everything is on the same line, it works just fine. Commented May 7, 2014 at 16:05
  • So does the regex not find anything or only find the letter e? Commented May 7, 2014 at 16:06
  • Not sure why it would find just the letter e. It finds nothing. The number of matches is 0. If the ref SqlConnection is on the same line, the first Group contains the method name. Commented May 7, 2014 at 16:08
  • What string are you trying to extract? Commented May 7, 2014 at 16:12

3 Answers

2

By default, . does not match newlines. You can solve that part of the problem with RegexOptions.Singleline:

    private static readonly Regex MethodNamesExtractor = new Regex(@"^.*(\S*)\({1}.*ref\s*SqlConnection", RegexOptions.Multiline | RegexOptions.Singleline | RegexOptions.Compiled);

The Multiline option makes ^ and $ match at the beginning and end of every line respectively, instead of only at the beginning and end of the whole string. The naming is a little confusing, but that's how it is! You can also use the inline modifier (?s), which works just the same as RegexOptions.Singleline. I'll use that in the subsequent regexes, and remove the Multiline option, since it won't be needed once ^ is dropped.
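
To make the difference between the two options concrete, here is a minimal sketch. The two-line signature and the class and method names are made up for illustration; only the Regex calls and options are real .NET API.

```csharp
using System.Text.RegularExpressions;

public static class DotAndNewlines
{
    // Hypothetical two-line signature, not taken from the question.
    public const string Sample = "void Foo(string a,\n    ref SqlConnection c)";

    private const string Pattern = @"\(.*ref\s*SqlConnection";

    // '.' stops at '\n', so the match cannot reach the second line.
    public static bool WithDefaultOptions() => Regex.IsMatch(Sample, Pattern);

    // Multiline only changes ^ and $, so it does not help here either.
    public static bool WithMultilineOnly() => Regex.IsMatch(Sample, Pattern, RegexOptions.Multiline);

    // Singleline lets '.' match '\n' as well.
    public static bool WithSingleline() => Regex.IsMatch(Sample, Pattern, RegexOptions.Singleline);

    // The inline modifier (?s) is equivalent to RegexOptions.Singleline.
    public static bool WithInlineModifier() => Regex.IsMatch(Sample, "(?s)" + Pattern);
}
```

Only the last two methods should report a match on this sample.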

But that's not the only problem. The leading .* matches greedily, meaning that it will match as much as possible before \S* even has the chance to match anything, which leaves the capture group empty. You can fix this by making .* lazy, i.e. by adding a ? to it, or by simply removing ^.* altogether, since it isn't doing much anyway. Also, {1} is redundant, since matching exactly once is the default:

    private static readonly Regex MethodNamesExtractor = new Regex(@"(?s)(\S*)\(.*ref\s*SqlConnection", RegexOptions.Compiled);
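
As a rough check of the greediness point, the sketch below runs the original pattern (with Singleline added) and the trimmed one against a shortened, hypothetical version of the question's signature. The class and helper names are invented for this example.

```csharp
using System.Text.RegularExpressions;

public static class GreedyPrefixDemo
{
    // Shortened stand-in for the signature in the question (assumption).
    public const string Sample =
        "    public static bool InsertNote(string TableName,\n" +
        "                                    ref SqlConnection Connection)\n" +
        "    {";

    // Returns the text of group 1, or null when the pattern does not match.
    public static string CaptureWith(string pattern, RegexOptions options)
    {
        Match m = Regex.Match(Sample, pattern, options);
        return m.Success ? m.Groups[1].Value : null;
    }

    // Greedy ^.* swallows the method name, so group 1 ends up empty.
    public static string Original() =>
        CaptureWith(@"^.*(\S*)\({1}.*ref\s*SqlConnection",
                    RegexOptions.Multiline | RegexOptions.Singleline);

    // Without the leading ^.* the group captures the name before '('.
    public static string Trimmed() =>
        CaptureWith(@"(?s)(\S*)\(.*ref\s*SqlConnection", RegexOptions.None);
}
```

On this sample, Original() matches but returns an empty string, while Trimmed() returns InsertNote.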

Now for the tricky part: if you are trying to extract the method names from several methods, the above regex will match only once. Let's say you have two methods, where the first one doesn't have the ref SqlConnection part while the second one does. The match then starts at the first method's name and runs all the way to the second method's ref SqlConnection, so you capture the wrong name.
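
A small sketch of that failure mode, using two made-up methods (First and Second are hypothetical, as are the class and method names):

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SpanningMatchDemo
{
    // Two hypothetical methods; only Second takes a ref SqlConnection.
    public const string Sample =
        "public void First(string a)\n{\n}\n" +
        "public void Second(string b,\n    ref SqlConnection c)\n{\n}\n";

    // Collects group 1 of every match of the unrestricted pattern.
    public static List<string> CapturedNames()
    {
        var names = new List<string>();
        foreach (Match m in Regex.Matches(Sample, @"(?s)(\S*)\(.*ref\s*SqlConnection"))
        {
            names.Add(m.Groups[1].Value);
        }
        return names;
    }
}
```

CapturedNames() yields a single entry, First, because .* happily runs from the first opening paren across the method body to the ref SqlConnection of Second.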

To fix that, you might want to restrict .* to a negated class, by using [^)]*. You will notice that using this on the sample won't give you any match, and that's because of the commented-out line in the parameter list, which has a ) before the ref SqlConnection part appears. Well, you can allow for commented lines like this:

    "(?s)(\S*)\((?:[^)]|//[^\r\n]*\))*ref\s*SqlConnection"

That's provided you don't have any 'false' double forward slashes or parens within the parameters. To allow comment blocks too, the regex becomes longer, obviously (and longer still if you want to allow parens within the parameters):

    "(?s)(\S*)\((?:[^)]|//[^\r\n]*\)|/\*(?:(?!\*/).)*\*/)*ref\s*SqlConnection"

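To see the comment-aware pattern at work, here is a sketch against two hypothetical methods, where the second one has both a block comment and a commented-out line containing a ) in its parameter list. The sample text and all names are assumptions for illustration.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CommentAwareDemo
{
    // Hypothetical input: First has no ref SqlConnection; Second has one,
    // preceded by comments that each contain a closing paren.
    public const string Sample =
        "public void First(string a)\n{\n}\n" +
        "public void Second(string b, /* was: int x) */\n" +
        "    //string c, ref OracleConnection conn)\n" +
        "    string c, ref SqlConnection conn)\n{\n}\n";

    private static readonly Regex Extractor = new Regex(
        @"(?s)(\S*)\((?:[^)]|//[^\r\n]*\)|/\*(?:(?!\*/).)*\*/)*ref\s*SqlConnection");

    // Collects the captured method name of every match.
    public static List<string> MethodNames()
    {
        var names = new List<string>();
        foreach (Match m in Extractor.Matches(Sample))
        {
            names.Add(m.Groups[1].Value);
        }
        return names;
    }
}
```

Here MethodNames() returns only Second: the parameter list of First ends at its ) before any ref SqlConnection is seen, so the match can no longer spill over into the next method.
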
In conclusion, it might be better to use a dedicated parser to parse a programming language.
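
For what it's worth, here is a sketch of what the parser route could look like. It assumes the Roslyn compiler API from the Microsoft.CodeAnalysis.CSharp NuGet package, which this answer does not name, so treat the choice of library as an assumption. The class and method names are made up, and the type check is a plain text comparison against "SqlConnection", so a fully qualified type name would be missed.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

public static class RefSqlConnectionFinder
{
    // Returns the names of methods that declare a `ref SqlConnection` parameter.
    // The source is expected to contain the methods inside a type declaration,
    // as a real .cs file would.
    public static List<string> FindMethods(string source)
    {
        SyntaxNode root = CSharpSyntaxTree.ParseText(source).GetRoot();

        return root.DescendantNodes()
            .OfType<MethodDeclarationSyntax>()
            .Where(method => method.ParameterList.Parameters.Any(p =>
                // Comments are trivia in the syntax tree, so a commented-out
                // parameter line never shows up as a parameter.
                p.Modifiers.Any(mod => mod.IsKind(SyntaxKind.RefKeyword)) &&
                p.Type != null &&
                p.Type.ToString() == "SqlConnection"))
            .Select(method => method.Identifier.Text)
            .ToList();
    }
}
```

Calling FindMethods on the text of each file (for example via File.ReadAllText) should list the affected methods without any special handling for comments or nested parens.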


2 Comments

I will try your recommendation for RegEx, thanks for quite elaborate answer. As for the parser ... All I am trying to do is to find in how many places the developers f-ed up, by passing an SqlConnection by reference, in a multi-threaded scenario.
@Darek Ah, I see. Good luck with that task, cleaning up behind those who messed up is never fun... Sorry it's a bit of a wall of text. Sometimes, my answers come out like that ^^;
0

You need to add a question mark after the star

    private static readonly Regex MethodNamesExtractor = new Regex(@"^.*?(\S*)\({1}.*?ref\s*SqlConnection", RegexOptions.Singleline | RegexOptions.Compiled);

Otherwise star will act as a greedy quantifier.

http://regex101.com/r/qG5lD3

5 Comments

Let it be known that one Logan Murphy, shall be from now on, known as My new RegEx GURU.
Dear GURU. why does it find only the first occurrence?
Why does what only find the first occurrence?
Oh only one method name and not all of them? Again because of greed. I added another question mark that solves the problem.
It did not work quite as well as I hoped, but after combining it with my original pattern, I've got 99% of what I needed.
0

I think if you add RegexOptions.Singleline, it'll do what you want. Here it is on regex101.com

So try the following (translating on the fly from the regex101-style definition):

    private static readonly Regex MethodNamesExtractor = new Regex(@"^.*(\S*)\({1}.*ref\s*SqlConnection", RegexOptions.Singleline | RegexOptions.Compiled);

Reason: Multiline has to do with how ^ and $ are interpreted. Singleline, on the other hand, says that . matches newlines, which is what you want since your test text spans multiple lines.

2 Comments

It finds one group and it is empty.
@Darek Of course... your group is (\S*) which is defined as "match any non-white space character [^\r\n\t\f ]" (quoting regex101). What do you expect to capture in your group?
