Is there any way to parse a complex regex pattern (containing several named groups as well as several numbered and non-capturing groups) and report each group name or group number along with its pattern text?

Suppose I have a regex pattern like this:

  (?im)(?<x>\b[a-s03]+\b)(?-i)(?<a>\p{L}+?,(?<b>.+?:(?<c>.+?;(?<d>.+?(?:\d|sample-text|(\k'x'|sos30))))))

And I would like to extract:

  Named groups:
  x==>(?<x>\b[a-s03]+\b)
  a==>(?<a>\p{L}+?,(?<b>.+?:(?<c>.+?;(?<d>.+?(?:\d|sample-text|(\k'x'|sos30))))))
  b==>(?<b>.+?:(?<c>.+?;(?<d>.+?(?:\d|sample-text|(\k'x'|sos30)))))
  c==>(?<c>.+?;(?<d>.+?(?:\d|sample-text|(\k'x'|sos30))))
  d==>(?<d>.+?(?:\d|sample-text|(\k'x'|sos30)))

  Numbered groups:
  1==>(\k'x'|sos30)

  Non-capturing-groups:
  1st==>(?:\d|sample-text|(\k'x'|sos30))

Purpose of this Requirement:

I have a large database of complex regex patterns. The previous programmer who worked on it did not use any comments [(?#...)] when writing these patterns, and there are no line breaks within them. I have to modify some of those patterns and also add comments to them. Right now it is like searching for a needle in a haystack. I simply could not use regex for this purpose, so I am inclined to use a parser.

What I tried:

I tried the GetGroupNames and GetGroupNumbers methods for that purpose. I could extract only the names/numbers of the groups, but not the corresponding pattern text.

I am looking for a non-regex solution, or some hints.

  • I'm not aware of any pre-existing solutions for parsing regular expressions. It probably shouldn't be immensely difficult to write something yourself, though: loop through the pattern, record each opening parenthesis, and look for the corresponding closing parenthesis. From the text that follows the opening parenthesis you can tell what sort of group it is. Commented Nov 16, 2012 at 10:09
  • @Patrickdev: Thank you for taking the time to comment. Actually, the example I described is a very simple one and there is no parenthesis at all, whereas the actual patterns are much more complex (mostly nested structures with lots of parentheses, including escaped ones). But yes, I agree with your suggestion to write a new parser, although it would be something like reinventing the wheel. I will certainly consider it if there is no other solution. Commented Nov 16, 2012 at 10:19
  • Have a look at regex101.com - it could be of help here. Commented Nov 16, 2012 at 10:54
  • @Lindrian: Thanks for your comment. But I need a regex parser that can process a very large number of complex regex patterns. Personally I use RegexBuddy for creating and editing regex patterns, which I believe is the best tool one can have and very explanatory in this regard. But I am not going to do this work manually. Moreover, the site you mentioned is very Python-oriented, and I also require a .NET-compatible regex engine. Commented Nov 16, 2012 at 11:25

2 Answers

How about this? For this pattern:

(?im)(?<x>\b[a-s03]+\b)(?-i)(?<a>\p{L}+?,(?<b>.+?:(?'c'.+?;(.+?(?:\d|sample-text|(\k'x'|sos30))))))

it produces this output:

(0)<0>:     (?im)(?<x>\b[a-s03]+\b)(?-i)(?<a>\p{L}+?,(?<b>.+?:(?'c'.+?;(.+?(?:\d|sample-text|(\k'x'|sos30))))))
(1)<x>:     \b[a-s03]+\b
(2)<a>:     \p{L}+?,(?<b>.+?:(?'c'.+?;(.+?(?:\d|sample-text|(\k'x'|sos30)))))
(3)<b>:     .+?:(?'c'.+?;(.+?(?:\d|sample-text|(\k'x'|sos30))))
(4)<c>:     .+?;(.+?(?:\d|sample-text|(\k'x'|sos30)))
(5)<5>:     .+?(?:\d|sample-text|(\k'x'|sos30))
(6)<6>:     \k'x'|sos30

This is the code:

Imports System
Imports System.Collections.Generic
Imports System.Collections.Specialized

Module Module1
' Group number -> output line: a "(n)<name>:" label followed by the group's pattern text.
Public DictGroups As New OrderedDictionary
' Group number -> True while that group is still open and collecting text.
Public DictTrackers As New Dictionary(Of Integer, Boolean)
Public intGroups As Integer = 0
' One entry per open parenthesis: the group number, or -1 for a group that is not captured.
' This keeps every ")" paired with its own "(", including non-capturing groups.
Public OpenGroups As New Stack(Of Integer)

Sub Main()
    Dim regexToEval As String = "(?im)(?<x>\b[a-s03]+\b)(?-i)(?<a>\p{L}+?,(?<b>.+?:(?'c'.+?;(.+?(?:\d|sample-text|(\k'x'|sos30))))))"
    Dim curChar As String = ""
    ' Group 0 is the whole pattern; it is never closed.
    DictGroups.Add(0, "(0)<0>: " & vbTab)
    DictTrackers.Add(0, True)
    For i As Integer = 1 To regexToEval.Length
        Dim iChar As String = regexToEval.Substring(i - 1, 1)
        ' Close the innermost group before appending ")" so a group's text excludes its own parentheses.
        If curChar <> "\" AndAlso iChar = ")" Then EndGroup()
        AddStrToTrackers(iChar)
        ' Anything other than an unescaped "(" needs no further handling.
        If curChar = "\" OrElse iChar <> "(" OrElse regexToEval.Length < i + 2 Then curChar = iChar : Continue For
        If regexToEval.Substring(i, 1) = "?" Then
            i += 1 : AddStrToTrackers("?")
            Dim marker As String = regexToEval.Substring(i, 1)
            ' (?: ... ) is a non-capturing group.
            If marker = ":" Then i += 1 : AddStrToTrackers(":") : curChar = ":" : OpenGroups.Push(-1) : Continue For
            ' (?<= and (?<! are lookbehinds, not named groups.
            Dim isLookbehind As Boolean = marker = "<" AndAlso i + 1 < regexToEval.Length AndAlso "=!".Contains(regexToEval.Substring(i + 1, 1))
            Dim NameLength As Integer = 0
            If (marker = "<" OrElse marker = "'") AndAlso Not isLookbehind Then
                ' Named group: (?<name> ... ) or (?'name' ... ).
                i += 1 : AddStrToTrackers(marker)
                i += 1
                For x As Integer = i To regexToEval.Length
                    Dim c As String = regexToEval.Substring(x - 1, 1)
                    If c = ">" OrElse c = "'" Then
                        NameLength = x - i
                        Exit For
                    End If
                Next
            Else
                ' Inline options, lookarounds and similar constructs are not captured.
                OpenGroups.Push(-1)
                Continue For
            End If
            If NameLength > 0 Then
                Dim GroupName As String = regexToEval.Substring(i - 1, NameLength)
                i += NameLength : curChar = regexToEval.Substring(i - 1, 1) : AddStrToTrackers(GroupName & curChar)
                intGroups += 1
                DictGroups.Add(intGroups, "(" & DictGroups.Count & ")<" & GroupName & ">: " & vbTab)
                DictTrackers.Add(intGroups, True)
                OpenGroups.Push(intGroups)
                Continue For
            End If
        End If
        ' Plain "(": an unnamed, numbered group.
        curChar = iChar
        intGroups += 1
        DictGroups.Add(intGroups, "(" & DictGroups.Count & ")<" & intGroups.ToString & ">: " & vbTab)
        DictTrackers.Add(intGroups, True)
        OpenGroups.Push(intGroups)
    Next
    Dim Output As String = MakeOutput()
    Console.WriteLine(Output)
End Sub

Private Function MakeOutput() As String
    Dim retString As String = String.Empty
    For i As Integer = 0 To DictGroups.Count - 1
        retString &= CStr(DictGroups(i)) & vbCrLf
    Next
    Return retString
End Function

' Closes the group opened by the most recent unmatched "(".
Public Sub EndGroup()
    If OpenGroups.Count = 0 Then Exit Sub
    Dim closed As Integer = OpenGroups.Pop()
    If closed > 0 Then DictTrackers(closed) = False
End Sub

' Appends text to every group that is still open.
Public Sub AddStrToTrackers(ByVal addString As String)
    For Each item In DictTrackers
        If item.Value Then DictGroups(item.Key) = CStr(DictGroups(item.Key)) & addString
    Next
End Sub
End Module

The only difference is that I'm capturing neither non-capturing groups nor function groups. Of course, this is just quick code I wrote in about 10 minutes, but it's a start if you want it. The OrderedDictionary is keyed by group number. You could change that structure if you also wanted to include non-capturing groups and function groups in the output.
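As a rough illustration of that last point, here is a minimal sketch of a scanner that also reports non-capturing groups and other "(?...)" constructs. It is not part of the code above: the names GroupScanner, GroupInfo, ScanGroups and Classify are made up for this example. It assumes simple character classes (no class subtraction, no "]" as the first character of a class), no parentheses inside (?#...) comments, and no RegexOptions.IgnorePatternWhitespace comments.

```vbnet
Imports System
Imports System.Collections.Generic

Module GroupScanner
    ' Hypothetical record type: one entry per parenthesised group in the pattern.
    Public Class GroupInfo
        Public Start As Integer        ' 0-based index of the opening parenthesis
        Public Kind As String          ' "named", "numbered", "non-capturing" or "other"
        Public Name As String          ' group name, Nothing for unnamed groups
        Public PatternText As String   ' group text including its parentheses
    End Class

    ' Returns the groups in the order in which their opening parentheses appear.
    Public Function ScanGroups(pattern As String) As List(Of GroupInfo)
        Dim result As New List(Of GroupInfo)
        Dim openGroups As New Stack(Of GroupInfo)
        Dim inClass As Boolean = False
        Dim i As Integer = 0
        While i < pattern.Length
            Dim c As Char = pattern(i)
            If c = "\"c Then
                i += 2                  ' skip the backslash and the character it escapes
                Continue While
            End If
            If inClass Then
                If c = "]"c Then inClass = False
            ElseIf c = "["c Then
                inClass = True          ' parentheses inside [...] are literals
            ElseIf c = "("c Then
                Dim opened As New GroupInfo With {.Start = i}
                result.Add(opened)
                openGroups.Push(opened)
            ElseIf c = ")"c AndAlso openGroups.Count > 0 Then
                ' The top of the stack is always the group this ")" belongs to.
                Dim closed As GroupInfo = openGroups.Pop()
                closed.PatternText = pattern.Substring(closed.Start, i - closed.Start + 1)
                Classify(closed)
            End If
            i += 1
        End While
        Return result
    End Function

    ' Decides the group kind from the characters that follow the opening parenthesis.
    Private Sub Classify(g As GroupInfo)
        Dim t As String = g.PatternText
        g.Kind = "other"                ' inline options, lookarounds, comments, ...
        If Not t.StartsWith("(?", StringComparison.Ordinal) Then
            g.Kind = "numbered"
        ElseIf t.StartsWith("(?:", StringComparison.Ordinal) Then
            g.Kind = "non-capturing"
        ElseIf t.Length > 3 AndAlso (t(2) = "<"c OrElse t(2) = "'"c) AndAlso t(3) <> "="c AndAlso t(3) <> "!"c Then
            ' (?<name>...) or (?'name'...); (?<= and (?<! are lookbehinds.
            Dim closer As Char = If(t(2) = "<"c, ">"c, "'"c)
            Dim endName As Integer = t.IndexOf(closer, 3)
            If endName > 3 Then
                g.Kind = "named"
                g.Name = t.Substring(3, endName - 3)
            End If
        End If
    End Sub

    Sub Main()
        Dim pattern As String = "(?im)(?<x>\b[a-s03]+\b)(?-i)(?<a>\p{L}+?,(?<b>.+?:(?<c>.+?;(?<d>.+?(?:\d|sample-text|(\k'x'|sos30))))))"
        For Each g As GroupInfo In ScanGroups(pattern)
            Console.WriteLine("{0,-14} {1,-3} ==> {2}", g.Kind, g.Name, g.PatternText)
        Next
    End Sub
End Module
```

Because each closing parenthesis pops the most recently opened group off the stack, nested groups of any kind stay correctly paired. For the pattern from the question, this should list x, a, b, c and d as named groups, (?:\d|sample-text|(\k'x'|sos30)) as non-capturing, (\k'x'|sos30) as numbered, and (?im) and (?-i) as "other".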

1 Comment

Thank you very much. I was basically looking for some built-in .NET libraries. It needs some modification, but your code works fine in this regard.

There is a RegexParser class (internal) in the System.Text.RegularExpressions namespace, which you can call using private reflection. I have a sample implementation that I've been using in my FxCopContrib project so far.

There's the RegexParser implementation from the Mono project which you might be able to leverage.

Then there's Deveel's Regex library.

2 Comments

From OP's question: I am looking for a Non-RegEx solution
These are non-regex solutions. The proposal is to use the parser that the platform itself uses to parse regular expressions, which gives you an object graph of all the elements that make up the expression. That seems to be exactly what the OP is looking for.
