5

I'm using the Microsoft regular expression engine in Excel VBA. I'm very new to regex but I have a pattern working right now. I need to expand it and I'm having trouble. Here is my code so far:

Sub ImportFromDTD()

Dim sDTDFile As Variant
Dim ffile As Long
Dim sLines() As String
Dim i As Long
Dim Reg1 As RegExp
Dim M1 As MatchCollection
Dim M As Match
Dim myRange As Range

Set Reg1 = New RegExp

ffile = FreeFile

sDTDFile = Application.GetOpenFilename("DTD Files,*.XML", , _
"Browse for file to be imported")

If sDTDFile = False Then Exit Sub '(user cancelled import file browser)


Open sDTDFile For Input Access Read As #ffile
  Lines = Split(Input$(LOF(ffile), #ffile), vbNewLine)
Close #ffile

Cells(1, 2) = "From DTD"
J = 2

For i = 0 To UBound(Lines)

  'Debug.Print "Line"; i; "="; Lines(i)

  With Reg1
      '.Pattern = "(\<\!ELEMENT\s)(\w*)(\s*\(\#\w*\)\s*\>)"
      .Pattern = "(\<\!ELEMENT\s)(\w*)(\s*\(\#\w*\)\s*\>)"

      .Global = True
      .MultiLine = True
      .IgnoreCase = False
  End With

  If Reg1.Test(Lines(i)) Then
    Set M1 = Reg1.Execute(Lines(i))
    For Each M In M1
      sExtract = M.SubMatches(1)
      sExtract = Replace(sExtract, Chr(13), "")
      Cells(J, 2) = sExtract
      J = J + 1
      'Debug.Print sExtract
    Next M
  End If
Next i

Set Reg1 = Nothing

End Sub

Currently, I'm matching on a set of data like this:

 <!ELEMENT DealNumber  (#PCDATA) >

and extract Dealnumber but now, I need to add another match on data like this:

<!ELEMENT DealParties  (DealParty+) >

and extract just Dealparty without the Parens and the +

I've been using this as a reference and it's awesome but I'm still a bit confused. How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops

EDIT

I have come across a few new scenarios that have to be matched on.

 Extract Deal
 <!ELEMENT Deal  (DealNumber,DealType,DealParties) >

 Extract DealParty the ?,CR are throwing me off
 <!ELEMENT DealParty  (PartyType,CustomerID,CustomerName,CentralCustomerID?,
           LiabilityPercent,AgentInd,FacilityNo?,PartyReferenceNo?,
           PartyAddlReferenceNo?,PartyEffectiveDate?,FeeRate?,ChargeType?) >

 Extract Deals
 <!ELEMENT Deals  (Deal*) >

2 Answers 2

3

Looking at your pattern, you have too many capture groups. You only want to capture the PCDATA and DealParty. Try changing you pattern to this:

  With Reg1
      .Pattern = "\<!ELEMENT\s+\w+\s+\(\W*(\w+)\W*\)"

      .Global = True
      .MultiLine = True
      .IgnoreCase = False
  End With

Here's the stub: Regex101.

Sign up to request clarification or add additional context in comments.

4 Comments

It didn't work. When I run it, It stops on sExtract = M.SubMatches(1) and if I hold the cursor over it, I get <invalid procedure call or argument> and .pattern is <object variable or with block variable not set> the only thing I changed is the pattern.
I just realized you thought I was tring to extract PCDATA which is not the case. I've updated my question.
Your edited question is still confusing to me. You have lines like <!ELEMENT x (y)>. What do you want to get? x, y or both?
I want to get x if there isn't a + if there is, I want to get y without the +
1

You could use this Regex pattern;

  .Pattern = "\<\!ELEMENT\s+(\w+)\s+\((#\w+|(\w+)\+)\)\s+\>"
  1. This portion

(#\w+|(\w+)\+)

says match either

#a-z0-9
a-z0-9+

inside the parentheses.

ie match either

(#PCDATA)
(DealParty+)

to validate the entire string

  1. Then the submatches are used to extract DealNumber for the first valid match, DealParty for the other valid match

edited code below - note submatch is now M.submatches(0)

    Sub ImportFromDTD()

Dim sDTDFile As Variant
Dim ffile As Long
Dim sLines() As String
Dim i As Long
Dim Reg1 As RegExp
Dim M1 As MatchCollection
Dim M As Match
Dim myRange As Range

Set Reg1 = New RegExp
J = 1

strIn = "<!ELEMENT Deal12Number  (#PCDATA) > <!ELEMENT DealParties  (DealParty+) >"

With Reg1
      .Pattern = "\<\!ELEMENT\s+(\w+)\s+\((#\w+|(\w+)\+)\)\s+\>"
      .Global = True
      .MultiLine = True
      .IgnoreCase = False
End With

If Reg1.Test(strIn) Then
    Set M1 = Reg1.Execute(strIn)
    For Each M In M1
      sExtract = M.SubMatches(2)
      If Len(sExtract) = 0 Then sExtract = M.SubMatches(0)
      sExtract = Replace(sExtract, Chr(13), "")
      Cells(J, 2) = sExtract
      J = J + 1
    Next M
End If

Set Reg1 = Nothing

End Sub

3 Comments

Thank for your post, it was the exact answer for my question. Since then, I've come across a few more matches that I need. One of them being a multiline one and I'm having trouble getting the pattern to match. I've been on the regex101 site all day working on it. I've edited my original post to include them. I'm thinking I might not be able to do them all in one pattern.
I'm just going to ask a new question for the additional matches. Thanks for your help!
Hi Brett. Can you take a look at my new post for me? It got buried very fast and was downgraded for what I feel were very ludicrous reasons. Second try

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.