
This is a follow-up to my earlier question, Regular Expressions in Excel VBA.

I have come up with additional matches that I believe are outside the scope of my original question. Here is my existing code:

  Sub ImportFromDTD()

  Dim sDTDFile As Variant
  Dim ffile As Long
  Dim Lines() As String
  Dim i As Long
  Dim J As Long
  Dim sExtract As String
  Dim Reg1 As RegExp
  Dim M1 As MatchCollection
  Dim M As Match
  Dim myRange As Range

  Set Reg1 = New RegExp

  ffile = FreeFile

  sDTDFile = Application.GetOpenFilename("DTD Files,*.XML", , _
  "Browse for file to be imported")

  If sDTDFile = False Then Exit Sub '(user cancelled import file browser)


  Open sDTDFile For Input Access Read As #ffile
    Lines = Split(Input$(LOF(ffile), #ffile), vbNewLine)
  Close #ffile

  Cells(1, 2) = "From DTD"
  J = 2

  For i = 0 To UBound(Lines)

    'Debug.Print "Line"; i; "="; Lines(i)

    With Reg1
        .Pattern = "\<\!ELEMENT\s+(\w+)\s+\((#\w+|(\w+)\+)\)\s+\>"
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
    End With

    If Reg1.Test(Lines(i)) Then
      Set M1 = Reg1.Execute(Lines(i))
      For Each M In M1
        sExtract = M.SubMatches(2)
        If Len(sExtract) = 0 Then sExtract = M.SubMatches(0)
        sExtract = Replace(sExtract, Chr(13), "")
        Cells(J, 2) = sExtract
        J = J + 1
        'Debug.Print sExtract
      Next M
    End If
  Next i

  Set Reg1 = Nothing

  End Sub

Here is an excerpt from my file:

<!ELEMENT ProductType  (#PCDATA) >
<!ELEMENT Invoices  (InvoiceDetails+) >  
<!ELEMENT Deal  (DealNumber,DealType,DealParties) >
<!ELEMENT DealParty  (PartyType,CustomerID,CustomerName,CentralCustomerID?,
           LiabilityPercent,AgentInd,FacilityNo?,PartyReferenceNo?,
           PartyAddlReferenceNo?,PartyEffectiveDate?,FeeRate?,ChargeType?) >
<!ELEMENT Deals  (Deal*) >

Currently, I'm matching:

extract ProductType
<!ELEMENT ProductType  (#PCDATA) >
extract InvoiceDetails
<!ELEMENT Invoices  (InvoiceDetails+) >  

I also need to extract the following:

 Extract Deal
 <!ELEMENT Deal  (DealNumber,DealType,DealParties) >

 Extract DealParty (the ? markers and the carriage returns are throwing me off)
 <!ELEMENT DealParty  (PartyType,CustomerID,CustomerName,CentralCustomerID?,
           LiabilityPercent,AgentInd,FacilityNo?,PartyReferenceNo?,
           PartyAddlReferenceNo?,PartyEffectiveDate?,FeeRate?,ChargeType?) >

 Extract Deal
 <!ELEMENT Deals  (Deal*) >
  • @pnuts I think your edits are worse than my original question. Commented Oct 19, 2015 at 15:13
  • This question is nearly exactly the same as my first one but in more detail I got 4 upvotes for that one. I have 2 down votes and a close on this one? In what universe does that make any sense? Commented Oct 19, 2015 at 15:50

1 Answer

Maybe I am missing something, but here is how I would approach it. Sorry, I don't have VBA at hand right now, so this is VBScript and you will need to adapt it:

Option Explicit

Dim fileContents    
    fileContents = WScript.CreateObject("Scripting.FileSystemObject").OpenTextFile("input.xml").ReadAll

Dim matches    
    With New RegExp
        .Multiline = True 
        .IgnoreCase = False
        .Global = True
        .Pattern = "<!ELEMENT\s+([^\s>]+)\s+([^>]*)\s*>"
        Set matches = .Execute( fileContents )
    End With

Dim match
    For Each match in matches
        WScript.Echo match.Submatches(0)
        WScript.Echo match.Submatches(1)
        WScript.Echo "---------------------------------------"
    Next 

As I see it, your main problem is that you are applying a regular expression that needs to span several lines to the file one line at a time, instead of matching it against the full text.
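For reference, here is a rough sketch of how the same whole-file approach might look in Excel VBA. It assumes the workbook already references "Microsoft VBScript Regular Expressions 5.5" (as the early-bound RegExp in the question implies). The procedure name ImportFromDTDWholeFile, the variables wsReg and rowNum, and the choice to write the element name to column A are my own illustrative assumptions, not part of the original code. The whitespace clean-up step is also an addition, meant to collapse declarations that wrap over several lines.

```vba
' Sketch only: reads the whole file into one string and matches against it.
' Assumes a reference to "Microsoft VBScript Regular Expressions 5.5".
Sub ImportFromDTDWholeFile()
    Dim sDTDFile As Variant
    Dim ffile As Long
    Dim fileContents As String
    Dim reg As RegExp
    Dim wsReg As RegExp      ' hypothetical helper: strips whitespace
    Dim m As Match
    Dim rowNum As Long

    sDTDFile = Application.GetOpenFilename("DTD Files,*.XML", , _
        "Browse for file to be imported")
    ' GetOpenFilename returns the Boolean False when the user cancels
    If VarType(sDTDFile) = vbBoolean Then Exit Sub

    ' Read the full text instead of splitting it into lines
    ffile = FreeFile
    Open sDTDFile For Input Access Read As #ffile
    fileContents = Input$(LOF(ffile), #ffile)
    Close #ffile

    ' Pattern taken from the answer: group 1 = element name,
    ' group 2 = everything up to the closing ">"
    Set reg = New RegExp
    With reg
        .Pattern = "<!ELEMENT\s+([^\s>]+)\s+([^>]*)\s*>"
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
    End With

    ' Removes spaces, tabs, CR and LF inside the content model
    Set wsReg = New RegExp
    wsReg.Pattern = "\s+"
    wsReg.Global = True

    Cells(1, 1) = "Element"
    Cells(1, 2) = "From DTD"
    rowNum = 2
    For Each m In reg.Execute(fileContents)
        Cells(rowNum, 1) = m.SubMatches(0)
        Cells(rowNum, 2) = wsReg.Replace(m.SubMatches(1), "")
        rowNum = rowNum + 1
    Next m
End Sub
```

With the excerpt from the question, each `<!ELEMENT ...>` declaration should produce one row, including the DealParty declaration that spans three lines, because `[^>]*` also matches line breaks.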


1 Comment

This was perfect. Thank you!
