The problem at hand is to update the InnerText of P-elements in HTML files: wherever it contains a value from Excel range D6:D500, that value should be replaced with the corresponding value in range E6:E500. The code uses the MSXML2 and MSHTML libraries to parse the HTML files.

The goal is to automate the process of updating the InnerText values in the HTML files based on the matching values in the Excel range.

The expected result is that the code should be able to successfully parse the HTML files, identify all P-elements, compare their InnerText values with values in range D6:D500, and replace them with corresponding values in range E6:E500. The final result should be a modified version of the HTML files with updated InnerText values.

WHAT I HAVE SO FAR
Currently it only prints the innerText. I'd like it to replace the substrings in the innerText as described above:

Sub ReplaceValuesInHTMLFiles()

    Dim htmlFile As Variant
    Dim htmlFilePath As String
    Dim htmlContent As String
    Dim searchRange As Range
    Dim replaceRange As Range
    Dim searchValue As String
    Dim replaceValue As String

    Set searchRange = ThisWorkbook.ActiveSheet.Range("D6:D500")
    Set replaceRange = ThisWorkbook.ActiveSheet.Range("E6:E500")

    With Application.fileDialog(msoFileDialogFilePicker)
        .Title = "Select HTML files to modify"
        .Filters.Clear
        .Filters.Add "HTML files", "*.html;*.htm", 1
        .AllowMultiSelect = True
        If .Show = -1 Then
            For Each htmlFile In .SelectedItems
                Dim IE As MSXML2.XMLHTTP60
                Set IE = New MSXML2.XMLHTTP60
                IE.Open "GET", htmlFile, False
                IE.send
                While IE.readyState <> 4
                    DoEvents
                Wend
                Dim HTMLdoc As MSHTML.HTMLDocument
                Dim HTMLBody As MSHTML.HTMLBody
                Set HTMLdoc = New MSHTML.HTMLDocument
                Set HTMLBody = HTMLdoc.body
                HTMLBody.innerHTML = IE.responseText
                'GET ALL P-ELEMENTS
                Dim Pelements As IHTMLElementCollection
                Dim Pelement As HTMLTableCell
                Set Pelements = HTMLdoc.getElementsByTagName("P")
                For i = 0 To Pelements.Length - 1

                'UPDATE INNERTEXT
                Debug.Print Pelements(i).innerText

                Next i
            Next
        End If
    End With
    'CLEANUP
    Set IE = Nothing
    Set fd = Nothing
    Set fso = Nothing
    Set HTMLdoc = Nothing
    Set HTMLBody = Nothing
    Set Pelements = Nothing
    MsgBox "Values replaced in HTML files.", vbInformation, "Complete"
End Sub

The data in the Excel table is as follows (extract):

COLUMN D | COLUMN E
John     | <a class="link-ch" href="test.html#chJohn">John</a>
Andrew   | <a class="link-ch" href="test.html#chAndrew">Andrew</a>
  • Where does the code return the problem? What problem does it return? What result does it return if any? Can you publish an explicative example of the data you are working with and an expected result? Commented May 3, 2023 at 8:17
  • @EvilBlueMonkey, I updated my question to address your questions. An example of the data in Excel is added. It does not return an error. It currently only prints the innerText, but I am looking for a way to replace the substrings in the innerText with the data from the Excel table. Commented May 3, 2023 at 8:30
  • And how does the found InnerText look? Can you share such an HTML document somehow (maybe via a transfer site)? Or a dummy document with some such HTML tags shown in plain text in your edited question... I would like to run some tests. Commented May 3, 2023 at 9:33

1 Answer

Since I don't have the whole picture, I've taken your code and edited it according to the instructions and information given. Here it is:

Sub ReplaceValuesInHTMLFiles()
    
    Dim htmlFile As Variant
    Dim htmlFilePath As String
    Dim htmlContent As String
    Dim searchRange As Range
    Dim replaceRange As Range
    Dim searchValue As String
    Dim replaceValue As String
    
    Set searchRange = ThisWorkbook.ActiveSheet.Range("D6:D500")
    Set replaceRange = ThisWorkbook.ActiveSheet.Range("E6:E500")
    
    With Application.FileDialog(msoFileDialogFilePicker)
        .Title = "Select HTML files to modify"
        .Filters.Clear
        .Filters.Add "HTML files", "*.html;*.htm", 1
        .AllowMultiSelect = True
        If .Show = -1 Then
            For Each htmlFile In .SelectedItems
                Dim IE As MSXML2.XMLHTTP60
                Set IE = New MSXML2.XMLHTTP60
                IE.Open "GET", htmlFile, False
                IE.send
                While IE.readyState <> 4
                    DoEvents
                Wend
                Dim HTMLdoc As MSHTML.HTMLDocument
                Dim HTMLBody As MSHTML.HTMLBody
                Set HTMLdoc = New MSHTML.HTMLDocument
                Set HTMLBody = HTMLdoc.body
                HTMLBody.innerHTML = IE.responseText
                'GET ALL P-ELEMENTS
                Dim Pelements As IHTMLElementCollection
                Dim Pelement As HTMLTableCell
                Set Pelements = HTMLdoc.getElementsByTagName("P")
                
                
                For i = 0 To Pelements.Length - 1
                
                    'UPDATE INNERTEXT
                    Debug.Print Pelements(i).innerText
                    
                Next i
                
                
                'XXXXXXXXXXXX
                'EDIT - Start
                'XXXXXXXXXXXX
                
                'Declarations.
                Dim StrOriginal As String
                Dim StrChomp As String
                Dim StrExtra As String
                Dim StrReplacement As String
                Dim DblRow As Double
                Dim DblIndex As Double
                Dim StrStartMarker As String
                Dim StrEndMarker As String
                
                'Settings.
                StrStartMarker = "<p>"
                StrEndMarker = "</p>"
                
                'Retrieving the data.
                Open htmlFile For Binary As #1
                StrOriginal = Space$(LOF(1))
                Get #1, , StrOriginal
                Close #1
                
                'Covering each section of the data as delimited by StrStartMarker.
                For DblIndex = UBound(Split(StrOriginal, StrStartMarker)) To 0 Step -1
                    
                    'Setting the string as the section delimited by StrStartMarker.
                    StrChomp = Split(StrOriginal, StrStartMarker)(DblIndex)
                    StrExtra = StrChomp
                    
                    'Checking if it's the first section.
                    If DblIndex = 0 Then
                        
                        'Setting StrReplacement.
                        StrReplacement = StrChomp & StrReplacement
                        
                    Else
                        
                        'Setting StrChomp as the first part of the section delimited by StrEndMarker.
                        StrChomp = Split(StrChomp, StrEndMarker)(0)
                        
                        'Setting StrExtra as everything after the first StrEndMarker
                        '(limit 2 keeps any later end markers in the remainder).
                        StrExtra = Split(StrExtra, StrEndMarker, 2)(1)
                        
                        'Covering each row of searchRange.
                        For DblRow = 1 To searchRange.Rows.Count
                            
                            'If the given cell is empty, the For-Next loop is terminated.
                            If searchRange(DblRow, 1).Value2 = "" Then Exit For
                            
                            'Substitutions.
                            StrChomp = Replace(StrChomp, searchRange(DblRow, 1).Value2, replaceRange(DblRow, 1))
                            
                        Next
                        
                        'Setting StrReplacement.
                        StrReplacement = StrStartMarker & StrChomp & StrEndMarker & StrExtra & StrReplacement
                        
                    End If
                    
                Next
                'Re-writing the data.
                Open htmlFile For Output As #1
                Print #1, StrReplacement
                Close #1
                
                'Clearing up.
                StrReplacement = ""
                StrChomp = ""
                StrExtra = ""
                
                
                'XXXXXXXXXXXX
                'EDIT - End
                'XXXXXXXXXXXX
                
                
            Next
        End If
    End With
    'CLEANUP
    Set IE = Nothing
    Set fd = Nothing
    Set FSO = Nothing
    Set HTMLdoc = Nothing
    Set HTMLBody = Nothing
    Set Pelements = Nothing
    MsgBox "Values replaced in HTML files.", vbInformation, "Complete"
End Sub

Part of it is of course not necessary (the whole Pelements part, for example), and I can't be sure the substitutions will work without problems, since I have only 2 rows of data and no original HTML text. Still, the code should search each section of each HTML file delimited by <p> and </p> and, within those sections, substitute each string listed in searchRange with the corresponding one in replaceRange.

IMPORTANT NOTE: the code overwrites the original files as asked, which I do not really recommend. I'm skeptical mostly because you are replacing names with strings that contain those same names. Were you to accidentally run the macro twice on the same file, the same change would be applied again, corrupting the result, probably with no easy way to undo it.
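
One way to soften the overwrite risk is to keep a copy of each file before it is rewritten. The following is only a minimal sketch: the helper name BackupOnce and the ".bak" suffix are assumptions for illustration, not part of the original code. It relies on the built-in Dir$ and FileCopy statements and expects the file to be closed when it is called.

```vba
' Hypothetical helper: copies a file to "<name>.bak" the first time it is called
' for that file, so the untouched original survives repeated macro runs.
Sub BackupOnce(ByVal filePath As String)
    Dim backupPath As String
    backupPath = filePath & ".bak"
    ' Dir$ returns an empty string when no file matches the given path.
    If Dir$(backupPath) = "" Then FileCopy filePath, backupPath
End Sub
```

It could be called as BackupOnce CStr(htmlFile) just before the "Open htmlFile For Output" line. Because an existing backup is never overwritten, the ".bak" file always holds the state from before the first run.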

Useful links (much credit to them, by the way):

  1. A similar question
  2. Save Excel Files as Text (external link)

EDIT

I realized that a p element might carry attributes. Here is new code that should account for that:

Sub ReplaceValuesInHTMLFiles()
    
    Dim htmlFile As Variant
    Dim htmlFilePath As String
    Dim htmlContent As String
    Dim searchRange As Range
    Dim replaceRange As Range
    Dim searchValue As String
    Dim replaceValue As String
    
    Set searchRange = ThisWorkbook.ActiveSheet.Range("D6:D500")
    Set replaceRange = ThisWorkbook.ActiveSheet.Range("E6:E500")
    
    With Application.FileDialog(msoFileDialogFilePicker)
        .Title = "Select HTML files to modify"
        .Filters.Clear
        .Filters.Add "HTML files", "*.html;*.htm", 1
        .AllowMultiSelect = True
        If .Show = -1 Then
            For Each htmlFile In .SelectedItems
                Dim IE As MSXML2.XMLHTTP60
                Set IE = New MSXML2.XMLHTTP60
                IE.Open "GET", htmlFile, False
                IE.send
                While IE.readyState <> 4
                    DoEvents
                Wend
                Dim HTMLdoc As MSHTML.HTMLDocument
                Dim HTMLBody As MSHTML.HTMLBody
                Set HTMLdoc = New MSHTML.HTMLDocument
                Set HTMLBody = HTMLdoc.body
                HTMLBody.innerHTML = IE.responseText
                'GET ALL P-ELEMENTS
                Dim Pelements As IHTMLElementCollection
                Dim Pelement As HTMLTableCell
                Set Pelements = HTMLdoc.getElementsByTagName("P")
                For i = 0 To Pelements.Length - 1
                
                'UPDATE INNERTEXT
                Debug.Print Pelements(i).innerText
                
                Next i
                
                
                'XXXXXXXXXXXX
                'EDIT - Start
                'XXXXXXXXXXXX
                
                'Declarations.
                Dim StrReplacement As String
                Dim StrSpecialChar() As String
                Dim DblIndex As Double
                Dim StrNodeName As String
                Dim ColElements As IHTMLElementCollection
                
                'Settings.
                StrNodeName = "P"
                Set ColElements = HTMLdoc.all
                
                'Redeclaring and setting StrSpecialChar
                ReDim StrSpecialChar(1 To 2, 1 To 2)
                StrSpecialChar(1, 1) = "&lt;"
                StrSpecialChar(2, 1) = "<"
                StrSpecialChar(1, 2) = "&gt;"
                StrSpecialChar(2, 2) = ">"
                
                
                'Covering each element.
                For DblIndex = 0 To ColElements.Length - 1
                    
                    'Referring to the given element.
                    With ColElements(DblIndex)
                        
                        'Checking if the element is of the desired kind.
                        If .nodeName = StrNodeName Then
                            
                            'Covering each row of searchRange.
                            For DblRow = 1 To searchRange.Rows.Count
                                
                                'If the given cell is empty, the For-Next loop is terminated.
                                If searchRange(DblRow, 1).Value2 = "" Then Exit For
                                
                                'Substitutions.
                                .innerText = Replace(.innerText, searchRange(DblRow, 1).Value2, replaceRange(DblRow, 1))
                                
                            Next
                            
                        End If
                        
                    End With
                    
                Next
                
                'Setting StrReplacement.
                StrReplacement = CStr(ColElements(0).outerHTML)
                
                'Replacing special characters in StrReplacement.
                For DblIndex = 1 To UBound(StrSpecialChar, 2)
                    
                    StrReplacement = Replace(StrReplacement, StrSpecialChar(1, DblIndex), StrSpecialChar(2, DblIndex))
                    
                Next
                
                'Re-writing the data.
                Open htmlFile For Output As #1
                Print #1, StrReplacement
                Close #1
                
                'XXXXXXXXXXXX
                'EDIT - End
                'XXXXXXXXXXXX
                
                
            Next
        End If
    End With
    'CLEANUP
    Set IE = Nothing
    Set fd = Nothing
    Set fso = Nothing
    Set HTMLdoc = Nothing
    Set HTMLBody = Nothing
    Set Pelements = Nothing
    MsgBox "Values replaced in HTML files.", vbInformation, "Complete"
End Sub

3 Comments

Your first suggestion does not replace the text in the HTML files with values from the Excel file. The second suggestion gives a run-time error on If .nodeName = StrNodeName Then
I got the first suggestion working now: I had to change StrStartMarker = "<p>" into StrStartMarker = "<p ". Many thanks!
No problem. Be aware that the first code, edited that way, won't work for a plain p element (one with no attributes).
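
A literal start marker matches either bare <p> tags or tags with attributes, but not both. As a hedged sketch of one way around that, the function below uses the VBScript.RegExp object to find every <p>...</p> block regardless of attributes and applies the search/replace pairs to its content only. The name ReplaceInParagraphs is hypothetical, and the pattern assumes p elements are not nested and that every opening tag has a closing </p>.

```vba
' Hypothetical helper: applies each search/replace pair inside every <p>...</p>
' block, whether or not the opening tag carries attributes.
Function ReplaceInParagraphs(ByVal html As String, _
                             ByVal searchRange As Range, _
                             ByVal replaceRange As Range) As String
    Dim re As Object, matches As Object, m As Object
    Dim i As Long, r As Long
    Dim inner As String

    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    re.IgnoreCase = True
    ' Group 1: opening tag (bare or with attributes), 2: content, 3: closing tag.
    re.Pattern = "(<p(?:\s[^>]*)?>)([\s\S]*?)(</p>)"

    Set matches = re.Execute(html)
    ' Walk backwards so earlier match positions stay valid while the string grows.
    For i = matches.Count - 1 To 0 Step -1
        Set m = matches(i)
        inner = m.SubMatches(1)
        For r = 1 To searchRange.Rows.Count
            ' Stop at the first empty search cell, as the original code does.
            If CStr(searchRange.Cells(r, 1).Value2) = "" Then Exit For
            inner = Replace(inner, CStr(searchRange.Cells(r, 1).Value2), _
                            CStr(replaceRange.Cells(r, 1).Value2))
        Next r
        ' FirstIndex is 0-based, while Left$ and Mid$ count from 1.
        html = Left$(html, m.FirstIndex) & m.SubMatches(0) & inner & _
               m.SubMatches(2) & Mid$(html, m.FirstIndex + m.Length + 1)
    Next i
    ReplaceInParagraphs = html
End Function
```

With the file content read into StrOriginal as in the first code, StrReplacement = ReplaceInParagraphs(StrOriginal, searchRange, replaceRange) could stand in for the Split-based loop. Note that the replacement works on the raw markup of each paragraph, so a search value that also occurs inside a nested tag's attribute (an existing href, for instance) would be replaced there too.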
