Consider this extract of an HTML page:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Document</title>
</head>
<body>
<div class="BoxBody">
<span class="txt">20 Records found. </span>
<p style="text-align: right;"><span class="txt">[First/Previous] &nbsp;1&nbsp;, <a class="page" href="javascript:paginacao('paginar','2');" title="Go to page 2">2</a> [<a class="page" title="Next page" href="javascript:paginacao('paginar','next');">Next</a>/<a class="page" title="Last page" href="javascript:paginacao('paginar','last');">Last</a>]</span></p>
<br>
<span class="txt">25 Records found. </span>
<p style="text-align: right;"><span class="txt">[First/Previous] &nbsp;1&nbsp;, <a class="page" href="javascript:paginacao('paginar2','2');" title="Go to page 2">2</a> [<a class="page" title="Next page" href="javascript:paginacao('paginar2','next');">Next</a>/<a class="page" title="Last page" href="javascript:paginacao('paginar2','last');">Last</a>]</span></p>
</div>
</body>
</html>

I am trying to get the anchor tag that has the "next" page href (if it has one).

I tried this in the console using Firefox and it works:

document.querySelector(".BoxBody > p:nth-child(2) > span:nth-child(1)").querySelector("a[title='Next page']")

I put together sample VBA code that also uses querySelector, but it fails with an "Invalid argument" error.

Sub test()

Dim oFSO As Object, paginator As Object
Dim oFS As Object, sText As String

Set oFSO = CreateObject("Scripting.FileSystemObject")
Set oFS = oFSO.OpenTextFile(ThisWorkbook.Path & "\example.html")

' ReadAll returns the whole file in one call, so no loop is needed
sText = oFS.ReadAll()
oFS.Close

Dim html As HTMLDocument, html2 As Object
Set html = New HTMLDocument
Set html2 = html
html2.Write sText

Set paginator = html.querySelector(".BoxBody > p:nth-child(2) > span:nth-child(1)").querySelector("a[title='Next page']")

End Sub

What is causing this? Is it the p:nth-child(2) selector? How should I go about extracting that element using VBA?

1 Answer

:nth-child(2) is not supported in VBA and is indeed what causes the error message. You can't use :nth-child() or :nth-of-type(). Very few pseudo-class selectors are implemented in the libraries available to you. Interestingly, :first-child does work. You will also find that you are limited in which objects you can chain querySelector on.

Dim ele As Object, iText As String
' :first-child is supported; querySelector returns the first match in document order
Set ele = html.querySelector(".BoxBody > p > span:first-child > a[title='Next page']")

' Reading href raises an error if ele is Nothing, so suppress errors for this one line
On Error Resume Next
iText = ele.href
On Error GoTo 0

' Assumes a present href is non-empty; otherwise use an On Error GoTo handler that prints "No href"
If iText = vbNullString Then
    Debug.Print "No href"
Else
    Debug.Print "href: " & iText
End If

EDIT (29/5/21): As of some point in the last month (?), it has become possible to use element.querySelector widely, as well as most of the standard pseudo-class selectors (at least on Windows 10 with MSHTML.DLL 11.00.19041.985, date modified 12/5/21).
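
On builds where :nth-child() is still unavailable, one possible way to restrict the search to the first paginator is to skip pseudo-classes altogether and walk the DOM. The following is only a sketch under a few assumptions: html is the HTMLDocument loaded in the question, the first <p> in the document is the first paginator (true for the sample page, not necessarily for the real one), and FirstNextLink and DemoFirstNextLink are hypothetical helper names, not part of any library.

```vba
' Sketch only. FirstNextLink is a hypothetical helper, not a library function.
' Late-bound (As Object), so it does not itself need a library reference.
Function FirstNextLink(ByVal html As Object) As Object
    Dim paragraphs As Object, anchors As Object, a As Object

    Set paragraphs = html.getElementsByTagName("p")
    If paragraphs.Length = 0 Then Exit Function   ' result stays Nothing

    ' Assumption: the first <p> in the document is the first paginator
    Set anchors = paragraphs.Item(0).getElementsByTagName("a")

    For Each a In anchors
        If a.Title = "Next page" Then
            Set FirstNextLink = a
            Exit Function
        End If
    Next a
End Function

' Example call; html is assumed to be populated as in the question
Sub DemoFirstNextLink(ByVal html As Object)
    Dim nextLink As Object
    Set nextLink = FirstNextLink(html)

    If nextLink Is Nothing Then
        Debug.Print "No Next link in the first paginator"
    Else
        Debug.Print nextLink.getAttribute("href")
    End If
End Sub
```

Because it relies only on getElementsByTagName and the title property, this does not depend on which selectors the installed MSHTML version supports, and it avoids querySelectorAll entirely.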

7 Comments

That was my first solution, but since there are two similar paginated tables on the page (with the same title attribute), I really need to check whether that element exists inside the .BoxBody > p:nth-child(2) > span:nth-child(1) element.
OK, if there is enough to demonstrate the choice that must be made.
No, I want one match only (whether the 'next' button has an href, or not)
Please check the edited HTML. I only want to check whether the first a with title Next page has an href or not. And I cannot use querySelectorAll, as it constantly crashes Excel.
The first a tag with title Next page