A simple question: I am trying to write a procedure to parse the HTML of this site.

A part of the page source (lines 154 to 174) that is sufficient as an example is:

<p>(British Aircraft Company)</p>
<ul>
<li><a href="/wiki/B.A.C._I" title="B.A.C. I" class="mw-redirect">B.A.C. I</a></li>
<li><a href="/wiki/B.A.C._II" title="B.A.C. II" class="mw-redirect">B.A.C. II</a></li>
<li><a href="/wiki/B.A.C._III" title="B.A.C. III" class="mw-redirect">B.A.C. III</a></li>
<li><a href="/wiki/B.A.C._IV" title="B.A.C. IV" class="mw-redirect">B.A.C. IV</a></li>
<li><a href="/wiki/B.A.C._V" title="B.A.C. V" class="mw-redirect">B.A.C. V</a></li>
<li><a href="/wiki/B.A.C._VI" title="B.A.C. VI" class="mw-redirect">B.A.C. VI</a></li>
<li><a href="/wiki/B.A.C._VII" title="B.A.C. VII" class="mw-redirect">B.A.C. VII</a></li>
<li><a href="/wiki/B.A.C._VII_Mk.2" title="B.A.C. VII Mk.2" class="mw-redirect">B.A.C. VII Mk.2</a></li>
<li><a href="/wiki/B.A.C._VII_Planette" title="B.A.C. VII Planette" class="mw-redirect">B.A.C. VII Planette</a></li>
<li><a href="/wiki/B.A.C._VIII" title="B.A.C. VIII" class="mw-redirect">B.A.C. VIII</a></li>
<li><a href="/wiki/B.A.C._VIII_Bat-Boat" title="B.A.C. VIII Bat-Boat" class="mw-redirect">B.A.C. VIII Bat-Boat</a></li>
<li><a href="/wiki/B.A.C._IX" title="B.A.C. IX" class="mw-redirect">B.A.C. IX</a></li>
<li><a href="/wiki/B.A.C._Cupid" title="B.A.C. Cupid" class="mw-redirect">B.A.C. Cupid</a></li>
<li><a href="/wiki/B.A.C._Drone" title="B.A.C. Drone" class="mw-redirect">B.A.C. Drone</a></li>
<li><a href="/wiki/B.A.C._Super_Drone" title="B.A.C. Super Drone" class="mw-redirect">B.A.C. Super Drone</a></li>
<li><a href="/wiki/B.A._Swallow_2" title="B.A. Swallow 2" class="mw-redirect">B.A. Swallow 2</a></li>
<li><a href="/wiki/B.A._Eagle_2" title="B.A. Eagle 2" class="mw-redirect">B.A. Eagle 2</a></li>
<li><a href="/wiki/B.A._Double_Eagle" title="B.A. Double Eagle" class="mw-redirect">B.A. Double Eagle</a></li>
</ul>

I am still working out how to do this. I can get to the <p> tag, but I cannot reach the list items to loop through them, because they are enclosed in the <ul></ul> tags that follow the <p> rather than inside it. What would be your next steps?

Sub ICE()

Set Results = IE.document.getElementsByTagName("p")

For Each itm In Results
    If itm.innerHTML = "(British Aircraft Company)" Then




    End If
Next itm

End Sub

For context, this stage of my study is based on ron's answer to "VBA parsing of href".

Recommendation by user Doug Glancy:

--> It might be helpful to mention the desired results.

What I want is to be able to make VBA 'click', at runtime, the href of my choice, since it is an actual link. I am studying ron's code for that, which is (as can be seen in the answer mentioned above):

If itm.outerhtml = "B.A.C. VII" Then
    itm.Click

    Do Until Not IE.Busy And IE.readyState = 4
        DoEvents
    Loop
    Exit For
End If

Here outerHTML is used; however, the core of my effort is the loop and the conditional test.


I wrote this piece of code; however, it does not work:

Set Results = IE.document.getElementsByTagName("p")

For Each itm In Results
    If itm.innerHTML = "(British Aircraft Company)" Then
        Set Results2 = IE.document.getElementsByTagName("ul")
        For Each itm2 In Results2
            If itm2.innerHTML = "B.A.C. V" Then
                MsgBox itm2.innerHTML
            End If

        Next itm2
    End If
Next itm
  • It might be helpful to mention your desired results. Commented Feb 18, 2014 at 15:27
  • Adding the desired results now. Commented Feb 18, 2014 at 15:27

1 Answer

This will list the aircraft in the <ul> that follows the <p> tag containing "(British Aircraft Company)":

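Sub GetAircraft()

    'Early binding: set references (Tools > References) to a Microsoft XML
    'library and to the Microsoft HTML Object Library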
    Dim xHttp As MSXML2.XMLHTTP
    Dim hDoc As MSHTML.HTMLDocument
    Dim hUls As MSHTML.IHTMLElementCollection
    Dim hUl As MSHTML.HTMLListElement
    Dim hLi As MSHTML.HTMLLIElement

    Set xHttp = New MSXML2.XMLHTTP
    xHttp.Open "GET", "http://en.wikipedia.org/wiki/List_of_aircraft_%28B%29"
    xHttp.send

    'Open is asynchronous by default, so wait until the response is complete
    Do
        DoEvents
    Loop Until xHttp.readyState = 4

    Set hDoc = New HTMLDocument
    hDoc.body.innerHTML = xHttp.responseText
    Set hUls = hDoc.getElementsByTagName("ul")

    'Go through all the <ul> tags
    For Each hUl In hUls
        'Only if previous tag is something
        If Not hUl.PreviousSibling Is Nothing Then
            'Only if previous tag is <p>
            If TypeName(hUl.PreviousSibling) = "HTMLParaElement" Then
                'Only if previous paragraph is specified text
                If hUl.PreviousSibling.innerText = "(British Aircraft Company)" Then
                    'loop through the <li> and print them out
                    For Each hLi In hUl.Children
                        Debug.Print hLi.innerText
                    Next hLi
                End If
            End If
        End If
    Next hUl

End Sub
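
If the goal is to click one of the links rather than list them, as the edited question describes, the same idea can probably be applied to the Internet Explorer automation object used in the question. The following is a minimal, untested sketch under stated assumptions: the procedure name ClickAircraftLink, the variable names, and the target text "B.A.C. VII" are chosen for illustration only, and it assumes Internet Explorer is available and that a reference to the Microsoft HTML Object Library is set. It starts from the <p> and walks forward to the next element, because in the live page there may be whitespace text nodes between </p> and <ul>.

```vba
Sub ClickAircraftLink()

    'Assumes a reference to the Microsoft HTML Object Library, as in GetAircraft
    Dim IE As Object
    Dim hDoc As MSHTML.HTMLDocument
    Dim hPara As MSHTML.IHTMLElement
    Dim hNode As MSHTML.IHTMLDOMNode
    Dim hList As MSHTML.IHTMLElement2
    Dim hLink As MSHTML.IHTMLElement

    Set IE = CreateObject("InternetExplorer.Application")
    IE.Visible = True
    IE.navigate "http://en.wikipedia.org/wiki/List_of_aircraft_%28B%29"

    'Same wait loop as in the question
    Do Until Not IE.Busy And IE.readyState = 4
        DoEvents
    Loop
    Set hDoc = IE.document

    For Each hPara In hDoc.getElementsByTagName("p")
        If Trim$(hPara.innerText) = "(British Aircraft Company)" Then

            'Walk forward to the next element node (nodeType 1),
            'skipping any text nodes in between
            Set hNode = hPara
            Set hNode = hNode.nextSibling
            Do While Not hNode Is Nothing
                If hNode.nodeType = 1 Then Exit Do
                Set hNode = hNode.nextSibling
            Loop

            'Give up if the paragraph is not followed by a <ul>
            If hNode Is Nothing Then Exit For
            If UCase$(hNode.nodeName) <> "UL" Then Exit For

            'Only look at links inside this particular list
            Set hList = hNode
            For Each hLink In hList.getElementsByTagName("a")
                If hLink.innerText = "B.A.C. VII" Then
                    hLink.Click
                    Do Until Not IE.Busy And IE.readyState = 4
                        DoEvents
                    Loop
                    Exit For
                End If
            Next hLink

            'The page may have changed after the click, so stop here
            Exit For
        End If
    Next hPara

End Sub
```

Comparing innerText of each <a> avoids the problem in the question's attempt, where the innerHTML of a whole <ul> is compared with the text of a single item and so never matches.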
4 Comments

Wow, just a question: why do you use GET instead of POST? It just struck me.
Isn't it safer with POST?
Anyway, I'll check GET out :P This is a perfect answer, I will study it with diligence! Thank you, sir.
I'm not an HTML expert (not even close), so I can't speak knowledgeably about GET v POST. I use GET by default and only use POST if I have a reason, like potential security issues.
