3

I am trying to scrape specific data from website with CSS selectors. I succeeded with the help of QHar but the requirements now have changed. This is my code below:

Code

Public Sub CompanyData2()

Dim html As HTMLDocument, ws As Worksheet, re As Object

Set re = CreateObject("VBScript.RegExp")
re.Pattern = "\s{2,}"
Set ws = ThisWorkbook.Worksheets("Sheet1")
Set html = New HTMLDocument

With CreateObject("MSXML2.XMLHTTP")
    .Open "GET", "https://www.bizi.si/iskanje?q=", False
    .send
    html.body.innerHTML = .responseText
End With

ws.Range("A4").Value = re.Replace(Join$(Array(html.querySelector("td.item a").innerText), ", "), Chr$(32))
ws.Range("A5").Value = re.Replace(Join$(Array(html.querySelector("td.item + td.item").innerText), ", "), Chr$(32))
ws.Range("B6").Value = re.Replace(Join$(Array(html.querySelector("td.item + td.item + td.item + td.item").innerText), ", "), Chr$(32))

End Sub

The result is as follows:

enter image description here

Website

enter image description here

I want to extract name of company on sheet 1 A3 like that:

enter image description here

Thank you.

5
  • please use the snippet tool via edit to share html we can use for testing Commented Nov 17, 2019 at 12:47
  • What is the value you are passing to end of url to get the end output as shown? Can we have at least two example inputs with expected outputs because it looks like html may vary in my testing, Commented Nov 17, 2019 at 13:16
  • I can find and return REPROMAT d.o.o. but not the full name as you show it. Commented Nov 17, 2019 at 13:23
  • ok. I worked out what is actually going on Commented Nov 18, 2019 at 14:12
  • please try the edited answer. Commented Nov 18, 2019 at 16:50

1 Answer 1

1

You need REPROMAT in A1 then after issuing initial query you have to visit the actual company page to get the company name as you show it. If you are using the company url direct then you can skip the first request and use the code from the second request onwards.

Public Sub CompanyData()
    Dim html As HTMLDocument, ws As Worksheet, nodes As Object

    Set ws = ThisWorkbook.Worksheets("Sheet1")
    Set html = New HTMLDocument

    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://www.bizi.si/iskanje?q=" & Application.EncodeURL(ws.Range("A1").Value), False
        .send
        html.body.innerHTML = .responseText

        Set nodes = html.querySelectorAll("td.item")

        With ws
            .Range("A4").Value = nodes.Item(0).FirstChild.innerText
            .Range("A5").Value = nodes.Item(1).innerText
            .Range("A6").Value = "DŠ: " & nodes.Item(3).innerText
        End With

        .Open "GET", html.querySelector("[id$=linkCompany]").href, False
        .send
        html.body.innerHTML = .responseText
        ws.Range("A3") = html.querySelector("#ctl00_ctl00_cphMain_cphMainCol_CompanySPLPreview1_labTitlePRS").innerText
    End With
End Sub
Sign up to request clarification or add additional context in comments.

8 Comments

I have another question for QHarr. It is working fine now, but when I change company in web site BIZI I get in excel the same result of a previous search. I must close and open excel to extract different company data. Thank you.
Hi, do you mind opening a new question and explaining the problem and what you have tried? You can drop a link to it here if you wish.
You can try adding .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT" after the .Open line and see if that changes things first.
stackoverflow.com/questions/59009092/… I am sending you link. Thank you.
Did you try using the requestheader I mentioned?
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.