0

I'm trying to extract information based on keywords from a website, which is in tabular format. Example here for keywords and sample website is

apple
LG
Samsung
Nokia

Website = www.wikipedia.com

I have kept the keywords in Sheet 1 and the table to be extracted in Sheet 2. I have an issue with going to next line as the table dimension is not same for all. I have attached a screenshot too.

Sample-Data Function_Error

This is my code so far, this is sample data only, i have huge keywords to be searched for in my actual work.

Sub tableextract()
Dim bot As WebDriver
Set bot = New WebDriver
bot.Start "Chrome"
bot.Get "https://www.wikipedia.org/"

r = 1
While (Len(Range("A" & r)) > 0)

bot.FindElementById("searchInput").SendKeys Range("A" & r)
bot.FindElementByXPath("//i[@class='sprite svg-search-icon']").Click

Dim tbl As Selenium.TableElement
Set tbl = bot.FindElementByXPath("//table[@class='infobox biota']").AsTable

ThisWorkbook.Sheets("Sheet2").Range("A1").CurrentRegion.Clear   'THIS IS OPTIONAL
tbl.ToExcel ThisWorkbook.Sheets("Sheet2").Range("A1")

Dim LastRow As Long
On Error Resume Next
LastRow = Cells.Find(What:="*", _
                After:=Range("A1"), _
                LookAt:=xlPart, _
                LookIn:=xlFormulas, _
                SearchOrder:=xlByRows, _
                SearchDirection:=xlPrevious, _
                MatchCase:=False).Row
On Error GoTo 0
'MsgBox "Last Row: " & LastRow
Range(LastRow).Offset(1, 0).Select

r = r + 1
Wend
End Sub
5
  • Not all the items of search has the same xpath //table[@class='infobox biota Commented Jul 4, 2020 at 12:16
  • @QHarr i Understood your concept here, after i ran your code, it says, " Varaible not defined" and highlights the function. i have added the screenshot too. Commented Jul 4, 2020 at 13:17
  • @YasserKhalil Yes sir , but the main objective is not that. I can search for 2 sample keywords let's say, Apple and watermelon , and then there is same XPath. Commented Jul 4, 2020 at 13:20
  • @QHarr Hey after removing "Option Explicit" is worked. But the problem is that function is not working , because i ran the code and it runs for sample (Apple & watermelon) , but the watermelon table overwrites the apple's data. Commented Jul 4, 2020 at 13:51
  • @QHarr Yes, it worked perfectly now. Little changes here and there and it finally worked , thanks a lot ! Commented Jul 4, 2020 at 16:37

1 Answer 1

1

Based on my tutor's solution (QHarr) I have modified a little so as to directly get the page without clicking on the search button

Sub Extract_Tables_Wikipedia()
    Dim wsSrc As Worksheet, wsDest As Worksheet, bot As WebDriver, tbl As Selenium.TableElement, r As Long, m As Long
    Set bot = New WebDriver
    bot.Start "Chrome"
    Set wsSrc = ThisWorkbook.Worksheets("Sheet1")
    Set wsDest = ThisWorkbook.Worksheets("Sheet2")
    r = 1
    With bot
        While (Len(wsSrc.Range("A" & r)) > 0)
            .Get "https://en.wikipedia.org/wiki/" & wsSrc.Range("A" & r)
            Set tbl = .FindElementByXPath("//table[@class='infobox biota']").AsTable
            m = GetLastRow(wsDest)
            tbl.ToExcel wsDest.Range("A" & IIf(m = 1, 1, m + 1))
            r = r + 1
        Wend
    End With
End Sub

Function GetLastRow(ByVal ws As Worksheet) As Long
    On Error Resume Next
        GetLastRow = ws.Cells.Find(What:="*", After:=ws.Range("A1"), LookAt:=xlPart, LookIn:=xlFormulas, SearchOrder:=xlByRows, SearchDirection:=xlPrevious, MatchCase:=False).Row
    On Error GoTo 0
End Function
Sign up to request clarification or add additional context in comments.

2 Comments

Thank you for more cleaner code. Also i'm not able to find Qharr's comments , Strange !
In fact, QHarr deserves thanks most. I have learned a lot from him.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.