Selenium error while trying to extract data from a web table

Question

This is Selenium with Python. These first lines work fine:

    from selenium import webdriver
    browser = webdriver.Firefox()
    browser.get('http://www.palottery.state.pa.us/Games/Past-Winning-Numbers.aspx?id=8')
    elm = browser.find_element_by_xpath(".//*[@id='p_lt_zoneMain_pageplaceholder1_p_lt_zoneContent_pageplaceholder_p_lt_zoneContent_PaLotteryPastWinningNumbers_Button1']")
    elm.click()
    elm2 = browser.find_element_by_xpath(".//*[@id='page-content']/div[2]/div/a/img")
    elm2.click()
    browser.implicitly_wait(10)

This is where I get the error:

    Dtable = browser.find_element_by_xpath('.//*[@id="p_lt_zoneLeft_PaLotteryPastWinningNumbers_Results"]/tbody')

    for i in Dtable.find_elements_by_xpath('.//tr'):
        print(i.get_attribute('innerHTML'))

selenium.common.exceptions.NoSuchElementException: Message: Unable to locate element: {"method":"xpath","selector":".//*[@id=\"p_lt_zoneLeft_PaLotteryPastWinningNumbers_Results\"]/tbody"}

UPDATE: I still can't get all 250 rows of the table. For some reason I only get 10 rows.

    from selenium import webdriver


    def getWinNums():
        browser = webdriver.Firefox()
        browser.get('http://www.palottery.state.pa.us/Games/Past-Winning-Numbers.aspx?id=8')

        elm = browser.find_element_by_xpath(".//*[@id='p_lt_zoneMain_pageplaceholder1_p_lt_zoneContent_pageplaceholder_p_lt_zoneContent_PaLotteryPastWinningNumbers_Button1']")
        elm.click()
        elm2 = browser.find_element_by_xpath(".//*[@id='page-content']/div[2]/div/a/img")
        elm2.click()
        browser.implicitly_wait(10)

        Dtable = browser.find_element_by_xpath(".//*[@id='page-content']//table/tbody")

        # create a list where the elements are dates, each followed by the 5 numbers for that date
        l = [i.text.strip() for i in Dtable.find_elements_by_xpath('.//td') if i.text != "Payout"]

        browser.close()

        # convert the flat list into a list of tuples (date, 5 numbers)
        l = list(zip(*[iter(l)]*2))

        return l


    def main():
        l = getWinNums()

        for el in l:
            print(el)


    if __name__ == "__main__":
        main()

OUTPUT:

    ('09/08/2015', '2   32   35   36   39')
    ('09/07/2015', '14   17   19   24   43')
    ('09/06/2015', '10   13   15   36   38')
    ('09/05/2015', '4   5   24   29   34')
    ('09/04/2015', '1   12   18   34   36')
    ('09/03/2015', '4   9   15   28   40')
    ('09/02/2015', '14   16   17   18   34')
    ('09/01/2015', '7   26   33   36   41')
    ('08/31/2015', '17   20   22   32   41')
    ('08/30/2015', '11   14   23   24   38')
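The pairing step in getWinNums relies on the zip(*[iter(l)]*2) idiom, which is compact but not obvious. Below is a minimal sketch of the same grouping on a hard-coded sample (two rows copied from the output above), with the number strings also split into integers. pair_cells is a hypothetical helper name and is not part of the original script.

```python
def pair_cells(cells):
    """Group a flat [date, numbers, date, numbers, ...] list into (date, [ints]) tuples."""
    it = iter(cells)
    # zip(it, it) draws two consecutive items from the same iterator per tuple,
    # which is what zip(*[iter(cells)]*2) does in a single expression.
    return [(date, [int(n) for n in numbers.split()]) for date, numbers in zip(it, it)]


# Sample data copied from the printed output; no browser is needed to run this.
cells = ['09/08/2015', '2   32   35   36   39',
         '09/07/2015', '14   17   19   24   43']

for row in pair_cells(cells):
    print(row)
```

Building a list here also sidesteps the fact that zip returns a one-shot iterator in Python 3, so the result can be looped over more than once.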

UPDATE #2

The CSS selector shown below works, but Dtable.find_elements_by_xpath('.//td') again produces only 10 rows out of 251.

Dtable = browser.find_element_by_css_selector("table>tbody")

UPDATE #3

Now I can get 50 rows of the table with this:

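    l_result = []  # accumulated cells from all pages (the initialisation was missing from the snippet)

    for i in range(1, 6):
        link3 = browser.find_element_by_xpath(".//*[@id='p_lt_zoneMain_pageplaceholder1_p_lt_zoneContent_pageplaceholder_p_lt_zoneContent_PaLotteryPastWinningNumbers_Results_paginate']/span/a[{i}]".format(i=i))
        link3.click()

        Dtable = browser.find_element_by_css_selector("table>tbody>tr")
        # '//td' is an absolute XPath: it matches every td in the document, not only those under Dtable
        l = [cell.text.strip() for cell in Dtable.find_elements_by_xpath('//td') if cell.text != "Payout"]
        l_result += l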

The remaining problem is how to get to the next 50 rows by clicking the pagination button. I can get the XPath for the button:

.//*[@id='p_lt_zoneMain_pageplaceholder1_p_lt_zoneContent_pageplaceholder_p_lt_zoneContent_PaLotteryPastWinningNumbers_Results_next']

but clicking on it and repeating the above for loop does not produce any new rows from the table.
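One possible shape for the paging loop, offered as a sketch and not as a tested fix for this site: re-read the table after every click and stop once the rows no longer change. read_rows and click_next are hypothetical callables that would wrap the Selenium lookups (the table cells and the "next" button located by the XPath above). Here an in-memory stand-in replaces them so the loop logic can run without a browser.

```python
def collect_all_rows(read_rows, click_next, max_pages=100):
    """Read rows page by page until the page stops changing or max_pages is reached.

    read_rows() returns the rows of the current page as a list;
    click_next() advances to the next page. Both are assumed wrappers
    around browser calls and are not part of Selenium itself.
    """
    all_rows = []
    previous = None
    for _ in range(max_pages):
        current = read_rows()      # look the table up again on every page
        if current == previous:    # nothing changed: last page, or the click had no effect
            break
        all_rows.extend(current)
        previous = current
        click_next()
    return all_rows


# In-memory stand-in for a three-page table.
pages = [['a', 'b'], ['c', 'd'], ['e']]
state = {'page': 0}


def read_rows():
    return list(pages[state['page']])


def click_next():
    # Stay on the last page once it is reached, like a disabled "next" button.
    state['page'] = min(state['page'] + 1, len(pages) - 1)


print(collect_all_rows(read_rows, click_next))
```

With Selenium, read_rows would locate the table again on each call, since elements found before a click may no longer be attached to the page after it re-renders.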

Saifur · Accepted Answer · 2015-09-09 00:21:34Z


I guess you want to change the selector that fetches the table from:

    Dtable = browser.find_element_by_xpath('.//*[@id="p_lt_zoneLeft_PaLotteryPastWinningNumbers_Results"]/tbody')

to:

    Dtable = browser.find_element_by_css_selector("table[id^='p_lt_zoneLeft']")
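If a selector still matches nothing, it can help to check which ids the tables on the page actually carry before guessing at a prefix. Below is a rough sketch using only the standard library. The sample markup and the id demo_Results are made up for illustration, and with Selenium the input would presumably be browser.page_source.

```python
from html.parser import HTMLParser


class TableIdCollector(HTMLParser):
    """Collect the id attribute of every <table> start tag."""

    def __init__(self):
        super().__init__()
        self.table_ids = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lower-cased.
        if tag == "table":
            table_id = dict(attrs).get("id")
            if table_id:
                self.table_ids.append(table_id)


def list_table_ids(page_source):
    """Return the ids of all tables found in an HTML string."""
    collector = TableIdCollector()
    collector.feed(page_source)
    return collector.table_ids


# Made-up markup; the second table has no id and is skipped.
sample = ('<div><table id="demo_Results"><tbody><tr><td>x</td></tr></tbody></table>'
          '<table><tr><td>y</td></tr></table></div>')
print(list_table_ids(sample))
```

Comparing the printed ids with the id used in the XPath or CSS selector shows quickly whether the prefix in the selector exists on the page at all.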

edited Sep 9, 2015 at 0:21

answered Sep 8, 2015 at 23:35


4 Comments

LetzerWille Over a year ago

Thanks! No more errors. I have added /td into the loop and now get dates, but the numbers for each date are not extracted cleanly. They are interspersed with &nbsp;. Is there a way to extract the numbers with a Selenium statement?

Saifur Over a year ago

Just use i.text instead of the innerHTML attribute.
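To illustrate why the raw innerHTML looks noisy: assuming the stray characters in the cells were &nbsp; entities (a guess based on the spacing in the printed output), the markup can also be cleaned up after the fact. clean_cell is a hypothetical helper and the sample string is made up.

```python
import html


def clean_cell(cell_html):
    """Decode HTML entities and collapse runs of whitespace into single spaces."""
    text = html.unescape(cell_html)   # "&nbsp;" becomes the no-break space "\xa0"
    # str.split() with no argument splits on any Unicode whitespace, "\xa0" included.
    return " ".join(text.split())


print(clean_cell("2&nbsp;&nbsp;&nbsp;32&nbsp;&nbsp;&nbsp;35"))
```

Reading i.text, as suggested, avoids this step because it returns the rendered text of the element instead of its markup.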

LetzerWille Over a year ago

i.text did it! Thanks again. But the table has 250 rows (/tbody/tr[1]/td[2] through /tbody/tr[250]/td[2]), and the script prints only 8. Is it because of @id='page-content'?

LetzerWille Over a year ago

Unfortunately, with the last change I get this error: Unable to locate element: {"method":"css selector","selector":"table[id^='p_lt_zoneLeft']"
