
I'm trying to scrape the whole table from: https://free-proxy-list.net/

I managed to scrape it, but I only get the first row of the table instead of all 20 rows.

I've looked at answers to similar questions and tried the solutions given there, but Selenium cannot locate the element when I use .// in my XPath.

for bod in driver.find_elements_by_xpath("//*[@id='proxylisttable']/tbody"):
    col = bod.find_elements_by_xpath("//*[@id='proxylisttable']/tbody/tr")
    for c in col:
        ip = c.find_element_by_xpath('//*[@id="proxylisttable"]/tbody/tr/td[1]')
        port = c.find_element_by_xpath('//*[@id="proxylisttable"]/tbody/tr/td[2]')
        code = c.find_element_by_xpath('//*[@id="proxylisttable"]/tbody/tr/td[3]')
        country = c.find_element_by_xpath('//*[@id="proxylisttable"]/tbody/tr/td[4][@class = "hm"]')
        anonymity = c.find_element_by_xpath('//*[@id="proxylisttable"]/tbody/tr/td[5]')
        google = c.find_element_by_xpath('//*[@id="proxylisttable"]/tbody/tr/td[6][@class = "hm"]')

My code scrapes the first row 20 times instead of returning 20 different rows. The cell values are stored in ip, port, code, etc. I have tried several XPath variations but always get the same result.
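A minimal offline sketch of this behaviour, using Python's standard-library xml.etree.ElementTree purely as a stand-in for Selenium (the table is a made-up three-row sample, not the real page). In XPath, a path starting with // is evaluated from the top of the document even when it is called on a row element, so every iteration finds the first row; a path relative to the row (./td[1]) returns that row's own cell.

```python
import xml.etree.ElementTree as ET

# Made-up three-row sample shaped like the proxy table (not real data).
TABLE = """
<table id="proxylisttable">
  <tbody>
    <tr><td>1.1.1.1</td><td>8080</td></tr>
    <tr><td>2.2.2.2</td><td>3128</td></tr>
    <tr><td>3.3.3.3</td><td>80</td></tr>
  </tbody>
</table>
"""

root = ET.fromstring(TABLE.strip())
rows = root.findall("./tbody/tr")

# Like an XPath starting with "//": the search ignores the current row and
# starts from the top, so every iteration finds the first row's cell.
from_top = [root.find(".//tbody/tr/td[1]").text for _ in rows]

# Relative to each row ("./td[1]"): every iteration reads its own row.
relative = [row.find("./td[1]").text for row in rows]

print(from_top)  # ['1.1.1.1', '1.1.1.1', '1.1.1.1']
print(relative)  # ['1.1.1.1', '2.2.2.2', '3.3.3.3']
```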

  • Do you want to get the value of each cell? Commented Aug 9, 2019 at 4:42

3 Answers


I think your problem is in this line:

col = bod.find_elements_by_xpath("//*[@id='proxylisttable']/tbody/tr")

The correct syntax is:

col = bod.find_elements_by_xpath("//*[@id='proxylisttable']/tbody/tr[insert row index here]")

Like this:

table = driver.find_element_by_xpath("//*[@id='proxylisttable']/tbody")
rows = table.find_elements_by_xpath("//*[@id='proxylisttable']/tbody/tr")

for i in range(1, len(rows) + 1):
    # Absolute XPath to the i-th row; XPath positions start at 1
    row_xpath = "//*[@id='proxylisttable']/tbody/tr[" + str(i) + "]"
    # Every cell lookup must carry the same row index; without it, each
    # "//..." path matches the first row of the table again
    ip = driver.find_element_by_xpath(row_xpath + "/td[1]")
    port = driver.find_element_by_xpath(row_xpath + "/td[2]")
    code = driver.find_element_by_xpath(row_xpath + "/td[3]")
    country = driver.find_element_by_xpath(row_xpath + "/td[4][@class='hm']")
    anonymity = driver.find_element_by_xpath(row_xpath + "/td[5]")
    google = driver.find_element_by_xpath(row_xpath + "/td[6][@class='hm']")
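The index-based idea can be checked without a browser. This sketch uses the standard-library xml.etree.ElementTree as a stand-in for Selenium and a made-up three-row table (not the real page): putting tr[i] into the path for i from 1 to the row count selects a different row on each iteration.

```python
import xml.etree.ElementTree as ET

# Made-up three-row sample shaped like the proxy table (not real data).
TABLE = """<table id="proxylisttable"><tbody>
<tr><td>1.1.1.1</td><td>8080</td></tr>
<tr><td>2.2.2.2</td><td>3128</td></tr>
<tr><td>3.3.3.3</td><td>80</td></tr>
</tbody></table>"""

root = ET.fromstring(TABLE)
row_count = len(root.findall("./tbody/tr"))

ips = []
for i in range(1, row_count + 1):  # XPath positions start at 1
    # The index goes into the path itself, so each lookup targets row i
    cell = root.find("./tbody/tr[{}]/td[1]".format(i))
    ips.append(cell.text)

print(ips)  # ['1.1.1.1', '2.2.2.2', '3.3.3.3']
```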

3 Comments

I tried this method, but unfortunately it also returns only the first row. The 'i' at the end doesn't seem to increase.
Are you sure i isn't increasing?
Yeah :( unfortunately it didn't. I also tried inserting the i elsewhere in the code, with the same result. Also, I think the problem is the tr[' + i + ']' part; the i there needs to be str(i).

To handle dynamically loaded elements, use WebDriverWait with visibility_of_all_elements_located to wait for the rows to appear, then use the following XPaths, which are relative to each row.

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

driver=webdriver.Chrome("path of the chrome driver")
driver.get('https://free-proxy-list.net/')

rows = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//table[@id='proxylisttable']/tbody//tr")))
for row in rows:
    # "./td[N]" is relative to the current row, so each iteration reads its own cells
    ip = row.find_element_by_xpath('./td[1]').text
    port = row.find_element_by_xpath('./td[2]').text
    code = row.find_element_by_xpath('./td[3]').text
    # .text returns only visible text; textContent also covers cells that may be hidden (e.g. class "hm")
    country = row.find_element_by_xpath('./td[4]').get_attribute('textContent')
    anonymity = row.find_element_by_xpath('./td[5]').text
    google = row.find_element_by_xpath('./td[6]').get_attribute('textContent')
    https = row.find_element_by_xpath('./td[7]').text
    lastchecked = row.find_element_by_xpath('./td[8]').get_attribute('textContent')
    print("IP :{}, Port:{}, code:{}, country:{}, Anonymity:{}, google:{}, https:{}, last_checked:{}".format(ip, port, code, country, anonymity, google, https, lastchecked))

Output on console:

IP :185.132.133.173, Port:8080, code:NL, country:Netherlands, Anonymity:elite proxy, google:no, https:yes, last_checked:1 minute ago
IP :181.112.225.78, Port:58948, code:EC, country:Ecuador, Anonymity:elite proxy, google:no, https:yes, last_checked:1 minute ago
IP :134.249.149.219, Port:35795, code:UA, country:Ukraine, Anonymity:elite proxy, google:no, https:yes, last_checked:1 minute ago
IP :195.20.30.54, Port:55182, code:UA, country:Ukraine, Anonymity:elite proxy, google:no, https:yes, last_checked:1 minute ago
IP :14.102.69.170, Port:53347, code:IN, country:India, Anonymity:elite proxy, google:no, https:yes, last_checked:1 minute ago
IP :182.53.193.108, Port:54543, code:TH, country:Thailand, Anonymity:elite proxy, google:no, https:yes, last_checked:1 minute ago
IP :159.224.221.175, Port:58299, code:UA, country:Ukraine, Anonymity:elite proxy, google:no, https:yes, last_checked:1 minute ago
IP :36.89.188.123, Port:49725, code:ID, country:Indonesia, Anonymity:elite proxy, google:no, https:yes, last_checked:1 minute ago
IP :103.231.163.58, Port:43620, code:BD, country:Bangladesh, Anonymity:elite proxy, google:no, https:yes, last_checked:1 minute ago
IP :114.130.92.14, Port:49167, code:BD, country:Bangladesh, Anonymity:elite proxy, google:no, https:yes, last_checked:1 minute ago
IP :177.54.200.10, Port:49501, code:BR, country:Brazil, Anonymity:elite proxy, google:no, https:yes, last_checked:1 minute ago
IP :77.38.21.239, Port:8080, code:SI, country:Slovenia, Anonymity:elite proxy, google:no, https:yes, last_checked:1 minute ago
IP :78.137.89.161, Port:8080, code:YE, country:Yemen, Anonymity:transparent, google:no, https:no, last_checked:1 minute ago
IP :103.216.147.49, Port:8080, code:IN, country:India, Anonymity:transparent, google:no, https:no, last_checked:1 minute ago
IP :195.250.188.210, Port:8080, code:EE, country:Estonia, Anonymity:transparent, google:no, https:no, last_checked:1 minute ago
IP :5.196.255.171, Port:3128, code:FR, country:France, Anonymity:transparent, google:no, https:no, last_checked:1 minute ago
IP :109.234.112.250, Port:46675, code:GE, country:Georgia, Anonymity:transparent, google:no, https:no, last_checked:1 minute ago
IP :186.225.48.178, Port:8080, code:BR, country:Brazil, Anonymity:transparent, google:no, https:no, last_checked:1 minute ago
IP :101.255.64.142, Port:35401, code:ID, country:Indonesia, Anonymity:elite proxy, google:no, https:yes, last_checked:1 minute ago
IP :160.119.129.42, Port:57557, code:GN, country:Guinea, Anonymity:elite proxy, google:no, https:yes, last_checked:1 minute ago
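WebDriverWait(...).until(...) in the code above keeps calling the expected condition until it returns something truthy or the timeout expires. A plain-Python sketch of that polling idea follows; wait_until is a hypothetical stand-in, not Selenium's implementation (the real class passes the driver to the condition, polls every 0.5 s by default, ignores NoSuchElementException between polls and raises TimeoutException instead of TimeoutError).

```python
import time


def wait_until(condition, timeout=20.0, poll=0.5):
    """Hypothetical stand-in for WebDriverWait(driver, timeout).until(condition).

    Calls condition() until it returns a truthy value, then returns that value.
    Raises TimeoutError if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within {} s".format(timeout))
        time.sleep(poll)


# Usage: a fake condition whose "rows" only appear on the third poll,
# mimicking a table that is filled in after the page has loaded.
calls = {"n": 0}


def rows_present():
    calls["n"] += 1
    return ["row"] * 20 if calls["n"] >= 3 else []


rows = wait_until(rows_present, timeout=2.0, poll=0.01)
print(len(rows), calls["n"])  # 20 3
```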

1 Comment

I finally did it! Thank you!

Add an index to your inner loop that runs from 1 to the length of col, and use it to find each cell in the corresponding row:

ip = c.find_element_by_xpath('//*[@id="proxylisttable"]/tbody/tr[{}]/td[1]'.format(index))

P.S.: Apply the same change to the other column lookups.
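A minimal sketch of building those per-cell paths in Python with str.format. The names cell_xpath and TABLE_XPATH are hypothetical, and the placeholder strings in col only stand in for the WebElements that find_elements_by_xpath would return; in the real loop, each generated string would be passed to find_element_by_xpath.

```python
# Hypothetical base path; the id comes from the question's XPaths.
TABLE_XPATH = '//*[@id="proxylisttable"]/tbody'


def cell_xpath(row_index, col_index):
    """Build the XPath for one cell; both indexes are 1-based, as in XPath."""
    return '{}/tr[{}]/td[{}]'.format(TABLE_XPATH, row_index, col_index)


# Placeholders for the row elements the real code would get from Selenium.
col = ["<row 1>", "<row 2>"]

for index in range(1, len(col) + 1):
    print(cell_xpath(index, 1))
# //*[@id="proxylisttable"]/tbody/tr[1]/td[1]
# //*[@id="proxylisttable"]/tbody/tr[2]/td[1]
```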
