
I have a website to scrape and I am using Selenium to do it. When I finished writing the code, I noticed that I was not getting any output when I printed the table contents. I viewed the page source and found that the table was not in it; that is why, even when I take the table's XPath from inspect element, I can't get any output from it. Does anyone know how I could get the response/data, or just print the table, from the JavaScript response? Thanks.
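A quick way to confirm that the table is rendered by JavaScript is to compare the raw HTML from a plain HTTP request with the browser-rendered source. A minimal sketch, assuming the page loads without a login:

import requests
from selenium import webdriver

url = 'https://reversewhois.domaintools.com/'
# Raw HTML exactly as the server sends it, before any JavaScript runs.
raw_html = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'}).text
# Browser-rendered source after JavaScript has had a chance to run
# (assumes chromedriver is reachable, as in the setup below).
driver = webdriver.Chrome()
driver.get(url)
rendered_html = driver.page_source
driver.quit()
# If the table's container id shows up only in the rendered source,
# the table is built client-side by JavaScript.
print('refine-preview-content' in raw_html)
print('refine-preview-content' in rendered_html)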

Here is my current code:

from bs4 import BeautifulSoup
from selenium import webdriver
import time
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument('--incognito')
chrome_path = r"C:\chromedriver.exe"
driver = webdriver.Chrome(chrome_path,options=options)

driver.implicitly_wait(3)
url = "https://reversewhois.domaintools.com/?refine#q=
%5B%5B%5B%22whois%22%2C%222%22%2C%22VerifiedID%40SG-Mandatory%22%5D%5D%5D"
driver.get(url)
html = driver.page_source
soup = BeautifulSoup(html,'lxml')

# These lines of code select the desired search parameter from the combo box; you can disregard them since I put the whole URL with params
input = driver.find_element_by_xpath('//*[@id="q0"]/div[2]/div/div[1]/div[3]/input')
driver.find_element_by_xpath('//*[@id="q0"]/div[2]/div/div[1]/div[1]/div').click()
driver.find_element_by_xpath('//*[@id="q0"]/div[2]/div/div[1]/div[5]/div[1]/div/div[3]').click()
driver.find_element_by_xpath('//*[@id="q0"]/div[2]/div/div[1]/div[2]/div/div[1]').click()
driver.find_element_by_xpath('//*[@id="q0"]/div[2]/div/div[1]/div[6]/div[1]/div/div[1]').click
input.send_keys("VerifiedID@SG-Mandatory")
driver.find_element_by_xpath('//*[@id="search-button-container"]/button').click()


table = driver.find_elements_by_xpath('//*[@id="refine-preview-content"]/table/tbody/tr/td')
for i in table:
    print(i)  # no output: the list comes back empty

I just want to scrape all the domain names, like the 0 _ _ .sg in the first result.

2 Answers


You can try the code below. After all the search options have been selected and the search button clicked, the time.sleep(5) acts as a crude wait to make sure the results have rendered before we grab the full page source. Then we use read_html from pandas, which finds any tables present in the HTML and returns a list of DataFrames; we take the required DataFrame from there.

from selenium import webdriver
import time
from selenium.webdriver.chrome.options import Options
import pandas as pd

options = Options()
options.add_argument('--incognito')
chrome_path = r"C:/Users/prakh/Documents/PythonScripts/chromedriver.exe"
driver = webdriver.Chrome(chrome_path,options=options)

driver.implicitly_wait(3)
url = "https://reversewhois.domaintools.com/?refine#q=%5B%5B%5B%22whois%22%2C%222%22%2C%22VerifiedID%40SG-Mandatory%22%5D%5D%5D"
driver.get(url)
#html = driver.page_source
#soup = BeautifulSoup(html,'lxml')

# These lines of code select the desired search parameter from the combo box
input = driver.find_element_by_xpath('//*[@id="q0"]/div[2]/div/div[1]/div[3]/input')
driver.find_element_by_xpath('//*[@id="q0"]/div[2]/div/div[1]/div[1]/div').click()
driver.find_element_by_xpath('//*[@id="q0"]/div[2]/div/div[1]/div[5]/div[1]/div/div[3]').click()
driver.find_element_by_xpath('//*[@id="q0"]/div[2]/div/div[1]/div[2]/div/div[1]').click()
driver.find_element_by_xpath('//*[@id="q0"]/div[2]/div/div[1]/div[6]/div[1]/div/div[1]').click
input.send_keys("VerifiedID@SG-Mandatory")
driver.find_element_by_xpath('//*[@id="search-button-container"]/button').click()

time.sleep(5)
html = driver.page_source
tables = pd.read_html(html)

df = tables[-1]
print(df)
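The fixed time.sleep(5) works, but it wastes time when the results load faster and can still miss them when they load slower. An explicit wait blocks only until the results table actually appears; a minimal sketch, reusing the XPath from the question:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Wait up to 15 seconds for at least one result row to be present in the DOM.
WebDriverWait(driver, 15).until(
    EC.presence_of_element_located(
        (By.XPATH, '//*[@id="refine-preview-content"]/table/tbody/tr')
    )
)
tables = pd.read_html(driver.page_source)
df = tables[-1]
print(df)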

5 Comments

Did you get some results for it, sir?
If you feel the issue is resolved, please accept the answer by clicking the checkmark on the left-hand side of my answer.
Can you please explain your answer to me in full detail? Why did you add time.sleep(5) before getting the source?
Sure. After all the options are selected and the search button is clicked, the time.sleep(5) is a wait to make sure we get the full page source. Then we use read_html from pandas, which finds any tables in the HTML and returns a list of DataFrames; we take the required DataFrame from there. Hope it helps.
Okay, please put it in the answer so that other users can see it easily. Thanks a lot.

If you are open to other ways, does the following give the expected results? It mimics the XHR request the page makes (though I have trimmed it down to the essential elements only) to retrieve the lookup results. It is faster than using a browser.

import requests
import pandas as pd

headers = {'User-Agent': 'Mozilla/5.0'}
# The q parameter carries the same URL-encoded query the page sends in its XHR.
r = requests.get('https://reversewhois.domaintools.com/?ajax=mReverseWhois&call=ajaxUpdateRefinePreview&q=[[[%22whois%22,%222%22,%22VerifiedID@SG-Mandatory%22]]]&sf=true', headers=headers)
# The JSON response contains the results table as an HTML fragment.
table = pd.read_html(r.json()['results'])
print(table)
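pd.read_html returns a list of DataFrames, so a short follow-up can pull out just the domain names. This assumes the results sit in the first returned table and the domains are in its first column; check print(table) and adjust if the layout differs:

# Assumption: first table holds the results, first column holds the domains.
df = table[0]
domains = df.iloc[:, 0].tolist()
print(domains)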
