I am trying to get table data with the code below, but the script prints "None" for the table, even though I can clearly see the table in the HTML document when I inspect the page. Any help would be appreciated. The image below shows the inspect view of the document.

from urllib2 import urlopen, Request
from bs4 import BeautifulSoup
site = 'http://www.altrankarlstad.com/wisp'
hdr = {'User-Agent': 'Chrome/78.0.3904.108'}
req = Request(site, headers=hdr)
res = urlopen(req)
rawpage = res.read()
page = rawpage.replace("<!-->", "")
soup = BeautifulSoup(page, "html.parser")
table = soup.find("table", {"class":"table workitems-table mt-2"})
print (table)

Here is the Selenium script I tried, as suggested:

import time
from bs4 import BeautifulSoup
from selenium import webdriver

url = 'http://www.altrankarlstad.com/wisp'

driver = webdriver.Chrome('C:\\Users\\rugupta\\AppData\\Roaming\\Microsoft\\Windows\\Start Menu\\Programs\\Python 3.7\\chromedriver.exe') 

driver.get(url)
driver.find_element_by_id('root').click() #click on search button to fetch list of bus schedule

time.sleep(10) #depends on how long it will take to go to next page after button click

for i in range(1,50):
    url = "http://www.altrankarlstad.com/wisp".format(pagenum = i)

text_field = driver.find_elements_by_xpath("//*[@id="root"]/div/div/div/div[2]/table")
for h3Tag in text_field:
    print(h3Tag.text)

1 Answer

The page isn't fully loaded when you fetch it with Request and urlopen. You can verify this by printing the response body (rawpage in your code). The page seems to use JavaScript to load the table.
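To illustrate the point, here is a minimal sketch using only the standard library. The STATIC_HTML string is a hypothetical stand-in for what a JavaScript-driven page may return to urlopen (the real response from the site is not shown in this thread); TagCounter and count_tags are illustrative names. If the raw HTML contains no table tag, no parser can find one, whatever class you search for.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for what a JavaScript-driven page may return to
# urlopen: an empty mount point plus a script tag, and no <table> at all.
STATIC_HTML = """
<html>
  <body>
    <div id="root"></div>
    <script src="/static/js/main.js"></script>
  </body>
</html>
"""


class TagCounter(HTMLParser):
    """Count how many times each start tag appears in a document."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def handle_starttag(self, tag, attrs):
        self.counts[tag] = self.counts.get(tag, 0) + 1


def count_tags(html):
    """Return a dict mapping tag name to number of occurrences."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


counts = count_tags(STATIC_HTML)
print("table tags:", counts.get("table", 0))
print("script tags:", counts.get("script", 0))
```

Running the same check on your own rawpage should tell you whether the table is present before any JavaScript runs.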

You should use Selenium instead: load the page with a browser driver (e.g. ChromeDriver, or geckodriver for Firefox), sleep until the page has loaded (you choose the delay; this page takes quite a while to load fully), and then get the table through Selenium.

import time
from bs4 import BeautifulSoup
from selenium import webdriver

url = 'http://www.altrankarlstad.com/wisp'

driver = webdriver.Chrome('/path/to/chromedriver')

driver.get(url)
# I don't understand the purpose of clicking that button, so the click is left out here
time.sleep(100)  # give the JavaScript-rendered table time to load

text_field = driver.find_elements_by_xpath('//*[@id="root"]/div/div/div/div[2]/table')
print (text_field[0].text)

Your code worked fine after a few modifications; this prints all the text from the table. You should learn to debug it and adapt it to get what you want.
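A fixed sleep either wastes time or is too short. As an alternative, here is a minimal sketch of polling until the table shows up. wait_until and FakePage are hypothetical names made up for this illustration, and FakePage only simulates a page whose table appears after a few checks, so the sketch runs without a browser.

```python
import time


def wait_until(condition, timeout=30.0, interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(interval)


# Hypothetical stand-in for a page whose table appears on the third check.
class FakePage:
    def __init__(self, ready_after):
        self.checks = 0
        self.ready_after = ready_after

    def find_tables(self):
        self.checks += 1
        return ["<table>"] if self.checks >= self.ready_after else []


page = FakePage(ready_after=3)
tables = wait_until(page.find_tables, timeout=5.0, interval=0.01)
print(tables, "found after", page.checks, "checks")
```

With a real driver, the condition could be a lambda that calls find_elements_by_xpath with the table XPath, since that call returns an empty list until the element exists. Selenium also ships its own WebDriverWait class in selenium.webdriver.support.ui for the same purpose.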

This is my output from running the Selenium script above:

10 Comments

OK, so here is my Selenium script:
Hi Hung, as you suggested, I have put it in the code window :) but it is not working. Am I missing anything?
Hi Hung, thanks again for your reply, though the script you suggested only prints the table headings. I am also trying to debug it, and I have attached an output image for your reference:
Please recheck that you are running it properly. I inserted my output, @Zygote.
Hey Hung, thanks so much for the pointer. Yes, I am seeing the output now. I flagged your answer as useful, though I do not have enough reputation to vote! (: Now I am trying to save the data in a data frame and export it to CSV!