Scrape table data from .jsp page using Selenium

Question

I am trying to scrape a table from a .jsp page (details below). The table loads only after data is entered (Train Number & Journey station).

For your trials, the train number can be 56913 and the journey station can be SBC (this automatically changes to 'KSR Bengaluru' after the data is entered).

With the script below, I am able to generate the table, but I am unable to extract it (the print call shows an empty list). I need to get the full table. Can anyone help by letting me know how to extract the table?

I am very new to web scraping. Hence, if I have made some basic mistake, please nudge me gently in the right direction.

import time
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.firefox.options import Options
from selenium.webdriver import Firefox
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.action_chains import ActionChains

from bs4 import BeautifulSoup
import soupsieve as sv
import requests
# Activate the following line if you do not want to see the Firefox window.
# Better deactivate it for debugging.
# os.environ['MOZ_HEADLESS'] = '1'

url = 'https://enquiry.indianrail.gov.in/ntes/trainOnMapBh.jsp'

opts = Options()
driver = Firefox(firefox_binary=r"C:\Program Files (x86)\Mozilla Firefox\firefox.exe", options=opts)
driver.get(url)
WebDriverWait(driver, 20)

train_field = driver.find_element_by_id("trnSrchTxt")
train_field.send_keys("56913")
time.sleep(2)
actions = ActionChains(driver)
actions.send_keys('SBC',Keys.ENTER)
actions.perform()

WebDriverWait(driver, 1)
result_table = driver.find_elements_by_id("mapTrnSch")
print(result_table)

Update: Apart from the answer from @MadRay, the following code also gets the data (I am not sure how robust it is).

import os
import time
from bs4 import BeautifulSoup
from selenium.webdriver.support.ui import WebDriverWait
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
from selenium.webdriver import Firefox
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys
import re

os.environ['MOZ_HEADLESS'] = '1'
opts = Options()
driver = Firefox(firefox_binary=r"C:\Program Files (x86)\Mozilla Firefox\firefox.exe", options=opts)
driver.get('https://enquiry.indianrail.gov.in/ntes/trainOnMapBh.jsp')
WebDriverWait(driver, 20)

train_field = driver.find_element_by_id("trnSrchTxt")
train_field.send_keys("11302")
time.sleep(2)
actions = ActionChains(driver)
actions.send_keys('SBC',Keys.ENTER)
actions.perform()
time.sleep(2)
res = driver.execute_script("return document.documentElement.outerHTML")
driver.quit()

soup = BeautifulSoup(res, 'lxml')
# The result table is assumed to be the fourth <table> on the page
table_rows = soup.find_all('table')[3].find_all('tr')
rows = []
for tr in table_rows:
    td = tr.find_all('td')
    rows.append([i.text for i in td])
# The first three rows are not station data
delaydata = rows[3:]
import pandas as pd
df = pd.DataFrame(delaydata, columns=['StopNo', 'Station', 1, 'SchArr', 'SchDep', 'ETA_ATA', 'Arr_Delay', 'ETD_ATD', 'DepDelay', 'Distance', 'PF'])
df

MadRay · Accepted Answer · 2020-01-23 11:25:14Z


You have to search for the results by class name, not by id:

results = driver.find_elements_by_class_name("mapTrnSch")

The rest of your code works fine.

Important: you will get two results. The first is the table headers, the second is the table content.

Here is an example I wrote without WebDriverWait and ActionChains:

import time

from selenium.webdriver import Firefox
from selenium.webdriver.firefox.options import Options

url = 'https://enquiry.indianrail.gov.in/ntes/trainOnMapBh.jsp'

opts = Options()
driver = Firefox(firefox_binary=r"C:\Program Files (x86)\Mozilla Firefox\firefox.exe", options=opts)
driver.get(url)
time.sleep(5)

# Send search data
driver.find_element_by_id("trnSrchTxt").send_keys("56913")  # Train
time.sleep(5)
driver.find_element_by_id("jrnyStn").send_keys('SBC')  # Journey
time.sleep(5)
driver.find_element_by_id("searchTrainInMapBtn").click()  # Submit button (seems like we do not need to click on it, but let's click for sure)
time.sleep(5)

# Get results
results = driver.find_elements_by_class_name("mapTrnSch")
print(results[0].text)  # 1st result: table headers
print(results[1].text)  # 2nd result: table content
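
The fixed time.sleep(5) calls can be replaced by polling until the result elements actually exist. Note that `WebDriverWait(driver, 20)` on its own, as used in the question, only creates a wait object and does not pause anything; the waiting happens in its `until(...)` method. The helper below is a minimal, hand-rolled sketch of that polling idea. `wait_for_elements` and its `find` callback are names made up for illustration and are not part of Selenium.

```python
import time


def wait_for_elements(find, timeout=20.0, poll=0.5):
    """Call find() repeatedly until it returns a non-empty list.

    `find` is any zero-argument callable (hypothetical name), for example
    lambda: driver.find_elements_by_class_name("mapTrnSch").
    Raises TimeoutError if nothing is found within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        found = find()
        if found:
            return found
        if time.monotonic() >= deadline:
            raise TimeoutError("no elements found within %.1f s" % timeout)
        time.sleep(poll)
```

With the driver from the example, this could be called as `results = wait_for_elements(lambda: driver.find_elements_by_class_name("mapTrnSch"))`. Selenium's built-in equivalent is `WebDriverWait(driver, 20).until(EC.presence_of_all_elements_located((By.CLASS_NAME, "mapTrnSch")))`, where `EC` is `selenium.webdriver.support.expected_conditions` and `By` comes from `selenium.webdriver.common.by`. In newer Selenium 4 releases, where the `find_elements_by_*` helpers were removed, the lookup itself becomes `driver.find_elements(By.CLASS_NAME, "mapTrnSch")`.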


1 Comment

moys Over a year ago

Thank you very much. I get the result I expected with your answer. I had already figured out an alternative way to get the data and turn it into a dataframe. If you have time, can you have a look at that code and let me know if there are any pitfalls with it (in case we change the train number, or if I want to get data for more trains in a loop or something like that)? I have updated my code in the question.
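
One pitfall in the updated code in the question is that it selects the table by position (`find_all('table')[3]`) and drops a fixed number of leading rows (`rows[3:]`), which silently assumes the page layout is the same for every train. A minimal sketch of a more defensive step, assuming each data row has exactly as many cells as there are column names, is to keep only rows of the expected width before building the DataFrame. The function name `rows_to_records` and the sample rows are invented for illustration and are not real NTES data.

```python
# Column names copied from the updated code in the question.
COLUMNS = ['StopNo', 'Station', 1, 'SchArr', 'SchDep', 'ETA_ATA',
           'Arr_Delay', 'ETD_ATD', 'DepDelay', 'Distance', 'PF']


def rows_to_records(rows, columns=COLUMNS):
    """Turn lists of cell texts into dicts, skipping rows of the wrong width."""
    records = []
    for cells in rows:
        if len(cells) != len(columns):
            continue  # title, spacer, or otherwise unexpected row
        records.append({name: cell.strip() for name, cell in zip(columns, cells)})
    return records


# Invented sample input in the shape produced by the loop over <tr> elements.
sample_rows = [
    [],                      # row with no <td> cells
    ['Some title'],          # single-cell row
    [' 1 ', 'AAA', '', '10:00', '10:05', '10:02', '2', '10:07', '2', '0', '1'],
]
records = rows_to_records(sample_rows)
print(records)
```

The resulting list of dicts could then be passed to `pd.DataFrame(...)`. This only guards against rows of the wrong width; a header row that happens to have eleven cells would still get through, so it is a sketch rather than a complete check.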
