
I am trying to scrape a table from a .jsp page (details below). The table loads only after some data is entered (train number and journey station).

For your trials, the train number can be 56913 and the journey station can be SBC (this automatically changes to 'KSR Bengaluru' after the data is entered).

With the script below, I am able to generate the table; however, I am unable to extract it (the print results in an empty list). I need to get the full table. Can anyone help me work out how to extract it?

I am very new to web scraping, so if I have made some basic mistake, please nudge me gently in the right direction.

import time
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.firefox.options import Options
from selenium.webdriver import Firefox
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.action_chains import ActionChains

from bs4 import BeautifulSoup
import soupsieve as sv
import requests
# Activate the following line if you do not want to see the Firefox window.
# Better deactivate it for debugging.
# os.environ['MOZ_HEADLESS'] = '1'

url = 'https://enquiry.indianrail.gov.in/ntes/trainOnMapBh.jsp'

opts = Options()
driver = Firefox(firefox_binary=r"C:\Program Files (x86)\Mozilla Firefox\firefox.exe", options=opts)
driver.get(url)
WebDriverWait(driver, 20)

train_field = driver.find_element_by_id("trnSrchTxt")
train_field.send_keys("56913")
time.sleep(2)
actions = ActionChains(driver)
actions.send_keys('SBC',Keys.ENTER)
actions.perform()

WebDriverWait(driver, 1)
result_table = driver.find_elements_by_id("mapTrnSch")
print(result_table)

Update: In addition to the answer from @MadRay, the following code also gets the data (I am not sure how robust it is).

import os
import time
from bs4 import BeautifulSoup
from selenium.webdriver.support.ui import WebDriverWait
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
from selenium.webdriver import Firefox
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys
import re

os.environ['MOZ_HEADLESS'] = '1'
opts = Options()
driver = Firefox(firefox_binary=r"C:\Program Files (x86)\Mozilla Firefox\firefox.exe", options=opts)
driver.get('https://enquiry.indianrail.gov.in/ntes/trainOnMapBh.jsp')
WebDriverWait(driver, 20)

train_field = driver.find_element_by_id("trnSrchTxt")
train_field.send_keys("11302")
time.sleep(2)
actions = ActionChains(driver)
actions.send_keys('SBC',Keys.ENTER)
actions.perform()
time.sleep(2)
res = driver.execute_script("return document.documentElement.outerHTML")
driver.quit()

soup = BeautifulSoup(res, 'lxml')
table_rows = soup.find_all('table')[3].find_all('tr')
rows = []
for tr in table_rows:
    td = tr.find_all('td')
    rows.append([i.text for i in td])
delaydata = rows[3:]
import pandas as pd
df = pd.DataFrame(delaydata, columns = ['StopNo','Station',1,'SchArr','SchDep','ETA_ATA','Arr_Delay','ETD_ATD','DepDelay','Distance','PF'])
df
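One spot that may be fragile in the code above is `soup.find_all('table')[3]`, which depends on how many tables happen to precede the result on the page. A possible alternative is to pick tables by their class instead of by position. The sketch below uses only the standard library; the sample markup and the `ClassTableRows` helper are hypothetical and only stand in for the real page, and the class name `mapTrnSch` is taken from @MadRay's answer.

```python
from html.parser import HTMLParser


class ClassTableRows(HTMLParser):
    """Collect cell text, row by row, from every <table> carrying a given class."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.depth = 0        # > 0 while inside a matching table
        self.in_cell = False  # True while inside a <td> or <th> of such a table
        self.rows = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "table" and (self.depth or self.wanted_class in classes):
            self.depth += 1
        elif self.depth and tag == "tr":
            self.rows.append([])
        elif self.depth and tag in ("td", "th"):
            if not self.rows:  # tolerate a cell that is not wrapped in <tr>
                self.rows.append([])
            self.in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag == "table" and self.depth:
            self.depth -= 1
        elif tag in ("td", "th"):
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.rows[-1][-1] += data.strip()


# Hypothetical, simplified markup: one unrelated table, then header and content tables.
SAMPLE_HTML = """
<table><tr><td>layout</td></tr></table>
<table class="mapTrnSch"><tr><th>Station</th><th>PF</th></tr></table>
<table class="mapTrnSch"><tr><td>KSR Bengaluru</td><td>1</td></tr></table>
"""

parser = ClassTableRows("mapTrnSch")
parser.feed(SAMPLE_HTML)
header, *data = parser.rows
print(header)
print(data)
```

With the real page, the string passed to `feed` would be the `res` value returned by `driver.execute_script`, and the resulting rows could go into `pd.DataFrame` as before. Whether the live markup matches this simplified shape is an assumption I have not verified.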

1 Answer

You have to search for the results by class name, not by id:

results = driver.find_elements_by_class_name("mapTrnSch")

The rest of your code works fine.

Important: you will get two results. The first is the table headers, the second is the table content.
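To illustrate why the lookup by id comes back empty, here is a small standard-library sketch on a made-up piece of markup. The HTML sample and the `count_matches` helper are hypothetical; they only mimic the situation described above, where two tables share the class `mapTrnSch` and neither has that id.

```python
from html.parser import HTMLParser

# Hypothetical, simplified markup: two tables with class "mapTrnSch" and no id.
SAMPLE_HTML = """
<table class="mapTrnSch"><tr><th>Station</th><th>PF</th></tr></table>
<table class="mapTrnSch"><tr><td>KSR Bengaluru</td><td>1</td></tr></table>
"""


class AttrMatcher(HTMLParser):
    """Count start tags whose attribute `name` contains the token `value`."""

    def __init__(self, name, value):
        super().__init__()
        self.name = name
        self.value = value
        self.matches = 0

    def handle_starttag(self, tag, attrs):
        # A class attribute may hold several space-separated tokens.
        attr_value = dict(attrs).get(self.name) or ""
        if self.value in attr_value.split():
            self.matches += 1


def count_matches(html, name, value):
    matcher = AttrMatcher(name, value)
    matcher.feed(html)
    return matcher.matches


by_id = count_matches(SAMPLE_HTML, "id", "mapTrnSch")
by_class = count_matches(SAMPLE_HTML, "class", "mapTrnSch")
print(by_id, by_class)
```

On this sample the id lookup finds nothing while the class lookup finds both tables, which mirrors the empty list from `find_elements_by_id` and the two results from `find_elements_by_class_name`.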

Here is an example I wrote without WebDriverWait and ActionChains:

import time

from selenium.webdriver import Firefox
from selenium.webdriver.firefox.options import Options

url = 'https://enquiry.indianrail.gov.in/ntes/trainOnMapBh.jsp'

opts = Options()
driver = Firefox(firefox_binary=r"C:\Program Files (x86)\Mozilla Firefox\firefox.exe", options=opts)
driver.get(url)
time.sleep(5)

# Send search data
driver.find_element_by_id("trnSrchTxt").send_keys("56913")  # Train
time.sleep(5)
driver.find_element_by_id("jrnyStn").send_keys('SBC')  # Journey
time.sleep(5)
driver.find_element_by_id("searchTrainInMapBtn").click()  # Submit button (it seems the click is not needed, but click it to be sure)
time.sleep(5)

# Get the results
results = driver.find_elements_by_class_name("mapTrnSch")
print(results[0].text)  # 1st result: table headers
print(results[1].text)  # 2nd result: table content
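The fixed `time.sleep(5)` calls keep the example simple, but they either waste time or turn out too short on a slow connection. A common alternative is to poll until the elements show up, with a timeout. Below is a minimal pure-Python sketch of that idea; `wait_until` and `fake_find_tables` are hypothetical names, and the fake lookup only imitates a page whose tables appear after a few checks.

```python
import time


def wait_until(condition, timeout=20.0, poll=0.5):
    """Call `condition` repeatedly until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


# Stand-in for the page: the "tables" only appear on the third check.
attempts = {"count": 0}


def fake_find_tables():
    attempts["count"] += 1
    return ["headers", "content"] if attempts["count"] >= 3 else []


found = wait_until(fake_find_tables, timeout=5.0, poll=0.01)
print(found)
```

With the driver from the example, something like `wait_until(lambda: driver.find_elements_by_class_name("mapTrnSch"))` would presumably replace the last sleep, although I have not tried this against the live site. Selenium offers the same pattern through `WebDriverWait(driver, 20).until(...)`; note that `WebDriverWait(driver, 20)` on its own only creates the wait object and does not pause anything.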

1 Comment

Thank you very much. I get the result I expected with your answer. I had already figured out an alternative way to get the data and turn it into a dataframe. If you have time, could you have a look at that code and let me know if there are any pitfalls with it (for example, if we change the train number, or if I want to get data for more trains in a loop)? I have updated my code in the question.
