I have an assignment to extract some items from each row of an HTML table. I have figured out how to grab the whole table from the web using Selenium with Python. This is the code for that:
from selenium import webdriver
import time
import pandas as pd
mydriver = webdriver.Chrome('C:/Program Files/chromedriver.exe')
mydriver.get("https://www.bseindia.com/corporates/ann.aspx?expandable=0")
time.sleep(5)  # wait 5 seconds for the DOM to load completely
table = mydriver.find_element_by_xpath('//*[@id="ctl00_ContentPlaceHolder1_lblann"]/table/tbody')
for row in table.find_elements_by_xpath('./tr'):
    print(row.text)
I can't figure out how to grab specific items from the table itself. These are the items I need:
Company Name
PDF Link (if it does not exist, write "No PDF Link")
Received Time
Disseminated Time
Time Taken
Description
Any help with the logic would be appreciated. Thanks in advance.
table = soup.find('table', attrs={'cellpadding':"4", 'cellspacing':"1", 'width':"100%", 'border':"0"}) will get the entire table; then get each row in the table with table.find_all('tr'). For example, row.find('td', attrs={'class': "TTHeadergrey"}) will get items 1 and 3-6 (row.find returns only the first matching cell; use row.find_all to get all of them), and row.find('a', attrs={'class':"tablebluelink"})['href'] will get the PDF link.
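Putting the approach above together, here is a minimal sketch, assuming the beautifulsoup4 package is installed. SAMPLE_HTML is made-up markup that only reuses the attributes and class names mentioned above, and extract_rows is a hypothetical helper name; the real page's layout may well differ, so treat the cell handling as a starting point rather than a finished scraper.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup for illustration only; the real page may differ.
SAMPLE_HTML = """
<table cellpadding="4" cellspacing="1" width="100%" border="0">
  <tr>
    <td class="TTHeadergrey">Example Company Ltd</td>
    <td class="TTHeadergrey">Board Meeting Outcome</td>
    <td><a class="tablebluelink" href="/docs/example.pdf">PDF</a></td>
  </tr>
  <tr>
    <td class="TTHeadergrey">Another Company Ltd</td>
    <td class="TTHeadergrey">Change in Directors</td>
  </tr>
</table>
"""


def extract_rows(html):
    """Return one dict per table row: the cell texts and the PDF link."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", attrs={"cellpadding": "4", "cellspacing": "1",
                                      "width": "100%", "border": "0"})
    results = []
    if table is None:
        return results  # table not found, nothing to extract

    for row in table.find_all("tr"):
        # find_all returns every matching cell; find would return only the first.
        cells = [td.get_text(strip=True)
                 for td in row.find_all("td", attrs={"class": "TTHeadergrey"})]
        if not cells:
            continue  # skip rows without any data cells

        # The link is optional, so fall back to the placeholder text.
        link = row.find("a", attrs={"class": "tablebluelink"})
        if link is not None and link.has_attr("href"):
            pdf_link = link["href"]
        else:
            pdf_link = "No PDF Link"

        results.append({"cells": cells, "pdf_link": pdf_link})
    return results


for item in extract_rows(SAMPLE_HTML):
    print(item)
```

With the Selenium code from the question, the HTML could come from the driver once the page has loaded, for example extract_rows(mydriver.page_source). Which entry of cells holds the company name, the times, or the description depends on the actual page structure, so that mapping is left open here.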