Scraping data from table using selenium

Question

I would like to scrape all company info under "Symbol", "Name", and "Earnings Call Time" from the following page: https://finance.yahoo.com/calendar/earnings

This is what I have so far for just company name, but I'm getting the error:

"NoSuchElementException: no such element: Unable to locate element: {"method":"xpath","selector":"//*[@id='cal-res-table']/div[1]/table/tbody/tr[1]/td[2]"} (Session info: chrome=86.0.4240.198)"

from selenium import webdriver
import datetime

tomorrow = (datetime.date.today() + datetime.timedelta(days=1)).isoformat() #get tomorrow in iso format as needed
url = "https://finance.yahoo.com/calendar/earnings?day="+tomorrow
print ("url: " + url)

driver = webdriver.Chrome("C:/Users/jrod94/Downloads/chromedriver_win32/chromedriver.exe")
driver.get(url)
element = driver.find_element_by_xpath("//*[@id='cal-res-table']")
Companies = [a.get_attribute("Company") for a in element]

driver.close()

Abhishek Rai · Accepted Answer · 2020-11-19 07:48:38Z

3

How about using pandas?

import datetime
import pandas as pd

pd.set_option('display.max_columns', None)  # show every column when printing
tomorrow = (datetime.date.today() + datetime.timedelta(days=1)).isoformat()  # tomorrow's date in ISO format, as the URL expects
tables = pd.read_html("https://finance.yahoo.com/calendar/earnings?day=" + tomorrow, header=0)  # list of DataFrames, one per HTML table
table = tables[0]  # the earnings calendar is the first table on the page
print(table)

Output:

  Symbol                         Company  Earnings Call Time EPS Estimate  \
0    WBAI                     500.Com Ltd  After Market Close            -   
1    BRBR             Bellring Brands Inc                 TAS         0.19   
2     BKE                      Buckle Inc  Before Market Open         0.54   
3     BNR        Burning Rock Biotech Ltd                 TAS        -0.12   
4     IEC            IEC Electronics Corp                 TAS            -   
5    GEOS      Geospace Technologies Corp                 TAS            -   
6    DREM  Dream Homes & Development Corp   Time Not Supplied            -   
7    DXLG        Destination XL Group Inc  Before Market Open            -   
8      FL                 Foot Locker Inc  Before Market Open         0.61   
9     HHR            HeadHunter Group PLC                 TAS         0.14   
10    HHR            HeadHunter Group PLC  Before Market Open         0.14   
11    RMR                   RMR Group Inc  Before Market Open         0.39   
12    GSX                 GSX Techedu Inc  Before Market Open        -0.31   
13    GSX                 GSX Techedu Inc                 TAS        -0.31   
14   HIBB              Hibbett Sports Inc  Before Market Open         0.45   
15   HAYN        Haynes International Inc                 TAS         -0.7   
16   IIIV                i3 Verticals Inc                 TAS         0.18   
17   AIHS          Senmiao Technology Ltd  Before Market Open
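
The question asks for only three of these columns. A minimal sketch of narrowing the table, assuming the column labels are exactly those printed in the output above; `pick_columns` and `WANTED` are hypothetical names, and the two sample rows are copied from that output so the snippet runs without network access:

```python
import pandas as pd

# Column labels as printed in the output above (assumed unchanged).
WANTED = ["Symbol", "Company", "Earnings Call Time"]


def pick_columns(table, wanted=WANTED):
    """Return only the wanted columns, failing loudly if one is missing."""
    missing = [name for name in wanted if name not in table.columns]
    if missing:
        raise KeyError(f"columns not found in table: {missing}")
    return table[wanted]


# Stand-in for `table` from the code above.
sample = pd.DataFrame({
    "Symbol": ["BKE", "FL"],
    "Company": ["Buckle Inc", "Foot Locker Inc"],
    "Earnings Call Time": ["Before Market Open", "Before Market Open"],
    "EPS Estimate": ["0.54", "0.61"],
})
print(pick_columns(sample))
```

With the live page, `pick_columns(table)` would be called on the frame returned by `pd.read_html`; the explicit check is there because the labels could change if the page layout does.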

answered Nov 19, 2020 at 7:48


5 Comments

June Smith Over a year ago

This is exactly what I was looking for - thanks! I'd like to append info from 2 dates to the same dataframe. I've tried this code but can't get it to work. Can you please help?

June Smith Over a year ago

import datetime
import pandas as pd

date = (datetime.date.today() + datetime.timedelta(days=1)).isoformat() #get tomorrow in iso format as needed
for i in range(2):
    try:
        date = (datetime.date.today() + datetime.timedelta(days = i )).isoformat() #get tomorrow in iso format as needed
        pd.set_option('display.max_column',None)
        url = pd.read_html("https://finance.yahoo.com/calendar/earnings?day="+date, header=0)
        table = url[0]
        table.append(table)
        print(table)
    except ValueError:
        continue

Abhishek Rai Over a year ago

@JuneSmith It could easily be done. However, I would suggest you ask a new question with a screenshot of the table. I don't think answering that in the comments would be appropriate. Thanks!

June Smith Over a year ago

I've asked the question here: "Scraping and appending data while looping through html tables"

Abhishek Rai Over a year ago

@JuneSmith Check the URL again and make sure you use the tag pandas in the question.
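
For reference, the loop in the comment above discards its result: `DataFrame.append` returns a new frame instead of modifying `table` in place, and the method was removed in pandas 2.0. A minimal sketch of one way to combine several days, collecting the frames in a list and joining them with `pd.concat`. The function name `earnings_for_days`, the added `Date` column, and the `read_tables` parameter (a hook so the loop can be tried without network access) are assumptions for illustration, not part of the original answer:

```python
import datetime
import pandas as pd

BASE_URL = "https://finance.yahoo.com/calendar/earnings?day="


def earnings_for_days(days, read_tables=pd.read_html, start=None):
    """Concatenate the first table of each day's page into one DataFrame."""
    start = start or datetime.date.today()
    frames = []
    for offset in range(days):
        day = (start + datetime.timedelta(days=offset)).isoformat()
        try:
            tables = read_tables(BASE_URL + day, header=0)
        except ValueError:
            # pd.read_html raises ValueError when the page contains no table.
            continue
        # Remember which day each row came from.
        frames.append(tables[0].assign(Date=day))
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)
```

Calling `earnings_for_days(2)` would cover today and tomorrow; whether the site still serves a table that `pd.read_html` can parse is not guaranteed.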

Berdan Akyürek · Accepted Answer · 2020-11-19 07:55:44Z

Actually, your code does raise an error, but not on the line you report; it fails later. The problem may be that the page has not finished loading when you try to reach the element. A short delay before the line where the error occurs may solve the problem.

from selenium import webdriver
import datetime
import time

tomorrow = (datetime.date.today() + datetime.timedelta(days=1)).isoformat() #get tomorrow in iso format as needed
url = "https://finance.yahoo.com/calendar/earnings?day="+tomorrow
print ("url: " + url)

driver = webdriver.Chrome("C:/Users/jrod94/Downloads/chromedriver_win32/chromedriver.exe")
driver.get(url)
time.sleep(1) # you can increase 1 if it still does not work
cells = driver.find_elements_by_css_selector("#cal-res-table td[aria-label='Company']")  # one cell per table row
Companies = [cell.text for cell in cells]  # a single WebElement is not iterable, so collect the cells first

driver.close()

αԋɱҽԃ αмєяιcαη · Accepted Answer · 2020-11-19 08:12:35Z

1

Since your question is regarding selenium:

You should take a look at Selenium waits.

With an explicit wait, you wait for the presence of all matching elements in the page's HTML. The following code demonstrates it:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


def main(url):
    driver = webdriver.Firefox()
    driver.get(url)
    cnames = []  # stays empty if the wait times out
    try:
        cnames = [x.text for x in WebDriverWait(driver, 10).until(
            EC.presence_of_all_elements_located(
                (By.CSS_SELECTOR, "td[aria-label='Company']"))
        )]
    finally:
        print(cnames)
        driver.quit()


main("https://finance.yahoo.com/calendar/earnings")

Output:

['111 Inc', '360 DigiTech Inc', 'American Software Inc', 'American Software Inc', 'Corporacion America Airports SA', 'Atkore International Group Inc', 'Atkore International Group Inc', 'Helmerich and Payne Inc', 'Amtech Systems Inc', 'Amtech Systems Inc', 'Delta Apparel Inc', 'Delta Apparel Inc', 'Bellring Brands Inc', 'Berry Global Group Inc', 'Beacon Roofing Supply Inc', 'Natural Grocers By Vitamin Cottage Inc', "BJ's Wholesale Club Holdings Inc", 'Entera Bio Ltd', 'SG Blocks Inc', 'SG Blocks Inc', 'BEST Inc', 'Brady Corp', 'BioHiTech Global Inc', 'BioHiTech Global Inc', 'Oaktree Strategic Income Corporation', 'Caleres Inc', 'Pennantpark Investment Corp', 'Geospace Technologies Corp', 'Canadian Solar Inc', 'Oaktree Specialty Lending Corp', 'Matthews International Corp', 'Clearsign Technologies Corp', "Children's Place Inc", 'Elys Game Technology Corp', 'Dada Nexus Ltd', 'ESCO Technologies Inc', 'Euroseas Ltd', 'Fangdd Network Group Ltd', 'Fangdd Network Group Ltd', 'Golden Ocean Group Ltd', 'Hoegh LNG Partners LP', 'Post Holdings Inc', 'Huize Holding Ltd', 'Haynes International Inc', "Macy's Inc", 'OneWater Marine Inc', 'OneWater Marine Inc', 'Woodward Inc', 'StealthGas Inc', 'Maximus Inc', 'Ross Stores Inc', 'Intuit Inc', 'Ooma Inc', 'Williams-Sonoma Inc', 'Precipio Inc', 'NetEase Inc', 'Workday Inc', 'i3 Verticals Inc', 'Knot Offshore Partners LP', 'Maxeon Solar Technologies Ltd', 'Opera Ltd', 'Puxin Ltd', 'Puxin Ltd']

Note: You don't need Selenium for this; it will only slow the task down.

Also, I see no reason to import a huge library such as pandas just to read an HTML table.

You can simply pick up the data with the following code, which also gives you the exact call date:

import requests
import re
import json
import csv

keys = ['ticker', 'companyshortname', 'startdatetime']


def main(url):
    r = requests.get(url)
    goal = json.loads(re.search(r"App\.main.*?({.+})", r.text).group(1))
    target = [[item[k] for k in keys] for item in goal['context']
              ['dispatcher']['stores']['ScreenerResultsStore']['results']['rows']]
    with open("result.csv", 'w', newline="") as f:
        writer = csv.writer(f)
        writer.writerow(keys)
        writer.writerows(target)


main("https://finance.yahoo.com/calendar/earnings")

Output: view-online
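
The regular expression in the code above pulls out the JSON object that the page assigns to `App.main` inside a script tag. A minimal offline sketch of that single step, assuming the page layout this answer relies on (it may have changed since 2020). `SAMPLE_PAGE` and every value in it are made up for illustration, and `extract_rows` is a hypothetical helper name:

```python
import json
import re

# Made-up stand-in for the page source; the real page is far larger.
SAMPLE_PAGE = """<script>
root.App.main = {"context": {"dispatcher": {"stores": {"ScreenerResultsStore": {"results": {"rows": [{"ticker": "ABC", "companyshortname": "Example Corp", "startdatetime": "2020-11-20T21:00:00.000Z"}]}}}}}};
</script>"""

KEYS = ["ticker", "companyshortname", "startdatetime"]


def extract_rows(page_text, keys=KEYS):
    """Return one [ticker, name, start time] list per row found in the page."""
    # Without re.DOTALL, "." stops at a newline, so the greedy group runs
    # from the first "{" to the last "}" on the App.main line.
    match = re.search(r"App\.main.*?({.+})", page_text)
    if match is None:
        # The pattern did not match, e.g. because the page layout changed.
        return []
    data = json.loads(match.group(1))
    rows = data["context"]["dispatcher"]["stores"]["ScreenerResultsStore"]["results"]["rows"]
    return [[row[k] for k in keys] for row in rows]


print(extract_rows(SAMPLE_PAGE))
```

Returning an empty list when the pattern is absent is a design choice of this sketch; the original code would raise `AttributeError` on `.group(1)` in that case.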

edited Nov 19, 2020 at 8:12

answered Nov 19, 2020 at 8:01


6 Comments

Abhishek Rai Over a year ago

In what way is pandas a "Huge" library? Isn't it specifically useful in reading tables from sites?

αԋɱҽԃ αмєяιcαη Over a year ago

@AbrarAhmed pandas pulls in several libraries in the background, such as numpy. Even pd.read_html relies on other libraries in the background (an HTML parser such as lxml). So it's not logical to import pandas just to read HTML. Also, I think the logical approach is to first answer the question that was asked, then offer the other way to the OP.

Abhishek Rai Over a year ago

Sorry, that makes no sense to me..but..good to know.Cheers.

Abhishek Rai Over a year ago

@ahmedamerican I also used to think the same. So, I asked on meta. There is no such rule here. meta.stackoverflow.com/questions/402902/…

αԋɱҽԃ αмєяιcαη Over a year ago

@AbrarAhmed I just noticed that your post was from 2 days ago. By the way, the response you received there is based on a different example that you shared with the viewers. But let me confirm: if you ask me about an issue with x, I should come back to you with an answer regarding x before offering you y. That way the OP and viewers get the exact point.
