How to scrape content from a dynamic table using python?

Question

I'm trying to extract the RSI indicator shown on this page under the 'Oscillators' tab.

URL : https://in.tradingview.com/markets/stocks-india/market-movers-active/

I know that I'll have to use something like Selenium to open the tab first, but how do I access the 'Oscillators' div?

I'll need to use Selenium, and then I could use Beautiful Soup to find the right tags and data, right?

Edit -

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.common.exceptions import TimeoutException
from time import sleep
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
import pandas as pd

# create object for chrome options
chrome_options = Options()
base_url = 'https://in.tradingview.com/markets/stocks-india/market-movers-active/'


# To disable the message, "Chrome is being controlled by automated test software"
chrome_options.add_argument("disable-infobars")
# Pass the argument 1 to allow and 2 to block
chrome_options.add_experimental_option("prefs", { 
    "profile.default_content_setting_values.notifications": 2
    })
# invoke the webdriver
browser = webdriver.Chrome(executable_path = r'/Users/judhjitganguli/Downloads/chromedriver',
                          options = chrome_options)

browser.get('chrome://settings/')
browser.execute_script('chrome.settingsPrivate.setDefaultZoom(0.5);')
browser.get(base_url)

delay = 5 #seconds

while True:
    try:
        # find tab/button
        osiButton = browser.find_element_by_css_selector('.tv-screener-toolbar__favorites div div div:nth-child(8)')
        print('button text: ' + osiButton.text)
        osiButton.click()
        WebDriverWait(browser, 9).until(EC.text_to_be_present_in_element((By.CSS_SELECTOR, 'th:nth-child(2) .js-head-title'), "OSCILLATORS RATING"))

        # table updated, get the data
        for row in browser.find_elements_by_css_selector(".tv-data-table__tbody tr"):
            print(row.text)

        #for cell in browser.find_elements_by_css_selector('td'):
        #    print(cell.text)

    except Exception as ex:
        print(ex)

# close the automated browser
browser.close()

The output contains the required data, but the loop never ends. How do I get the data into a pandas DataFrame?

So if you inspect the element, it has a div id.

– Judhjit Ganguli, Commented Mar 2, 2021 at 10:24
My bad, @uingtea. Any idea how to proceed?

– Judhjit Ganguli, Commented Mar 2, 2021 at 10:38

uingtea · Accepted Answer · 2021-03-02 17:14:02Z

1

After clicking the Oscillators tab, use WebDriverWait to wait until the text of the header element th:nth-child(2) .js-head-title changes from Last to Oscillators Rating.

# If running headless, make sure to add this argument,
# or the Oscillators tab will not be visible and cannot be clicked.
#chrome_options.add_argument("window-size=1980,960")

try:
    # Find the Oscillators tab/button and click it.
    # find_element(By.CSS_SELECTOR, ...) replaces find_element_by_css_selector,
    # which was removed in newer Selenium 4 releases.
    osiButton = driver.find_element(By.CSS_SELECTOR, '.tv-screener-toolbar__favorites div div div:nth-child(8)')
    print('button text: ' + osiButton.text)
    osiButton.click()
    # Wait up to 9 seconds for the second column header to change.
    WebDriverWait(driver, 9).until(
        EC.text_to_be_present_in_element((By.CSS_SELECTOR, 'th:nth-child(2) .js-head-title'), "OSCILLATORS RATING"))

    # Table updated, get the data.
    for row in driver.find_elements(By.CSS_SELECTOR, '.tv-data-table__tbody tr'):
        print(row.text)
        #for cell in row.find_elements(By.CSS_SELECTOR, 'td'):
        #    print(cell.text)

except Exception as ex:
    print(ex)
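
Instead of printing each row, the cell texts could be collected into a pandas DataFrame. This is only a minimal sketch, assuming each table row has already been reduced to a list of cell strings. The helper name rows_to_dataframe, the column names and the sample values are hypothetical and not taken from the page.

```python
import pandas as pd


def rows_to_dataframe(rows, columns=None):
    """Build a DataFrame from a list of rows, each a list of cell strings.

    Hypothetical helper: rows shorter than the widest row are padded with
    None so that ragged tables still load.
    """
    width = max((len(r) for r in rows), default=0)
    padded = [list(r) + [None] * (width - len(r)) for r in rows]
    return pd.DataFrame(padded, columns=columns)


# Made-up sample data standing in for the scraped cell texts.
sample_rows = [
    ["AAA", "Buy", "55.1"],
    ["BBB", "Sell", "31.7"],
]
df = rows_to_dataframe(sample_rows, columns=["Ticker", "Rating", "RSI"])
print(df)
```

With Selenium, the cell lists could be built with something like [[td.text for td in tr.find_elements(By.CSS_SELECTOR, 'td')] for tr in driver.find_elements(By.CSS_SELECTOR, '.tv-data-table__tbody tr')], run once after the wait succeeds.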

edited Mar 2, 2021 at 17:14

answered Mar 2, 2021 at 10:57


5 Comments

Judhjit Ganguli Over a year ago

Tried this, but unsuccessful.

Judhjit Ganguli Over a year ago

No error message, but it just throws the timeout exception. I click on Oscillator, but nothing happens. Updated my question with the code I'm using.

uingtea Over a year ago

Answer updated; it seems you didn't click the Oscillators tab.

Judhjit Ganguli Over a year ago

Hey, thanks SO much. I was able to get the required data after a couple of tweaks (added the code to the question). But could you help me out with two more things? 1. Why is it an infinite loop? It should stop after parsing the entire page, right? 2. How can I convert this into a pandas df?

uingtea Over a year ago

Move your code out of the while True: loop; nothing inside it ever breaks, so it repeats forever. To convert, try pd.read_html(driver.page_source), or select the table and pass its HTML to read_html().
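
To illustrate the read_html() suggestion, here is a minimal sketch, assuming the page HTML is available as a string (for example from driver.page_source after the wait has succeeded). The helper name html_to_dataframe and the sample HTML are hypothetical, and pd.read_html needs an HTML parser such as lxml to be installed. The code runs once, with no while True: around it.

```python
from io import StringIO

import pandas as pd


def html_to_dataframe(html, table_index=0):
    """Parse every <table> in the HTML and return the one at table_index."""
    # read_html returns a list of DataFrames, one per <table> it finds.
    tables = pd.read_html(StringIO(html))
    return tables[table_index]


# Made-up HTML standing in for driver.page_source.
sample_html = """
<table>
  <thead><tr><th>Ticker</th><th>RSI</th></tr></thead>
  <tbody>
    <tr><td>AAA</td><td>55.1</td></tr>
    <tr><td>BBB</td><td>31.7</td></tr>
  </tbody>
</table>
"""
df = html_to_dataframe(sample_html)
print(df)
```

If the real page contains several tables, table_index may need adjusting; which index holds the screener data is not something this sketch can know.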
