
I am trying to build a webscraper with python / selenium that scrapes data from multiple websites and stores the data in an Excel sheet.

The sites I want to scrape are the following:

https://www.ngm.se/marknaden/vardepapper?symbol=ETH%20ZERO%20SEK
https://www.ngm.se/marknaden/vardepapper?symbol=BTC%20ZERO%20SEK
https://www.ngm.se/marknaden/vardepapper?symbol=VALOUR%20CARDANO%20SEK
https://www.ngm.se/marknaden/vardepapper?symbol=VALOUR%20POLKADOT%20SEK
https://www.ngm.se/marknaden/vardepapper?symbol=VALOUR%20SOLANA%20SEK
https://www.ngm.se/marknaden/vardepapper?symbol=VALOUR%20UNISWAP%20SEK

From all sites I want to scrape the "Omsättning", "Volym" and "VWAP" values and store them in an excel sheet.

This is what I got so far:


    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.common.exceptions import ElementNotVisibleException
    
    url = ["https://www.ngm.se/marknaden/vardepapper?symbol=ETH%20ZERO%20SEK"]
    
    driver = webdriver.Chrome()
    
    driver.get('https://www.ngm.se/marknaden/vardepapper?symbol=ETH%20ZERO%20SEK')
    
    iframe = driver.find_element(By.XPATH, '//iframe').get_attribute("src")
    driver.get(iframe)
    
    element = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, '//div[@id="detailviewDiv"]//thead[.//span[contains(text(),"Volym")]]/following-sibling::tbody')))
    
    volym = element.text.split('\n')[-3]
    vwap = element.text.split('\n')[-2]
    Omsaettning = element.text.split('\n')[-4]
    
    print(volym, vwap, Omsaettning)

With that I am able to print the values from the ETH ZERO SEK page. However, how can I also scrape the data from the other websites and then store everything in an Excel sheet? Also, is it possible to run Selenium without opening a visible browser window, to save on computer resources?

Thanks a lot for any help in advance!

1 Answer

If you want to run them one after another in a loop, then you may have to use something like this:

    import time

    urlist = ['https://www.ngm.se/marknaden/vardepapper?symbol=ETH%20ZERO%20SEK',
              'https://www.ngm.se/marknaden/vardepapper?symbol=BTC%20ZERO%20SEK',
              'https://www.ngm.se/marknaden/vardepapper?symbol=VALOUR%20CARDANO%20SEK',
              'https://www.ngm.se/marknaden/vardepapper?symbol=VALOUR%20POLKADOT%20SEK',
              'https://www.ngm.se/marknaden/vardepapper?symbol=VALOUR%20SOLANA%20SEK',
              'https://www.ngm.se/marknaden/vardepapper?symbol=VALOUR%20UNISWAP%20SEK']

    # reuse the driver you already created in your code above
    for url in urlist:
        driver.get(url)
        print(url)
        time.sleep(5)  # give the page a moment to render the iframe

        # the data lives in an iframe, so navigate to its src directly
        iframe_src = driver.find_element(By.XPATH, '//iframe').get_attribute("src")
        driver.get(iframe_src)

        element = WebDriverWait(driver, 10).until(EC.presence_of_element_located(
            (By.XPATH, '//div[@id="detailviewDiv"]//thead[.//span[contains(text(),"Volym")]]/following-sibling::tbody')))

        rows = element.text.split('\n')
        omsaettning = rows[-4]
        volym = rows[-3]
        vwap = rows[-2]

        print(volym, vwap, omsaettning)

    driver.quit()

In the option above, you have to be careful with the list indices, as they may not be the same for every url.
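To avoid depending on fixed positions at all, you could pair each header label with the value below it and look the fields up by name. This is only a sketch: it assumes the `thead` and `tbody` texts render one cell per line, which matches how your code already splits on `'\n'` (you would pass in the `.text` of the thead element alongside the tbody).

    ```python
    def parse_fields(thead_text, tbody_text,
                     wanted=("Omsättning", "Volym", "VWAP")):
        """Pair header labels with the values beneath them and pick out
        the wanted fields by name instead of by fixed list index."""
        headers = thead_text.split('\n')
        values = tbody_text.split('\n')
        table = dict(zip(headers, values))
        return {name: table.get(name) for name in wanted}
    ```

With this, a column shifting position on one of the pages no longer silently gives you the wrong value.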

On the other hand, if you want to run all of them separately but simultaneously, you may have to use the pytest-xdist plugin (which you have to install separately, e.g. pip install pytest-xdist). But note that the more workers you use, the more resources the system will consume.
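If you don't want to pull in pytest just for the parallelism, a plain ThreadPoolExecutor is another option. A minimal sketch (the `scrape_one` callable is a placeholder for your per-url logic, and each worker should create and quit its own WebDriver, since one driver instance is not safe to share across threads):

    ```python
    from concurrent.futures import ThreadPoolExecutor

    def scrape_all(urls, scrape_one, max_workers=3):
        """Run scrape_one(url) for every url across a pool of worker
        threads; results come back in the same order as the urls."""
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            return list(pool.map(scrape_one, urls))
    ```

Here `scrape_one` would open its own `webdriver.Chrome()`, do the iframe navigation and extraction from your code, and call `driver.quit()` in a `finally` block.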

If you want the browser not to be displayed, you can use the Chrome option --headless:

    from selenium.webdriver.chrome.options import Options

    opt = Options()
    opt.add_argument('--headless')
    # on older Selenium versions, also pass your chromedriver path here
    driver = webdriver.Chrome(options=opt)

The options above keep the browser window from being displayed. However, I have seen that in headless mode your code fails to find this element (which works fine in headed mode):

    element = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, '//div[@id="detailviewDiv"]//thead[.//span[contains(text(),"Volym")]]/following-sibling::tbody')))
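A common cause for that is headless Chrome's small default viewport, and some sites also serve different markup when they see the headless user agent. Setting an explicit window size (and optionally a normal-looking user-agent string) is worth trying; this is just a configuration sketch, and the user-agent string below is an example, not a requirement:

    ```python
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    opt = Options()
    opt.add_argument('--headless')
    opt.add_argument('--window-size=1920,1080')  # headless default viewport is small
    # Example UA string; some sites behave differently for the headless default.
    opt.add_argument('--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                     'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0 Safari/537.36')
    driver = webdriver.Chrome(options=opt)
    ```

Whether this fixes the missing element depends on the site, so keep the explicit WebDriverWait in place either way.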


2 Comments

Thanks a lot for your input, this already helped me a lot towards a solution for my problem. Do you by any chance know how I can write the scraped data into a CSV?
If you are using pandas in your code, then you can send the dataframe directly to a file: df.to_csv('filename.csv') for CSV, or df.to_excel('filename.xlsx') for Excel. If not, then one option I know is openpyxl, which is also a library that must be installed (pip install openpyxl) to use it.
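Building on that comment: if you'd rather avoid extra dependencies entirely, the standard-library csv module already covers the CSV case. A minimal sketch, where each scraped page contributes one dict with the three fields (the column names here just mirror the values being scraped):

    ```python
    import csv

    def write_rows(path, rows):
        """Write a list of dicts like
        {"url": ..., "Omsättning": ..., "Volym": ..., "VWAP": ...}
        to a CSV file with a header row."""
        fieldnames = ["url", "Omsättning", "Volym", "VWAP"]
        with open(path, "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
    ```

You would append one dict per iteration of the url loop and call write_rows once at the end, after driver.quit().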
