I am trying to scrape the list of DAOs from messari.io, but I am having trouble because I get the following warning and error:

DeprecationWarning: executable_path has been deprecated, please pass in a Service object


driver = webdriver.Chrome(options=options, executable_path=DRIVER_PATH)

DevTools listening on ws://127.0.0.1:56691/devtools/browser/b4609671-5e6e-4d25-b09e-4116b3dde4bf
[0525/100030.252:INFO:CONSOLE(1)] "enabling sentry error tracker", source: https://messari.io/static/js/main.977a4794.chunk.js (1)
[0525/100030.951:INFO:CONSOLE(2)] "Unable to refresh token: Login required", source: https://messari.io/static/js/23.778d04d0.chunk.js (2)
[0525/100031.065:INFO:CONSOLE(2)] "


88b           d88                                                            88
888b         d888                                                            ""
88'8b       d8'88
88 '8b     d8' 88   ,adPPYba,  ,adPPYba,  ,adPPYba,  ,adPPYYba,  8b,dPPYba,  88
88  '8b   d8'  88  a8P_____88  I8[    ""  I8[    ""  ""     'Y8  88P'   "Y8  88
88   '8b d8'   88  8PP"""""""   '"Y8ba,    '"Y8ba,   ,adPPPPP88  88          88
88    '888'    88  "8b,   ,aa  aa    ]8I  aa    ]8I  88,    ,88  88          88
88     '8'     88   '"Ybbd8"'  '"YbbdP"'  '"YbbdP"'  '"8bbdP"Y8  88          88


", source: https://messari.io/static/js/23.778d04d0.chunk.js (2)
[0525/100031.069:INFO:CONSOLE(2)] "Interested in a CHALLENGE? Check out: https://messari.io/quiz", source: https://messari.io/static/js/23.778d04d0.chunk.js (2)
Traceback (most recent call last):
  File "c:/Users/Student/webScrape/scraper.py", line 21, in <module>
    matches = WebDriverWait(driver, 10).until(
  File "C:\Users\Student\AppData\Local\Programs\Python\Python38-32\lib\site-packages\selenium\webdriver\support\wait.py", line 89, in until
    raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:
Stacktrace:
Backtrace:
        Ordinal0 [0x0096B8F3+2406643]
        Ordinal0 [0x008FAF31+1945393]
        Ordinal0 [0x007EC748+837448]
        Ordinal0 [0x008192E0+1020640]
        Ordinal0 [0x0081957B+1021307]
        Ordinal0 [0x00846372+1205106]
        Ordinal0 [0x008342C4+1131204]
        Ordinal0 [0x00844682+1197698]
        Ordinal0 [0x00834096+1130646]
        Ordinal0 [0x0080E636+976438]
        Ordinal0 [0x0080F546+980294]
        GetHandleVerifier [0x00BD9612+2498066]
        GetHandleVerifier [0x00BCC920+2445600]
        GetHandleVerifier [0x00A04F2A+579370]
        GetHandleVerifier [0x00A03D36+574774]
        Ordinal0 [0x00901C0B+1973259]
        Ordinal0 [0x00906688+1992328]
        Ordinal0 [0x00906775+1992565]
        Ordinal0 [0x0090F8D1+2029777]
        BaseThreadInitThunk [0x777BFA29+25]
        RtlGetAppContainerNamedObjectPath [0x77B77A7E+286]
        RtlGetAppContainerNamedObjectPath [0x77B77A4E+238]

I know there is an API for messari.io, but I am almost certain it is only for their assets and not their list of DAOs. I tried using Selenium since it is a dynamic page but I am still having trouble. Here is my code:

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import requests

url = 'https://messari.io/governor/daos'

DRIVER_PATH = 'PATH_TO_DRIVER_ON_MY_PC'
options = Options()
options.headless = True
options.add_argument("--window-size=1920, 1200")

# s = Service('PATH_TO_DRIVER_ON_MY_PC')
driver = webdriver.Chrome(options=options, executable_path=DRIVER_PATH)
driver.get('https://messari.io/governor/daos')

try:
    matches = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.TAG_NAME, "td")))
    # for match in matches:
    #     print(match.text)

finally:
    driver.quit()

Update: I fixed the executable_path warning, but I am still getting the same TimeoutException. When I run it without headless mode, I also get the following messages:

DevTools listening on ws://127.0.0.1:57773/devtools/browser/4450b78d-3a9f-401a-b39c-2c716ecad924
[9628:20616:0525/102300.840:ERROR:device_event_log_impl.cc(214)] [10:23:00.840] USB: usb_device_handle_win.cc:1049 Failed to read descriptor from node connection: A device attached to the system is not functioning. (0x1F)
[9628:20616:0525/102300.841:ERROR:device_event_log_impl.cc(214)] [10:23:00.841] USB: usb_device_handle_win.cc:1049 Failed to read descriptor from node connection: A device attached to the system is not functioning. (0x1F)

Based on similar questions, I assume this part is a hardware message that I shouldn't worry about, because one of the two lines disappeared when I unplugged my mouse.

  • The first line is NOT an error but only a warning. Commented May 25, 2022 at 14:30
  • It seems it can't find this element. First you could display the HTML (driver.page_source) to check manually whether this element is there. And if this element is inside a <frame>, then you have to use driver.switch_to before you try to search for it. Commented May 25, 2022 at 14:31
  • I checked the source code in DevTools and I don't see any <td> in it. It uses only <div> to create something like a table. What do you really want to get? Commented May 25, 2022 at 14:34
  • I want to be able to get each element in the table, except the last column. For example, in the first row: the name: Fei, the type: protocol, the tags: defi. Commented May 25, 2022 at 14:36
  • As I said, it DOESN'T use <td> to display it but <div>, and it keeps Fei in <h4>, at least in my Firefox on a desktop system. Commented May 25, 2022 at 14:37

2 Answers

2

This page doesn't use <td> to display the list of DAOs.
It uses <div> elements (styled with CSS) to lay the list out like a table.

And it keeps the name of each DAO in an <h4>.

At least, it uses <div> and <h4> in my Firefox on a laptop with Linux.
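
One way to check which tags a rendered page really uses is to count the tags in its HTML. The snippet below is only a minimal sketch using the standard library: TagCounter, count_tags and sample_html are names invented for this illustration, and sample_html is a hypothetical stand-in for driver.page_source, not the real Messari markup.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count how often each opening tag occurs in an HTML string."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case
        self.counts[tag] += 1


def count_tags(html):
    """Return a Counter mapping tag name -> number of opening tags."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


# Hypothetical stand-in for driver.page_source; the real markup will differ.
sample_html = """
<div class="row"><div><h4>Fei</h4></div><div>Protocol</div></div>
<div class="row"><div><h4>Rook</h4></div><div>Protocol</div></div>
"""

counts = count_tags(sample_html)
# A count of 0 for "td" means waiting for a <td> can never succeed.
print("td:", counts["td"], "h4:", counts["h4"], "div:", counts["div"])
```

With a live browser you would pass driver.page_source to count_tags() after the page has loaded. If "td" comes back as 0, waiting for a <td> will always end in a TimeoutException.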


Full working code (tested on Linux Mint, Python 3.8, Selenium 4.x, Chrome 101.x)

I used the webdriver_manager module so that it automatically downloads a fresh driver whenever Linux installs a newer version of Chrome.

I have to use find_elements() (with an s in the word elements) or presence_of_all_elements_located() to get all <h4> elements.

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

from webdriver_manager.chrome import ChromeDriverManager

url = 'https://messari.io/governor/daos'

options = Options()
# the "--headless" argument works across Selenium 4 releases;
# the options.headless setter was deprecated in later 4.x versions
options.add_argument("--headless")
# no space after the comma, otherwise Chrome may not parse the size
options.add_argument("--window-size=1920,1200")

# webdriver_manager downloads a driver matching the installed Chrome
driver = webdriver.Chrome(options=options, service=Service(ChromeDriverManager().install()))

driver.get(url)

try:
    # wait up to 10 seconds until <h4> elements are present, then get all of them
    matches = WebDriverWait(driver, 10).until(
        EC.presence_of_all_elements_located((By.TAG_NAME, "h4")))

    # alternative without explicit wait:
    #matches = driver.find_elements(By.TAG_NAME, "h4")

    for match in matches:
        if match.text:  # skip <h4> elements without text
            print(match.text)
finally:
    driver.quit()

Result:

Fei
Rook
Cosmos
Stargate Finance
Aave
Treasure DAO
DODO
Radicle
Goldfinch
Merit Circle
EPNS
Perpetual Protocol
Gitcoin
SuperRare
Indexed
Doodles
Rome DAO
Badger
Paraswap
Unlock
Terra
Shapeshift
Lobis
Pool Together
The Graph
Yearn Finance
Ampleforth
Alpaca Finance
Balancer
Gro Protocol
Sismo DAO
BeethovenX
ENS
Lido
Alchemist

EDIT:

To get all values you may have to scroll the page, so that JavaScript adds new items.

There are answers which use a while-loop with execute_script() to run JavaScript code that scrolls to the bottom and gets the current page height. If the height is different than before the scroll, you have to scroll again. If the height is the same, you have reached the end of the page and can now get all items.
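
The scroll-and-compare loop described above could look roughly like the sketch below. This is not code tested against messari.io: the function name scroll_to_bottom, the pause of 2 seconds and the max_rounds limit are assumptions made for illustration. It also assumes that the page grows document.body.scrollHeight when new items load; if the list scrolls inside an inner container, the JavaScript would need to target that container instead.

```python
import time


def scroll_to_bottom(driver, pause=2.0, max_rounds=50):
    """Scroll down until the page height stops growing.

    `driver` is any object with an execute_script() method, such as a
    Selenium WebDriver. Returns the number of scrolls performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    rounds = 0
    while rounds < max_rounds:  # safety limit (assumed value) against endless pages
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give JavaScript time to add new items
        new_height = driver.execute_script("return document.body.scrollHeight")
        rounds += 1
        if new_height == last_height:
            break  # height did not change, so this is the end of the page
        last_height = new_height
    return rounds
```

In the full code you would call scroll_to_bottom(driver) after driver.get(url) and before collecting the <h4> elements with driver.find_elements(By.TAG_NAME, "h4"). A fixed pause is the simplest choice; a slow connection may need a longer one.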


9 Comments

Okay, so based on that, how would I change my code to operate as I want it to? I tried just changing the tag from "td" to "h4" and printing matches.text, but it only returned "Governor", and when I changed presence_of_element_located to presence_of_all_elements_located it only returned "Governor" as well.
I added full working code.
That works, thank you so much! What would I need to do differently to get all 855 DAOs versus just the 70 DAOs that this returns?
The code would have to behave like a real human: you would have to scroll to the bottom of the page, wait, and check whether the page is taller than after the previous scroll. If it is taller, scroll again; if not, you have reached the end of the page. There are questions which show how to use a while-loop with JavaScript (execute_script()) for this.
How did you figure out that it keeps the names of DAOs in h4? When I do inspect, it says the names are a <span>. I am trying to figure out how to get the Type and Tags of each DAO as well.
-1

In Selenium 4 the executable_path argument is deprecated, so you have to pass an instance of the Service class instead, here created from ChromeDriverManager().install():

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

options = Options()
options.add_argument("start-maximized")
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
driver.get("https://www.google.com")

4 Comments

that fixed my first error message, but I am still getting a TimeoutException message similar to the one in my original question.
This is NOT an error but only a warning, and the OP can still use the old method. The new method doesn't resolve the main problem.
I can see you have updated the question, and I posted the answer before that. If this has fixed your original issue, could you please accept the answer and create a separate question for the other issue?
This just fixed the warning I was getting, but has not fixed the actual issue I am running into
