Generating a list of URLs with Selenium Python

Question

I'm trying to generate a list of URLs with Selenium. I would like the user to navigate in the instrumented browser and, at the end, get a list of the URLs they visited.

I found that the "current_url" property could help with that, but I haven't found a way to detect that the user clicked on a link.

In [117]: from selenium import webdriver

In [118]: browser = webdriver.Chrome()

In [119]: browser.get("http://stackoverflow.com")

--> here, I click on the "Questions" link.

In [120]: browser.current_url

Out[120]: 'http://stackoverflow.com/questions'

--> here, I click on the "Jobs" link.

In [121]: browser.current_url

Out[121]: 'http://stackoverflow.com/jobs?med=site-ui&ref=jobs-tab'

Any hint appreciated!

Thank you,

crookedleaf · Accepted Answer · 2017-03-07 19:49:33Z

There isn't really an official way to monitor what a user is doing in Selenium. The only thing you can really do is start the driver, then run a loop that keeps checking driver.current_url. However, I don't know the best way to exit this loop, since I don't know your use case. Maybe try something like:

import time

from selenium import webdriver


urls = []

driver = webdriver.Firefox()

current = 'http://www.google.com'
driver.get('http://www.google.com')
while True:
    url = driver.current_url  # read once per iteration
    if url != current:
        current = url

        # if you want to capture every URL, including duplicates:
        urls.append(current)

        # or, if you only want to capture unique URLs, use this instead:
        # if current not in urls:
        #     urls.append(current)

    time.sleep(0.5)  # short pause so the loop does not poll the browser flat out

If you don't have any idea how to end this loop, I'd suggest having the user navigate to a URL that breaks the loop, such as http://www.endseleniumcheck.com, and adding it to the code like this:

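import time

from selenium import webdriver


urls = []

driver = webdriver.Firefox()

current = 'http://www.google.com'
driver.get('http://www.google.com')
while True:
    url = driver.current_url  # read once per iteration

    # startswith() rather than ==, because the browser may report the
    # address with a trailing slash
    if url.startswith('http://www.endseleniumcheck.com'):
        break

    if url != current:
        current = url

        # if you want to capture every URL, including duplicates:
        urls.append(current)

        # or, if you only want to capture unique URLs, use this instead:
        # if current not in urls:
        #     urls.append(current)

    time.sleep(0.5)  # short pause so the loop does not poll the browser flat out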

Or, if you want to get crafty, you can terminate the loop when the user exits the browser. You can do this by monitoring the process ID with the psutil library (pip install psutil):

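import time

from selenium import webdriver
import psutil


urls = []

driver = webdriver.Firefox()
# PID of the Firefox process; whether driver.binary is available depends
# on the Selenium version and Firefox driver in use
pid = driver.binary.process.pid

current = 'http://www.google.com'
driver.get('http://www.google.com')
while True:
    # stop once the browser process is gone
    if not psutil.pid_exists(pid):
        break

    url = driver.current_url  # read once per iteration
    if url != current:
        current = url

        # if you want to capture every URL, including duplicates:
        urls.append(current)

        # or, if you only want to capture unique URLs, use this instead:
        # if current not in urls:
        #     urls.append(current)

    time.sleep(0.5)  # short pause so the loop does not poll the browser flat out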

Thank you very much! That will do. Personally, I ended up using a try/except structure to handle the browser exit (which throws an exception). It's not clean, but it's good enough for what I'm trying to do.
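
The try/except approach mentioned in this comment was not posted, so the following is only a rough sketch of what it might look like, not the asker's actual code. The function name record_urls and its parameters (poll_interval, unique, stop_on) are made up for illustration. It assumes that reading current_url raises an exception once the browser has been closed; exactly which exception that is may depend on the driver and Selenium version, so the default here catches any Exception.

```python
import time


def record_urls(driver, poll_interval=0.5, unique=False, stop_on=(Exception,)):
    """Poll driver.current_url until reading it raises one of `stop_on`.

    `driver` is any object with a `current_url` attribute, such as a
    Selenium WebDriver. Returns the recorded URLs in visiting order.
    """
    urls = []
    last = None  # nothing seen yet, so the first URL is always recorded
    while True:
        try:
            current = driver.current_url
        except stop_on:
            # Reading the URL failed; assume the browser was closed.
            break

        if current != last:
            last = current
            # With unique=True, skip URLs that were already recorded.
            if not unique or current not in urls:
                urls.append(current)

        time.sleep(poll_interval)
    return urls
```

With Selenium this could be called as record_urls(driver) after driver.get(...). To avoid swallowing unrelated errors, stop_on could be narrowed to something like (WebDriverException,), imported from selenium.common.exceptions, provided that is what your setup raises when the window is closed.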
