
I'm trying to generate a list of URLs with Selenium. I would like the user to navigate freely in the instrumented browser and, at the end, get a list of the URLs they visited.

I found that the "current_url" property could help with that, but I haven't found a way to detect that the user clicked on a link.

In [117]: from selenium import webdriver

In [118]: browser = webdriver.Chrome()

In [119]: browser.get("http://stackoverflow.com")

--> here, I click on the "Questions" link.

In [120]: browser.current_url

Out[120]: 'http://stackoverflow.com/questions'

--> here, I click on the "Jobs" link.

In [121]: browser.current_url

Out[121]: 'http://stackoverflow.com/jobs?med=site-ui&ref=jobs-tab'

Any hint appreciated!

Thank you,

1 Answer

Selenium has no official way to monitor what a user is doing in the browser. The practical workaround is to start the driver, then run a loop that keeps checking driver.current_url and records it whenever it changes. I don't know what the best way to exit this loop is, since I don't know your use case. Maybe try something like:

import time

from selenium import webdriver


urls = []

driver = webdriver.Firefox()

current = 'http://www.google.com'
driver.get(current)
while True:
    url = driver.current_url  # read once per iteration
    if url != current:
        current = url

        # Pick ONE of the two options below.
        # Option 1: capture every URL, including duplicates:
        urls.append(current)

        # Option 2: capture only unique URLs (use instead of option 1):
        # if current not in urls:
        #     urls.append(current)

    time.sleep(0.5)  # poll twice a second instead of spinning the CPU
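
The two recording options in the loop above are alternatives, not steps to run together: once a URL has been appended, the "unique" check can never add it again. As a rough sketch, the choice can be pulled into a small helper that is testable without a browser. The name `record_visit`, its `unique` flag, and the example URLs are made up for illustration; they are not part of Selenium.

```python
def record_visit(urls, previous, current, unique=False):
    """Record `current` in `urls` if the browser moved to a new URL.

    unique=False keeps every navigation, so revisited pages appear again.
    unique=True keeps each distinct URL only once.
    Returns the URL to pass as `previous` on the next poll.
    """
    if current == previous:
        return previous  # no navigation since the last poll
    if not unique or current not in urls:
        urls.append(current)
    return current


# Simulated successive readings of driver.current_url (illustration only).
readings = [
    "http://example.com/a",
    "http://example.com/a",
    "http://example.com/b",
    "http://example.com/a",
    "http://example.com/c",
]

all_visits, unique_visits = [], []
prev_all = prev_unique = None
for url in readings:
    prev_all = record_visit(all_visits, prev_all, url)
    prev_unique = record_visit(unique_visits, prev_unique, url, unique=True)

print(all_visits)     # a, b, a, c
print(unique_visits)  # a, b, c
```

Inside the polling loop, this would be called as `current = record_visit(urls, current, driver.current_url)`.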

If you don't know how to end this loop, one option is to have the user navigate to a sentinel URL that breaks the loop, such as http://www.endseleniumcheck.com, and check for it in the code:

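import time

from selenium import webdriver


END_URL = 'http://www.endseleniumcheck.com'

urls = []

driver = webdriver.Firefox()

current = 'http://www.google.com'
driver.get(current)
while True:
    url = driver.current_url  # read once per iteration

    # startswith() rather than ==, because the browser usually reports
    # the URL with a trailing slash added.
    if url.startswith(END_URL):
        break

    if url != current:
        current = url

        # Pick ONE of the two options below.
        # Option 1: capture every URL, including duplicates:
        urls.append(current)

        # Option 2: capture only unique URLs (use instead of option 1):
        # if current not in urls:
        #     urls.append(current)

    time.sleep(0.5)  # poll twice a second instead of spinning the CPU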

Or, if you want to get crafty, you can terminate the loop when the user exits the browser. You can do this by monitoring the browser's process ID with the psutil library (pip install psutil):

import time

import psutil
from selenium import webdriver


urls = []

driver = webdriver.Firefox()
# Note: driver.binary is exposed by older Selenium Firefox drivers;
# the attribute may be missing in newer Selenium releases.
pid = driver.binary.process.pid

current = 'http://www.google.com'
driver.get(current)
while True:
    # Stop as soon as the browser process no longer exists.
    if not psutil.pid_exists(pid):
        break

    url = driver.current_url  # read once per iteration
    if url != current:
        current = url

        # Pick ONE of the two options below.
        # Option 1: capture every URL, including duplicates:
        urls.append(current)

        # Option 2: capture only unique URLs (use instead of option 1):
        # if current not in urls:
        #     urls.append(current)

    time.sleep(0.5)  # poll twice a second instead of spinning the CPU

1 Comment

Thank you very much! That will do. Personally, I ended up using a try/except block to handle the browser exit (which throws an exception). It's not clean, but it's good enough for what I'm trying to do.
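
For readers who want to try the try/except approach mentioned in this comment, here is a minimal sketch of what it might look like. It is not the commenter's actual code. The function name `collect_urls_until_closed` and its parameters are hypothetical. The exception raised after the browser closes varies with the browser and Selenium version, so the type is left as a parameter; with Selenium, `selenium.common.exceptions.WebDriverException` (the base class of Selenium's own errors) is a plausible value to pass, while the default `Exception` catches anything, including connection errors.

```python
import time


def collect_urls_until_closed(driver, closed_exc=Exception, poll_interval=0.5):
    """Poll driver.current_url until reading it raises `closed_exc`.

    `driver` is any object with a `current_url` attribute, such as a
    Selenium WebDriver. Returns the visited URLs in order, duplicates kept.
    """
    urls = []
    previous = None
    while True:
        try:
            current = driver.current_url
        except closed_exc:
            break  # the browser is gone, so stop polling
        if current != previous:
            urls.append(current)
            previous = current
        time.sleep(poll_interval)  # avoid a busy loop
    return urls
```

With a real browser, the call would be something like `urls = collect_urls_until_closed(driver)` right after `driver.get(...)`; the function returns once the user closes the browser.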
