16

We are using Selenium with Python and, as part of our automation, we need to capture the messages that a sample website sends and receives after the web page has loaded completely.

I have checked here, where it is stated that what we want to do is achievable using BrowserMobProxy, but when we tested it, the WebSocket connection did not work on the website, and the certificate errors were also cumbersome.

In another post here it is stated that this can be done using Chrome's loggingPrefs, but it seemed that we only get the logs up to the time the website finishes loading, and not the data after that.

Is it possible to capture WebSocket traffic using only Selenium?

1
  • I think this article may just be what you're looking for. Hope it helps. Commented Dec 9, 2019 at 11:37

3 Answers

17

It turned out that this can be done using pyppeteer. The following code captures all the live WebSocket traffic of a sample website:

import asyncio
from pyppeteer import launch

async def main():
    browser = await launch(
        headless=True,
        args=['--no-sandbox'],
        autoClose=False
    )
    page = await browser.newPage()
    await page.goto('https://www.tradingview.com/symbols/BTCUSD/')
    cdp = await page.target.createCDPSession()
    await cdp.send('Network.enable')
    await cdp.send('Page.enable')

    def printResponse(response):
        print(response)

    cdp.on('Network.webSocketFrameReceived', printResponse)  # Calls printResponse when a websocket frame is received
    cdp.on('Network.webSocketFrameSent', printResponse)  # Calls printResponse when a websocket frame is sent
    await asyncio.sleep(100)


asyncio.get_event_loop().run_until_complete(main())

4 Comments

Logged in to thank you for this; I didn't know this existed. You are amazing.
It's nice that you can capture traffic from the WebSocket. But how can you trigger a click from the callback function printResponse, given that it is synchronous? Basically, how can you evaluate something like await page.click('a') inside the sync function? For example, I would like to click or not based on information taken from the WebSocket. How can I do that? I would appreciate an answer.
Is there a way to log in to my account and then access the WebSocket traffic?
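
On the question above about awaiting something like page.click('a') from the synchronous callback: one possible approach, sketched here and not taken from the answer, is to have the callback only enqueue each event and let a separate coroutine consume the queue, where awaiting is allowed. The names make_frame_handler, react_to_frames, should_click, click, and limit are hypothetical and exist only for this illustration.

```python
import asyncio


def make_frame_handler(queue):
    """Return a synchronous callback that forwards each event to an asyncio queue."""
    def on_frame(event):
        # put_nowait does not block, so it can be called from a plain function
        queue.put_nowait(event)
    return on_frame


async def react_to_frames(queue, should_click, click, limit):
    """Consume `limit` events and await `click()` whenever `should_click(event)` is true."""
    clicks = 0
    for _ in range(limit):
        event = await queue.get()
        if should_click(event):
            # `click` is any coroutine function, e.g. `lambda: page.click('a')`
            await click()
            clicks += 1
    return clicks
```

With the pyppeteer script above, this would presumably be wired up inside main() as queue = asyncio.Queue(), then cdp.on('Network.webSocketFrameReceived', make_frame_handler(queue)), followed by await react_to_frames(queue, some_predicate, lambda: page.click('a'), limit=10) in place of the final sleep.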
1

You can use Chrome’s performance logging to capture WebSocket frames with Selenium.

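import json

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# enable Chrome's performance log, which records DevTools Network events
chrome_options = Options()
chrome_options.set_capability("goog:loggingPrefs", {"performance": "ALL"})

driver = webdriver.Chrome(options=chrome_options)

# trigger the web sockets

for entry in driver.get_log('performance'):
    try:
        message = json.loads(entry['message'])['message']
        if message['method'] in ('Network.webSocketFrameReceived', 'Network.webSocketFrameSent'):
            payload = message['params']['response']['payloadData']
            print(payload)
    except (KeyError, ValueError):
        # skip entries that are not valid JSON or lack the expected fields
        pass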
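
Since the question is about frames that arrive after the page has loaded, the loop above may need to run more than once. The following is a minimal sketch of that idea, assuming the performance log keeps collecting events in the background and that each get_log call returns only the entries recorded since the previous call. The helper names extract_ws_payloads and poll_ws_frames, and the rounds and interval parameters, are hypothetical.

```python
import json
import time

WS_METHODS = ('Network.webSocketFrameReceived', 'Network.webSocketFrameSent')


def extract_ws_payloads(entries):
    """Return (method, payloadData) pairs for the WebSocket frames in a batch of log entries."""
    frames = []
    for entry in entries:
        try:
            message = json.loads(entry['message'])['message']
            if message['method'] in WS_METHODS:
                payload = message['params']['response']['payloadData']
                frames.append((message['method'], payload))
        except (KeyError, TypeError, ValueError):
            continue  # skip entries that are not well-formed frame events
    return frames


def poll_ws_frames(driver, rounds=10, interval=1.0):
    """Read the performance log `rounds` times, sleeping `interval` seconds after each read."""
    frames = []
    for _ in range(rounds):
        frames.extend(extract_ws_payloads(driver.get_log('performance')))
        time.sleep(interval)
    return frames
```

Calling poll_ws_frames(driver) after the page interaction would then collect frames for roughly rounds * interval seconds; the parsing is kept in a separate function so it can be checked without a browser.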

0

Here is a sample script showing how to read WebSocket traffic with Selenium and Python, using Chrome and its logging settings (see this post and this one).

Be aware that the Chrome log is emptied every time you read it.

Also: you need to define the helper function log_entries_to_dict() before you use it.
[I posted the definition of the function below the example code only to keep the example readable.]

import copy
import json

from selenium import webdriver

# list to save the log entries
log_entries = list()

# Initialize Chrome WebDriver with performance logging enabled
chrome_options = webdriver.ChromeOptions()

# enable logging for websocket capture (in performance)
chrome_options.add_argument('--enable-logging')
chrome_options.add_argument('--log-level=0')
chrome_options.set_capability('goog:loggingPrefs',{'performance': 'ALL'})
driver = webdriver.Chrome(options=chrome_options)

# Navigate to the target website
driver.get("https://dashboard.mywebsite.com/login")

# do something with the website like fill input fields and push buttons

# Read the captured network log entries (they are captured in the background)
# the driver-log is emptied on each get_log method !
log_entries.extend(driver.get_log("performance"))

# simplify the log entries
log_entries_deserialized = log_entries_to_dict(log_entries)

# limit the log entries to traffic connected to websockets
network_str = "network.websocket"  # lowercase prefix, compared against the lowercased method name
websocket_traffic = [_ for _ in log_entries_deserialized if _['method'].lower().startswith(network_str)]

# analyse the websocket_traffic

# then again: do something with the website

# Now read the newly captured network log entries and extend the list containing the old log entries
log_entries.extend(driver.get_log("performance"))
log_entries_deserialized = log_entries_to_dict(log_entries)
websocket_traffic = [_ for _ in log_entries_deserialized if _['method'].lower().startswith(network_str)]

# again: analyse the websocket traffic
# repeat as often as needed 


def log_entries_to_dict(inp_list: list, optimize_urls=True) -> list:
    """converts a list of json-log entries into a list of python dicts"""
    list_out = list()
    dicts_request_id_and_url = dict()

    for list_entry in inp_list:
        try:
            obj_serialized = list_entry.get("message")
            obj = json.loads(obj_serialized)
            message = obj.get("message")
            method = message.get("method")
            url = message.get("params", {}).get("documentURL")  # is None if the key 'documentURL' was not found

            tmp_dict = dict()
            tmp_dict['method'] = method
            tmp_dict['level'] = list_entry['level']
            tmp_dict['timestamp'] = list_entry['timestamp']
            tmp_dict['webview'] = obj['webview']
            tmp_dict['requestId'] = None  # will be overwritten if there is an entry in message['params']
            tmp_dict['url'] = url

            for key in message['params']:
                tmp_dict[key] = message['params'][key]

            if optimize_urls:
                # as the URL is not always transferred (actually only on the start of a connection),
                # we assign the URLs to each package, based on the requestID
                # Purpose: to make it easier for a human to follow a datastream
                #
                # this assignment can be skipped for performance reasons (see the keyword in the function parameters)

                # first we need to see if there is a request ID (entries without one are skipped):
                if tmp_dict['requestId'] is None:
                    continue

                # check if the request ID's documentURL is already present in the dict
                if url is not None:
                    if tmp_dict['requestId'] not in dicts_request_id_and_url:
                        # remember the URL for this request ID; this entry itself is not added to the output
                        dicts_request_id_and_url[tmp_dict['requestId']] = url
                        continue

                else:
                    if tmp_dict['requestId'] in dicts_request_id_and_url:
                        tmp_dict['url'] = dicts_request_id_and_url[tmp_dict['requestId']]

            list_out.append(copy.copy(tmp_dict))

        except Exception as e:
            raise e from None

    return list_out
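
The "analyse the websocket_traffic" step in the script is left open. As one possible illustration, the sketch below pulls the frame payloads out of the simplified entries. It assumes the shape produced by log_entries_to_dict(), where every key of message['params'] is copied to the top level, so a frame event carries its data under 'response'. The function name frame_payloads is hypothetical.

```python
def frame_payloads(websocket_traffic):
    """Return (direction, url, payload) tuples for the frame events in the simplified log."""
    directions = {
        'Network.webSocketFrameReceived': 'received',
        'Network.webSocketFrameSent': 'sent',
    }
    result = []
    for entry in websocket_traffic:
        direction = directions.get(entry.get('method'))
        response = entry.get('response')
        # ignore non-frame events such as Network.webSocketCreated
        if direction is None or not isinstance(response, dict):
            continue
        result.append((direction, entry.get('url'), response.get('payloadData')))
    return result
```

It would be called as frame_payloads(websocket_traffic) after each read of the log.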
