5

Below I have setup a script which simply executes a search on a website. The goal is to capture JSON data utilizing Selenium from an event that is fired from an intermediate script, namely the POST request to "https://www.botoxcosmetic.com/sc/api/findclinic/FindSpecialists" as seen in the included image, but without directly sending a request to that URL using Selenium or the requests library. What is the best way to do this, preferably in Python but open to any language?

from selenium import webdriver
base_url = 'https://www.botoxcosmetic.com/women/find-a-botox-cosmetic-specialist'
driver = webdriver.Chrome()
driver.find_element_by_class_name('normalZip').send_keys('10022')
driver.find_element_by_class_name('normalSearch').click()

enter image description here

11
  • I think beautifulsoup would be more useful for this. Commented Apr 19, 2019 at 15:14
  • 4
    Why do you need to use Selenium? I feel like you are using the wrong tool for the job. Commented Apr 19, 2019 at 16:03
  • 1
    Don't know if this is a starting point? Commented Apr 19, 2019 at 19:56
  • 1
    Puppeteer makes it easy, but it's Javascript not Python. Commented Apr 20, 2019 at 23:32
  • 1
    You could inject javascript and override xhr methods see there Commented Apr 28, 2019 at 10:40

1 Answer 1

3
+50

You will need to use a proxy, my suggestion would be to use the BrowserMob Proxy.

First of all install the BrowserMob Proxy libraries:

pip install browsermob-proxy

You will then need to download the latest release (2.1.4 at the time of writing this), extract it and then place it in your project directory. This is going to be a location you need to pass in when setting up the BrowserMob Proxy server (See below where Server("browsermob-proxy-2.1.4/bin/browsermob-proxy") is defined)

I've then updated your script to the following:

import json

from browsermobproxy import Server
from haralyzer import HarParser
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

base_url = 'https://www.botoxcosmetic.com'
server = Server("browsermob-proxy-2.1.4/bin/browsermob-proxy")
server.start()
proxy = server.create_proxy()

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--proxy-server={0}".format(proxy.proxy))

driver = webdriver.Chrome(options=chrome_options)
driver.get("{0}/women/find-a-botox-cosmetic-specialist".format(base_url))

proxy.new_har(options={"captureContent": "true"})
driver.find_element_by_class_name('normalZip').send_keys('10022')
driver.find_element_by_class_name('normalSearch').click()

WebDriverWait(driver, 15).until(EC.presence_of_element_located((By.CSS_SELECTOR, "#specialist-results > div")))

har_parser = HarParser(proxy.har)
for entry in har_parser.har_data["entries"]:
    if entry["request"]["url"] == "{0}/sc/api/findclinic/FindSpecialists".format(base_url):
        result = json.loads(entry["response"]["content"]["text"])

driver.quit()
server.stop()

This will start up a BrowserMob Proxy instance and capture the response for the FindSpecialists network call and store it as JSON in the result variable.

You can then use that to do whatever you want to do with the response. Apologies if the code is not as clean as you would expect, I'm not a native Pythonista.

Useful references are:

Sign up to request clarification or add additional context in comments.

3 Comments

if you can please update your code to answer the specific example stated in the question, I can accept your answer
Updated with a detailed solution to your problem
Thanks. Excellent answer and a very good reference for anyone else attempting something similar

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.