Python, BS and Selenium

Question

I am trying to scrape a JavaScript-rendered page with Python, BeautifulSoup and Selenium. After a lot of reading I came up with the code below, which tries to scrape a price that is rendered with JavaScript on a well-known website:

from bs4 import BeautifulSoup
from selenium import webdriver

url = "https://www.nespresso.com/fr/fr/order/capsules/original/"

browser = webdriver.PhantomJS(executable_path = "C:/phantomjs-2.1.1-windows/bin/phantomjs.exe")
browser.get(url)
html = browser.page_source

soup = BeautifulSoup(html, 'lxml')

soup.find("span", {'class':'ProductListElement__price'}).text

But the only result I get is '\xa0', which is the value in the static page source, not the value rendered by JavaScript, and I don't know what I did wrong.

Best regards

QHarr · Accepted Answer · 2019-12-18 02:26:19Z

1

You don't need the overhead of a browser. The data is in a script tag, so you can extract it with a regex and parse it with the json library:

import json
import re

import requests

r = requests.get('https://www.nespresso.com/fr/fr/order/capsules/original/')
# Capture the argument of the window.ui.push(...) call that mentions ProductList
p = re.compile(r'window\.ui\.push\((.*ProductList.*)\)')
data = json.loads(p.findall(r.text)[0])
# Map each product name to its price
products = {product['name']: product['price'] for product in data['configuration']['eCommerceData']['products']}
print(products)

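Regex: the pattern matches the literal text window.ui.push( and then captures everything up to the last closing parenthesis on the same line, as long as the captured text contains ProductList.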
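Because the greedy pattern relies on the payload sitting on one line and ending at the last parenthesis, a slightly more defensive variant may be useful. The sketch below is not part of the original answer: it locates each window.ui.push( call and lets json.JSONDecoder.raw_decode read exactly one JSON value from that position. The function name extract_products and the SAMPLE_HTML snippet (including its product names and prices) are made up for illustration; only the configuration / eCommerceData / products path comes from the answer above.

```python
import json
import re

# Start of each window.ui.push( call; trailing whitespace is consumed because
# raw_decode does not skip leading whitespace on its own.
PUSH_CALL = re.compile(r"window\.ui\.push\(\s*")


def extract_products(html):
    """Return {name: price} from the first pushed payload that lists products."""
    decoder = json.JSONDecoder()
    for match in PUSH_CALL.finditer(html):
        try:
            # Parse one JSON value starting right after the opening parenthesis.
            payload, _ = decoder.raw_decode(html, match.end())
        except json.JSONDecodeError:
            continue  # the argument was not plain JSON; try the next call
        try:
            products = payload["configuration"]["eCommerceData"]["products"]
        except (KeyError, TypeError):
            continue  # a different component's payload
        return {p["name"]: p["price"] for p in products}
    return {}


# Hypothetical page fragment, invented for this example.
SAMPLE_HTML = """
<script>window.ui.push({"id": "Header", "configuration": {}});</script>
<script>window.ui.push({"id": "ProductList", "configuration": {"eCommerceData": {"products": [
  {"name": "Ristretto", "price": 0.41},
  {"name": "Arpeggio", "price": 0.43}
]}}});</script>
"""

print(extract_products(SAMPLE_HTML))
```

With a live page, the same function could be called on requests.get(url).text, assuming the site still embeds the data in this form.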

2 Comments

josef_joestarr Over a year ago

Hello, thanks for this information. Can you explain how you found the script tag? I suspected the data was in one but couldn't find it by inspecting elements.

QHarr Over a year ago

I pulled back the response with jsoup, which won't run JavaScript, then searched the response for a product name/price taken from the actual webpage.
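The same search can be sketched in Python with only the standard library: collect the body of every script tag in the raw (non-rendered) HTML and report which ones mention a value you can see on the live page. This is an illustration, not the commenter's actual procedure; ScriptCollector, scripts_mentioning and the SAMPLE_PAGE markup are hypothetical names and data.

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect the text content of every <script> element."""

    def __init__(self):
        super().__init__()
        self._in_script = False
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script:
            self.scripts[-1] += data


def scripts_mentioning(html, needle):
    """Return the indexes of the script tags whose body contains needle."""
    collector = ScriptCollector()
    collector.feed(html)
    collector.close()  # flush any buffered text
    return [i for i, body in enumerate(collector.scripts) if needle in body]


# Hypothetical raw response: the visible price span is still a placeholder,
# but the product data is already present inside the second script tag.
SAMPLE_PAGE = """
<html><body>
<span class="ProductListElement__price">&nbsp;</span>
<script>var tracking = {};</script>
<script>window.ui.push({"id": "ProductList", "products": [{"name": "Ristretto"}]});</script>
</body></html>
"""

print(scripts_mentioning(SAMPLE_PAGE, "Ristretto"))
```

Once the index is known, printing that script's body shows the surrounding structure, which is how a pattern like window.ui.push( can be spotted.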

Marsilinou Zaky · Accepted Answer · 2019-12-18 00:04:00Z

0

Here are two ways to get the prices:

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By

url = "https://www.nespresso.com/fr/fr/order/capsules/original/"

browser = webdriver.Chrome()
browser.get(url)
html = browser.page_source

# Getting the prices using bs4
soup = BeautifulSoup(html, 'lxml')
prices = soup.select('.ProductListElement__price')
print([p.text for p in prices])

# Getting the prices using selenium
# (find_elements_by_class_name was deprecated and later removed in Selenium 4)
prices = browser.find_elements(By.CLASS_NAME, "ProductListElement__price")
print([p.text for p in prices])

# Close the browser when done
browser.quit()
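If the price elements still come back as '\xa0', one possible cause is that page_source was read before the page's JavaScript filled them in. That is an assumption, not something established in this thread. Under that assumption, an explicit wait could look like the sketch below, where non_blank and wait_for_prices are hypothetical helper names and the 10-second timeout is arbitrary.

```python
PRICE_CLASS = "ProductListElement__price"  # class name taken from the question


def non_blank(texts):
    # Drop placeholders such as a lone non-breaking space ('\xa0');
    # str.strip() treats it as whitespace.
    return [t.strip() for t in texts if t.strip()]


def wait_for_prices(driver, timeout=10):
    """Block until at least one price element has real text, then return the texts."""
    # Imported here so non_blank stays usable where Selenium is not installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    def prices_ready(d):
        # An empty list is falsy, so WebDriverWait keeps polling until text appears.
        return non_blank(el.text for el in d.find_elements(By.CLASS_NAME, PRICE_CLASS))

    # Raises selenium.common.exceptions.TimeoutException if nothing shows up in time.
    return WebDriverWait(driver, timeout).until(prices_ready)
```

It would be called as wait_for_prices(browser) after browser.get(url), in place of reading page_source immediately.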

1 Comment

josef_joestarr Over a year ago

Oh thanks, so PhantomJS was the problem from the beginning ... shame on me. Big thanks!
