Disclaimer: respect the website, don't bombard the site with requests ;)
As the link provided by Swaroop Humane indicates, Selenium is mainly useful for testing the mechanics of a website and not very effective for gathering data.
However, most of the time you don't need to run JavaScript yourself: the website's engine does it very well on its own. You, as a user, just have to make the right requests.
Without going into details, you must explore the data that passes between the client (you) and the server (F12 -> Network tab -> filter by Html, Xhr, etc.).
So here is the code (commented) :
## import useful libraries
import requests as rq
from bs4 import BeautifulSoup as bs
from urllib.parse import unquote
import json
## set up initial header and initial request
headers = { "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0"}
url_base = "https://www.urbanoutfitters.com/shop/converse-chuck-taylor-all-star-canvas-platform-high-top-sneaker?category=SEARCHRESULTS&color=015&searchparams=q%3Dsneaker&type=REGULAR&quantity=1&reviewPage=13"
s = rq.session()
q_base = s.get(url_base, headers=headers)
## get the cookies from the last request;
## since no cookies were sent, the website sets some and returns a bunch of interesting elements
d = q_base.cookies.get_dict()
d2 = d['urbn_auth_payload'] # this element (a URL-encoded JSON string) contains the value of interest => "authToken"
d3 = unquote(d2)
d4 = json.loads(d3)
# We rebuild here a second header, like the one observed in F12 Network tab
headers_2 = {
"Host": "www.urbanoutfitters.com",
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0",
"Accept": "application/json, text/plain, */*",
"Accept-Language": "en-US,en;q=0.5",
"Accept-Encoding": "gzip, deflate, br",
"x-urbn-site-id": "uo-us",
"x-urbn-channel": "web",
"x-urbn-country": "FR",
"x-urbn-currency": "USD",
"x-urbn-language": "en-US",
"x-urbn-experience": "ss",
"x-urbn-primary-data-center-id": "US-PA",
"x-urbn-geo-region": "EU-LN",
"Connection": "keep-alive",
"authorization": "Bearer " + d4["authToken"],
}
## the "url_target" returns the data you want (again found in F12 -> Network) ONLY when we
## pass the correct set of header arguments (=> "headers_2")
## Note that I set offset=3 & limit=100 at the end of the url string. I tried a greater limit
## but the server returned:
## b'{"code": "ERROR_PARAM_INVALID_LIMIT", "message": "Invalid limit value: 300, limit cannot be greater than 100"}'
url_target = "https://www.urbanoutfitters.com/api/catalog/v0/uo-us/product/converse-chuck-taylor-all-star-canvas-platform-high-top-sneaker/reviews?projection-slug=reviews&offset=3&limit=100"
q_target = s.get(url_target, headers=headers_2) # return json data
data = q_target.json() # parse json
count_review = data["product"]["reviewStatistics"]["totalReviewCount"] # number of reviews
data_reviews = data["results"] # reviews list
def extract_data(data_reviews):
    # choose here which elements to extract
    res = []
    for el in data_reviews:
        res.append([el['submissionTime'], el['userNickname'], el['title'], el['reviewText']])
    return res
resultat = extract_data(data_reviews) # the data you want
"resultat" contains:
[['2021-04-11T14:13:18.000+00:00',
'Ainsleyb',
'Definitely recommend',
'I love these shoes so much, you should definitely order these in your normal size. They go with everything and they make you outfit look even better. I’m also short so they definitely make me look taller.'],
['2021-04-10T22:27:00.000+00:00',
'Marisol B',
'Love them',
'I love these shoes I just don’t know what to wear with them lol.'],
......
......
Note 1: I have not taken into account the case where there are more than 100 reviews on a product.
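To handle more than 100 reviews, you can step the offset parameter until totalReviewCount is exhausted. A minimal sketch; fetch_page is a hypothetical stand-in for the real API call (s.get(url_target, headers=headers_2) above):

```python
def fetch_all_reviews(fetch_page, total_count, limit=100):
    # step the offset in chunks of `limit` (the server caps limit at 100)
    reviews = []
    for offset in range(0, total_count, limit):
        reviews.extend(fetch_page(offset, limit))
    return reviews

# demo with a fake pager that pretends there are 250 reviews
fake_page = lambda offset, limit: list(range(offset, min(offset + limit, 250)))
print(len(fetch_all_reviews(fake_page, 250)))  # 250
```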
Note 2: the central part of the "url_target" (the product slug) must be adapted to extract reviews from other products.
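For Note 2, only the slug in the middle of the URL changes, so a small (hypothetical) helper can rebuild "url_target" for any product:

```python
BASE = "https://www.urbanoutfitters.com/api/catalog/v0/uo-us/product"

def reviews_url(slug, offset=0, limit=100):
    # same endpoint as url_target above, with the product slug swapped in
    return f"{BASE}/{slug}/reviews?projection-slug=reviews&offset={offset}&limit={limit}"

print(reviews_url("converse-chuck-taylor-all-star-canvas-platform-high-top-sneaker", offset=3))
```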