I am using .select() with BeautifulSoup and I am not sure why only part of my expected results is being returned.

My HTML has the following format:

<div class="a">
  <a class="class-type">
  <a class="class-type">
  <a class="class-type">
  <a class="class-type">
  .... {12 times}
</div>
<div class="a">
  <a class="class-type">
  <a class="class-type">
  <a class="class-type">
  <a class="class-type">
  .... {12 times}
</div>
<div class="a">
  <a class="class-type">
  <a class="class-type">
  <a class="class-type">
  <a class="class-type">
  .... {12 times}
</div>

Code:

soup = BeautifulSoup(html, 'lxml')
item_urls = soup.select(".css-ix8km1")

This returns only 12 items, but I am expecting 36.

  • Can you post the link and the code you are using, or the original response text instead of <a class="class-type">? Commented Jan 13, 2019 at 22:35
  • @BittoBennichan The HTML is too big to post, but the URL is https://www.sephora.com/shop/face-makeup?pageSize=300 and the relevant div has the attribute data-comp=ProductGrid. I am trying to grab all the hrefs within that tag. Commented Jan 13, 2019 at 22:39

2 Answers

2

As cody already mentioned, you will need a mechanism like Selenium. I tried paging down and was able to get the output with the following code. You need to close the popup ad by clicking the 'X' button before you page down.

import time
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome(executable_path='/home/bitto/chromedriver')  # change this to your chromedriver path
driver.get("https://www.sephora.com/shop/face-makeup?pageSize=300")

# Close the popup ad if it shows up within 10 seconds
try:
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.XPATH, "//button[@class='css-1mfnet7 ']"))
    )
    element.click()
except TimeoutException:
    print("Ad was not found")
time.sleep(1)  # not preferred but will do for now

body = driver.find_element(By.TAG_NAME, "body")
item_urls = []
no_of_pagedowns = 3

# Each page down gives the site a chance to lazily load more products
while no_of_pagedowns:
    body.send_keys(Keys.PAGE_DOWN)
    time.sleep(5)  # not preferred but will do for now
    no_of_pagedowns -= 1

# Collect the href of every product link that is now in the DOM
post_elems = driver.find_elements(By.XPATH, "//a[@class='css-ix8km1']")
for post_elem in post_elems:
    item_urls.append(post_elem.get_attribute("href"))
print(item_urls)

Output

['https://www.sephora.com/product/pro-filtr-soft-matte-longwear-foundation-P87985432?icid2=products%20grid:p87985432:product', 'https://www.sephora.com/product/pro-filt-r-instant-retouch-concealer-P88779809?icid2=products%20grid:p88779809:product', 'https://www.sephora.com/product/radiant-creamy-concealer-P377873?icid2=products%20grid:p377873:product', 'https://www.sephora.com/product/translucent-loose-setting-powder-P109908?icid2=products%20grid:p109908:product', 'https://www.sephora.com/product/pro-filt-r-instant-retouch-setting-powder-P88779810?icid2=products%20grid:p88779810:product', 'https://www.sephora.com/product/diamond-bomb-all-over-diamond-veil-P85225585?icid2=products%20grid:p85225585:product', 'https://www.sephora.com/product/the-silk-canvas-P428661?icid2=products%20grid:p428661:product', 'https://www.sephora.com/product/pineapple-my-eye-collector-s-set-P435947?icid2=products%20grid:p435947:product', 'https://www.sephora.com/product/double-wear-stay-in-place-makeup-P378284?icid2=products%20grid:p378284:product', 'https://www.sephora.com/product/ultra-hd-invisible-cover-foundation-P398321?icid2=products%20grid:p398321:product', 'https://www.sephora.com/product/all-nighter-long-lasting-makeup-setting-spray-P263504?icid2=products%20grid:p263504:product', 'https://www.sephora.com/product/your-skin-but-better-cc-cream-spf-50-P411885?icid2=products%20grid:p411885:product', 'https://www.sephora.com/product/luminous-silk-foundation-P393401?icid2=products%20grid:p393401:product', 'https://www.sephora.com/product/born-this-way-P397517?icid2=products%20grid:p397517:product', 'https://www.sephora.com/product/born-this-way-super-coverage-multi-use-sculpting-concealer-P432298?icid2=products%20grid:p432298:product', 'https://www.sephora.com/product/lock-it-tattoo-foundation-P311138?icid2=products%20grid:p311138:product', 'https://www.sephora.com/product/fresh-face-kit-P440030?icid2=products%20grid:p440030:product', 
'https://www.sephora.com/product/teint-idole-ultra-24h-long-wear-foundation-P308201?icid2=products%20grid:p308201:product', 'https://www.sephora.com/product/fauxfilter-foundation-P424302?icid2=products%20grid:p424302:product', 'https://www.sephora.com/product/creaseless-concealer-P433206?icid2=products%20grid:p433206:product', 'https://www.sephora.com/product/bareminerals-original-foundation-broad-spectrum-spf-15-P61003?icid2=products%20grid:p61003:product', 'https://www.sephora.com/product/shimmering-skin-perfector-pressed-P381176?icid2=products%20grid:p381176:product', 'https://www.sephora.com/product/tinted-moisturizer-broad-spectrum-P109936?icid2=products%20grid:p109936:product', 'https://www.sephora.com/product/veil-mineral-primer-P210575?icid2=products%20grid:p210575:product']
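
The fixed number of page downs and the sleeps are marked "not preferred" in the code above. One possible alternative, as a rough sketch only: keep scrolling one viewport at a time until the number of matched links stops growing. The helper name scroll_until_stable and its parameters (count_items, pause, patience, max_rounds) are made up for this illustration and are not part of Selenium; the only WebDriver call it relies on is execute_script. It has not been tried against this particular site.

```python
import time


def scroll_until_stable(driver, count_items, pause=2.0, patience=3, max_rounds=100):
    """Scroll down one viewport at a time until the item count stops growing.

    driver      -- any object with an execute_script(script) method (e.g. a WebDriver)
    count_items -- hypothetical callback: takes the driver, returns the current item count
    patience    -- how many consecutive scrolls without new items to tolerate
    Returns the highest item count that was seen.
    """
    best = count_items(driver)
    idle = 0
    for _ in range(max_rounds):
        if idle >= patience:
            break
        # Roughly what a PAGE_DOWN key press does: move down by one window height
        driver.execute_script("window.scrollBy(0, window.innerHeight);")
        time.sleep(pause)  # give the lazy loader time to add the next batch
        current = count_items(driver)
        if current > best:
            best, idle = current, 0
        else:
            idle += 1
    return best


# Possible usage with the driver from the answer (assumes the same XPath still matches):
# scroll_until_stable(
#     driver,
#     lambda d: len(d.find_elements(By.XPATH, "//a[@class='css-ix8km1']")),
# )
```

The patience counter is there because a single page down may not reveal any new products, so stopping at the first scroll without growth could end too early.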

1

The reason is that only the first 12 items are rendered in the response; the rest are lazily loaded by the site's JavaScript code. This can be confirmed by requesting that URL with curl and counting the instances of the class string:

$ curl -s 'https://www.sephora.com/shop/face-makeup?pageSize=300' | grep -o css-ix8km1 | wc -l
13

You may need to use a mechanism that executes JavaScript, like Selenium WebDriver.
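
The same check as the curl command can be done from Python. This is a minimal sketch using only the standard library; the function names fetch and count_occurrences and the User-Agent header are choices made for this example, and the site may respond differently to a script than to curl or a browser.

```python
import urllib.request


def count_occurrences(html: str, needle: str) -> int:
    """Count non-overlapping occurrences, like `grep -o needle | wc -l`."""
    return html.count(needle)


def fetch(url: str) -> str:
    """Download the raw response body; no JavaScript is executed."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


# Example call (needs network access, so it is left commented out):
# page = fetch("https://www.sephora.com/shop/face-makeup?pageSize=300")
# print(count_occurrences(page, "css-ix8km1"))
```

If the count from the raw response is much lower than what the browser shows, the missing items are being added by JavaScript after the page loads.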

3 Comments

I am using Selenium WebDriver, so the JavaScript is rendered, but with .select() I am still only able to grab 12.
@Liondancer I see. Well, if you manually use the site, you will see that the mechanism that loads each additional batch is the act of scrolling down. So you would have to reproduce the scrolling with WebDriver. There are many SO answers on this topic.
Ahhh, that makes sense. Thank you! I will try that.
