
I am scraping web pages with Scrapy (Python), which works well for static content. I am trying to scrape a URL from this page, but it turns out the URL is returned by a JavaScript call. I am using Selenium for this, but I can't figure out how to do it.

If you click "size chart" on the linked page, a pop-up opens showing the size guide. How can I get the URL of this guide in my program?

I am facing a similar problem getting the size guide on Koovs as well. If anyone could help with either of the links, I'd be really grateful.

Answer

Locate the "size chart" link by its link text, click it, and extract the data. For example:

from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC


driver = webdriver.Firefox()
driver.get('http://www.jabong.com/athena-Red-Black-Top-476472.html?pos=3')

wait = WebDriverWait(driver, 10)
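chart = wait.until(EC.element_to_be_clickable((By.LINK_TEXT, "size chart")))
chart.click()

# find_elements_by_css_selector() was removed in recent Selenium 4 releases;
# find_elements(By.CSS_SELECTOR, ...) is the current equivalent
for title in driver.find_elements(By.CSS_SELECTOR, "div.size-chart-body div.size-chart table th"):
    print(title.text)

driver.quit()  # close all windows and end the WebDriver session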

Prints (table header row, for the sake of an example):

Indian Size
Euro Size
Garment Bust (In.)
Garment Waist (in.)
Garment Hip (in.)

Note that you don't need Selenium to get the size chart data: it is already present in the HTML the server returns, just hidden until you click "size chart". You can reach the same size chart table with Scrapy. Demo from the Scrapy shell:

$ scrapy shell "http://www.jabong.com/athena-Red-Black-Top-476472.html?pos=3"
In [1]: for title in response.css("div.size-chart-body div.size-chart table th")[1:]:
   ...:     print(title.xpath("text()").get())
   ...:
Indian Size
Euro Size
Garment Bust (In.)
Garment Waist (in.)
Garment Hip (in.)
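To make the "it is already in the HTML" point concrete without a browser or Scrapy, here is a minimal sketch using only Python's standard-library `html.parser`. The names `SizeChartHeaderParser` and `size_chart_headers`, and the sample markup, are hypothetical: the markup is modeled loosely on the `div.size-chart-body div.size-chart table th` selector above, not copied from the real page source. In an actual spider, the Scrapy selector shown above is the simpler choice.

```python
from html.parser import HTMLParser


class SizeChartHeaderParser(HTMLParser):
    """Collect the text of <th> cells inside a <div class="size-chart"> block.

    Hypothetical helper: assumes the chart markup is present in the static
    HTML, even if it is hidden with CSS until the user clicks "size chart".
    """

    def __init__(self):
        super().__init__()
        self.depth_in_chart = 0  # > 0 while inside div.size-chart (counts nested divs)
        self.in_th = False       # True while reading text inside a <th>
        self.headers = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and (self.depth_in_chart or "size-chart" in classes):
            self.depth_in_chart += 1
        elif tag == "th" and self.depth_in_chart:
            self.in_th = True
            self.headers.append("")

    def handle_endtag(self, tag):
        if tag == "div" and self.depth_in_chart:
            self.depth_in_chart -= 1
        elif tag == "th":
            self.in_th = False

    def handle_data(self, data):
        if self.in_th:
            self.headers[-1] += data


def size_chart_headers(html):
    """Return the stripped header texts of the size chart table in `html`."""
    parser = SizeChartHeaderParser()
    parser.feed(html)
    parser.close()
    return [h.strip() for h in parser.headers]


# Hypothetical sample markup: the chart is hidden, but still in the HTML
SAMPLE_HTML = """
<div class="size-chart-body" style="display:none">
  <div class="size-chart">
    <table>
      <tr><th>Indian Size</th><th>Euro Size</th></tr>
      <tr><td>S</td><td>36</td></tr>
    </table>
  </div>
</div>
"""

print(size_chart_headers(SAMPLE_HTML))  # ['Indian Size', 'Euro Size']
```

The `display:none` style in the sample changes nothing for the parser, which is the same reason Scrapy can see the table: CSS visibility only matters to a rendering browser.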

In the case of Koovs, you can still avoid Selenium and construct the size chart URL manually by extracting the category and deal ID from hidden inputs on the page, e.g.:

$ scrapy shell "http://www.koovs.com/only-onlall-stripe-ls-shirt-59554.html?from=category-651"
In [1]: category = response.xpath("//input[@id='master_category_name_id_ref']/@value").get()

In [2]: deal = response.xpath("//input[@id='deal_id']/@value").get()

In [3]: "http://www.koovs.com/koovs/sizechart/women/{category}/{deal}".format(category=category, deal=deal)
Out[3]: 'http://www.koovs.com/koovs/sizechart/women/Shirts--651--799--896/59554'
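The same URL construction can be wrapped in a small function. This sketch again uses the standard-library `html.parser` in place of Scrapy selectors so it runs on any HTML string. The input ids and the URL pattern come from the shell session above; the function name `koovs_size_chart_url` and the `section` parameter (defaulting to the `women` segment hard-coded above) are hypothetical, and whether other values of that segment exist on the site is an assumption.

```python
from html.parser import HTMLParser

# URL pattern taken from the shell session; {section} is a hypothetical parameter
SIZECHART_URL = "http://www.koovs.com/koovs/sizechart/{section}/{category}/{deal}"


class InputValueParser(HTMLParser):
    """Map each <input> element's id attribute to its value attribute."""

    def __init__(self):
        super().__init__()
        self.values = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <input ... /> tags also arrive here, because the default
        # handle_startendtag() calls handle_starttag()
        if tag == "input":
            attr_map = dict(attrs)
            if attr_map.get("id"):
                self.values[attr_map["id"]] = attr_map.get("value") or ""


def koovs_size_chart_url(html, section="women"):
    """Build the size chart URL from the hidden category and deal inputs.

    Raises KeyError if either input is missing from the page.
    """
    parser = InputValueParser()
    parser.feed(html)
    parser.close()
    category = parser.values["master_category_name_id_ref"]
    deal = parser.values["deal_id"]
    return SIZECHART_URL.format(section=section, category=category, deal=deal)


# Hypothetical sample markup reproducing the values from the shell session
SAMPLE_HTML = """
<input type="hidden" id="master_category_name_id_ref" value="Shirts--651--799--896">
<input type="hidden" id="deal_id" value="59554" />
"""

print(koovs_size_chart_url(SAMPLE_HTML))
# http://www.koovs.com/koovs/sizechart/women/Shirts--651--799--896/59554
```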

And if you still want to go with Selenium, here it is:

from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC


driver = webdriver.Firefox()
driver.get('http://www.koovs.com/only-onlall-stripe-ls-shirt-59554.html?from=category-651&skuid=236376')

wait = WebDriverWait(driver, 10)
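# The size chart link is an <a> element carrying a "size_chart" attribute
chart = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "a[size_chart]")))
chart.click()

# Clicking opens the size chart in a new window: wait for it, then switch to it
wait.until(EC.number_of_windows_to_be(2))
driver.switch_to.window(driver.window_handles[-1])

print(driver.current_url)

driver.quit()  # close all windows and end the WebDriver session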

Prints:

http://www.koovs.com/koovs/sizechart/women/Shirts--651--799--896/59554
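`driver.window_handles[-1]` relies on the new window being listed last, but the W3C WebDriver specification leaves the order of window handles unspecified. A more defensive alternative, sketched here under the assumption that exactly one window opens on click, is to compare the handles before and after the click. The helper name `find_new_window_handle` and the sample handle strings are hypothetical; real handles are opaque strings generated by the browser.

```python
def find_new_window_handle(handles_before, handles_after):
    """Return the one window handle present after the click but not before.

    Hypothetical helper: raises RuntimeError unless exactly one new handle appeared.
    """
    new_handles = set(handles_after) - set(handles_before)
    if len(new_handles) != 1:
        raise RuntimeError(f"expected exactly one new window, found {len(new_handles)}")
    return new_handles.pop()


# Usage with the Koovs Selenium script (requires a running browser):
#   before = driver.window_handles
#   chart.click()
#   wait.until(EC.number_of_windows_to_be(len(before) + 1))
#   driver.switch_to.window(find_new_window_handle(before, driver.window_handles))

# Offline demonstration with made-up handle strings
print(find_new_window_handle(["main-tab"], ["main-tab", "popup-tab"]))  # popup-tab
```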

Comments

Both ways work, and thanks for pointing out the unnecessary use of Selenium. I just want to get the URL of the size chart rather than the text on it. Is there a way to do that?
@PraveshJain from what I see, it is just embedded into the page. There is no URL to the size chart.
OK, that seems true for the Jabong link, but the Koovs page does have a link to it. Any ideas on how to get it programmatically?
@PraveshJain please see the update, and let me know if you are still interested in how to solve it with Selenium.
Thanks for the update. It works fine and removes the need for Selenium. I am still curious what we would do if the URL couldn't be formed from the category and deal. If you have figured out a general way to do it, that would be really helpful to know.
