2

I'm a beginner with python, I'm trying build a python program that will scrape product descriptions from http://turnpikeshoes.com/shop/TCF00003. Python has many libraries and I'm sure many approaches to achieving my goal. I've done a few successful scrapes using requests however the fields I was looking for are not showing up, Using chromes inspector I found an Ajax POST request.

Here is my code

from lxml import html
import requests

url = 'http://turnpikeshoes.com/shop/TCF00003'
#URL
headers = {'user-agent': 'my-app/0.0.1'}
#Header info sent to server
page = requests.get(url, headers=headers)
#Get response
tree = html.fromstring(page.content)
#Page Content


ShortDsc = tree.xpath('//span[@itemprop="reviewBody"]/text()')

LongDsc = tree.xpath('//li[@class="productLongDescription"]/text()')

print 'ShortDsc:', ShortDsc
print 'LongDsc:', LongDsc

I think I need to send a request directly to admin-ajax.php

Any help is greatly appreciated

3 Answers 3

1

You should try selenium in this case if you want to scrape javascript content:

from selenium import webdriver
import time

driver = webdriver.PhantomJS()
driver.get("http://turnpikeshoes.com/shop/TCF00003")
time.sleep(5)

LongDsc = driver.find_element_by_class_name("productLongDescription").text

print 'LongDsc:', LongDsc

Btw, you should also install PhantomJS as headless browser.

Sign up to request clarification or add additional context in comments.

Comments

0

The shop performs a POST request to itself during loading with the URL as parameter in the form data. Here is a little script using requests.Session which first opens the shop and then sends the POST request to get the product information. It simulates the steps a browser would do - saving the cookies from the first request - which is probably needed to get the desired response from the POST call.

import requests

product_url = 'http://turnpikeshoes.com/shop/TCF00003'
product_ajax = 'http://turnpikeshoes.com/wp-admin/admin-ajax.php'
data = {'mrQueries[]':'section', 'action':'mrshop_ajax_call', 'url': product_url}

s = requests.Session()
s.get(product_url)
r = s.post(product_ajax, data=data)

print(r.text)

You could also try to get the cookies at the beginning once and using them for all further POST requests. The shops afford to enable scripting to view the product plays in your hands by reducing the actual response size.

Comments

0

This one worked for me.

from selenium import webdriver

ff = webdriver.PhantomJS()
ff.get(url)
ff.find_element_by_xpath("//span[@itemprop='price']").get_attribute("content")

Thanks!

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.