
I need to extract data from HTML, but response.css, response.xpath, and combinations of the two are not working. Whenever I try to get the "regular-price" value, the result is always None.

I need the text of the <small> element, which is $17.99.

Here's my code:

HTML

<div class="price parbase"><div class="primary-row product-item-price product-item-price-discount"> <span class="price-value">$12.99</span><small class="js-price-value-original price-value-original">$17.99</small> </div> </div>

Scrapy (Python)

def parse_subpage(self, response):
    item = {
        'title': response.css('h1.primary.product-item-headline::text').extract_first(),
        'sale-price': response.xpath("normalize-space(.//span[@class='price-value']/text())").extract_first(),
        'regular-price': response.css('.js-price-value-original').xpath("@small").extract_first(),
        'photo-url': response.css('div.product-detail-main-image-container img::attr(src)').extract_first(),
        'description': response.css('p.pdp-description-text::text').extract_first(),
    }
    yield item

The output should be regular-price: $17.99.

Please help, thank you!


3 Answers


Your link gives me a 404, but going by your HTML snippet you only need response.css('small.js-price-value-original::text').get(). Here small is the tag name, not an attribute, so xpath("@small") matches nothing and returns None.

UPD: Hm, it seems this data is rendered by JS. Check the HTML source of the page and you will see a huge JSON object; search it for the whitePrice keyword. You can retrieve such data, for example, with response.xpath('//script[contains(text(), "whitePrice")]/text()').re_first(r"'whitePrice'\s?:\s?'([^']+)'")
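The regular expression can be tried on its own with Python's re module. This is only a sketch: script_text is a hypothetical stand-in for the inline script (the real page's JSON layout is not shown in this thread), and extract_white_price is an illustrative helper name, not part of Scrapy.

```python
import re

# Hypothetical stand-in for the inline <script> text; the real page's layout may differ.
script_text = "var productData = {'name': 'Example item', 'whitePrice': '$17.99'};"

# Same pattern as in the answer, written as a raw string so "\s" reaches the regex engine unchanged.
WHITE_PRICE_RE = re.compile(r"'whitePrice'\s?:\s?'([^']+)'")


def extract_white_price(text):
    """Return the first whitePrice value found in text, or None if there is none."""
    match = WHITE_PRICE_RE.search(text)
    return match.group(1) if match else None


print(extract_white_price(script_text))
print(extract_white_price("var productData = {};"))
```

The pattern expects single-quoted keys and values, so it would need adjusting if the page switched to double quotes.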


3 Comments

Still not working; the output is 'regular-price': None. @vezunchik
Try www2.hm.com/en_us/productpage.0697992001.html instead; this link works now. I still need to get the original price. @vezunchik
Brilliant use of regular expression here!

If this snippet is the only HTML you have, you can do:

def parse_subpage(self, response):
    item = {
        'title': response.css('h1.primary.product-item-headline::text').extract_first(),
        'sale-price': response.xpath("normalize-space(.//span[@class='price-value']/text())").extract_first(),
        'regular-price': response.xpath('//div/small[contains(@class, "js-price-value-original") and contains(@class, "price-value-original")]/text()').extract_first(),
        'photo-url': response.css('div.product-detail-main-image-container img::attr(src)').extract_first(),
        'description': response.css('p.pdp-description-text::text').extract_first(),
    }
    yield item

Btw, the website you provided shows a "file not found" error.


Thanks @vezunchik. If you want to use a CSS selector instead, you can use the code below:

response.css('script:contains("whitePrice")').re_first(r"'whitePrice'\s?:\s?'([^']+)'")
