Scrapy Load More Issue - CSS Selector

Question

I am attempting to scrape a website which has a "Show More" link at the bottom of the page that leads to more data to scrape. Here is a link to the website page: https://untappd.com/v/total-wine-more/47792. Here is my full code:

import scrapy

class Untap(scrapy.Spider):
    name = "Untappd"
    allowed_domains = ["untappd.com"]
    start_urls = [
        'https://untappd.com/v/total-wine-more/47792' #URL: Major liquor store chain with Towson location.
    ]

    def parse(self, response):
        for beer_details in response.css('div.beer-details'):
            yield {
                'name': beer_details.css('h5 a::text').getall(), #Name of Beer
                'type': beer_details.css('h5 em::text').getall(), #Style of Beer
                'ABVIBUs': beer_details.css('h6 span::text').getall(), #ABV and IBU of Beer
                'Brewery': beer_details.css('h6 span a::text').getall() #Brewery that produced Beer
            }
        load_more = response.css('a.yellow button more show-more-section track-click::attr(href)').get()
        if load_more is not None:
            load_more = response.urljoin(load_more)
            yield scrapy.Request(load_more, callback=self.parse)

I've attempted to use the "load_more" block at the bottom to keep loading more data for scraping, but none of the selectors I have built from the website's HTML have worked.

Here is the HTML from the website.

<a href="javascript:void(0);" class="yellow button more show-more-section track-click" data-track="venue" data-href=":moremenu" data-section-id="140216931" data-venue-id="47792" data-menu-id="38988361">Show More Beers</a>

I want the spider to scrape what is shown on the website, then click the link and continue scraping the page. Any help would be greatly appreciated.

wg1k · Accepted Answer · 2020-05-02 05:24:01Z

Short answer:

curl 'https://untappd.com/venue/more_menu/47792/15?section_id=140248357' -H 'x-requested-with: XMLHttpRequest'

Clicking that button executes JavaScript, so you would normally need something like Selenium to automate it, but fortunately you don't have to here :).

Using the browser's Developer Tools, you can see that each click on that button requests data from a URL following the pattern shown above, with the number after /47792/ increasing by 15 each time:

first time: https://untappd.com/venue/more_menu/47792/15?section_id=140248357
second time: https://untappd.com/venue/more_menu/47792/30?section_id=140248357
then: https://untappd.com/venue/more_menu/47792/45?section_id=140248357

and so on.

But if you open one of those URLs directly in the browser you get no content, because the server expects the 'x-requested-with: XMLHttpRequest' header, which marks the request as an AJAX request.
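For reference, the curl command can be approximated in plain Python with the standard library. This is only a sketch: the helper names `build_more_menu_request` and `fetch_more_menu` are made up for illustration, and the site may also want other headers (for example a browser-like User-Agent), which I have not checked.

```python
from urllib.request import Request, urlopen


def build_more_menu_request(venue_id, offset, section_id):
    """Build the AJAX-style request (hypothetical helper, not part of any library).

    The URL follows the pattern seen in Developer Tools; offset is 15, 30, 45, ...
    """
    url = (
        f"https://untappd.com/venue/more_menu/{venue_id}/{offset}"
        f"?section_id={section_id}"
    )
    # Without this header the endpoint reportedly returns no content.
    return Request(url, headers={"X-Requested-With": "XMLHttpRequest"})


def fetch_more_menu(venue_id, offset, section_id, timeout=10):
    """Send the request and return the raw response body as text."""
    request = build_more_menu_request(venue_id, offset, section_id)
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling `fetch_more_menu(47792, 15, "140248357")` should then be roughly the same as running the curl command, assuming the section_id is still valid.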

Thus you have the URL pattern and the required header you need for coding your scraper.

The rest is to parse each response. :)

PS: the section_id parameter will probably change (mine is different from yours), but you already have it in the data-section-id attribute of the button's HTML (140248357 in my case, 140216931 in the HTML you posted).

edited May 2, 2020 at 5:24

answered May 2, 2020 at 5:07


2 Comments

Jacob P. Over a year ago

Could you clarify how to parse each response using the URL pattern you provided?

wg1k Over a year ago

Sorry, from the code you provided, I assumed you knew already how to do it... I'm now tired and need some rest. I'll spare some time later for updating my answer.
