I am attempting to scrape a website which has a "Show More" link at the bottom of the page that leads to more data to scrape. Here is a link to the website page: https://untappd.com/v/total-wine-more/47792. Here is my full code:

import scrapy


class Untap(scrapy.Spider):
    name = "Untappd"
    allowed_domains = ["untappd.com"]
    start_urls = [
        'https://untappd.com/v/total-wine-more/47792'  # URL: Major liquor store chain with Towson location.
    ]

    def parse(self, response):
        for beer_details in response.css('div.beer-details'):
            yield {
                'name': beer_details.css('h5 a::text').getall(),  # Name of Beer
                'type': beer_details.css('h5 em::text').getall(),  # Style of Beer
                'ABVIBUs': beer_details.css('h6 span::text').getall(),  # ABV and IBU of Beer
                'Brewery': beer_details.css('h6 span a::text').getall()  # Brewery that produced Beer
            }
        # Attempt to follow the "Show More" link (this is the part that does not work).
        load_more = response.css('a.yellow button more show-more-section track-click::attr(href)').get()
        if load_more is not None:
            load_more = response.urljoin(load_more)
            yield scrapy.Request(load_more, callback=self.parse)

I've attempted to use the "load_more" block at the bottom to keep loading more data to scrape, but none of the selectors I've built from the website's HTML have worked.

Here is the HTML from the website.

<a href="javascript:void(0);" class="yellow button more show-more-section track-click" data-track="venue" data-href=":moremenu" data-section-id="140216931" data-venue-id="47792" data-menu-id="38988361">Show More Beers</a>

I want the spider to scrape what is shown on the website, then click the link and continue scraping the page. Any help would be greatly appreciated.

1 Answer

Short answer:

curl 'https://untappd.com/venue/more_menu/47792/15?section_id=140248357' -H 'x-requested-with: XMLHttpRequest'

Clicking that button executes JavaScript, so you would normally need something like Selenium to automate it. Fortunately, you don't need it here :).

Using your browser's Developer Tools, you can see that each click on that button requests data from a URL following the pattern shown above, with the number after /47792/ increasing by 15 each time:

First click: https://untappd.com/venue/more_menu/47792/15?section_id=140248357
Second click: https://untappd.com/venue/more_menu/47792/30?section_id=140248357
Third click: https://untappd.com/venue/more_menu/47792/45?section_id=140248357

and so on.
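The offset pattern can be sketched in a few lines of plain Python. The helper names (more_menu_url, more_menu_urls) and the pages argument are made up for illustration; only the URL shape and the step of 15 come from what Developer Tools showed.

```python
from typing import Iterator

BASE = "https://untappd.com/venue/more_menu"  # path observed in Developer Tools
PAGE_SIZE = 15  # the offset grows by 15 on every click


def more_menu_url(venue_id: int, offset: int, section_id: int) -> str:
    """Build one "Show More" URL for the given offset (hypothetical helper)."""
    return f"{BASE}/{venue_id}/{offset}?section_id={section_id}"


def more_menu_urls(venue_id: int, section_id: int, pages: int) -> Iterator[str]:
    """Yield the first `pages` URLs, i.e. offsets 15, 30, 45, ..."""
    for page in range(1, pages + 1):
        yield more_menu_url(venue_id, page * PAGE_SIZE, section_id)


if __name__ == "__main__":
    for url in more_menu_urls(47792, 140248357, pages=3):
        print(url)
```

Run as a script, this prints the three URLs listed above. How many pages actually exist is not known in advance, so a real scraper would keep going until a response comes back empty rather than use a fixed count.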

But if you open one of those URLs directly in the browser, you get no content, because the server expects the 'x-requested-with: XMLHttpRequest' header, which marks the request as an AJAX request.

Thus you have the URL pattern and the required header you need for coding your scraper.

The rest is to parse each response. :)

PS: the section_id parameter may change (mine, 140248357, is different from yours), but you can read the current value from the button's data-section-id attribute, which is 140216931 in the HTML you posted.
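To check which values the button carries, here is a small standard-library sketch that parses the button HTML quoted in the question and collects its data-* attributes. The ShowMoreButtonParser class is a hypothetical helper, not part of any library.

```python
from html.parser import HTMLParser


class ShowMoreButtonParser(HTMLParser):  # hypothetical helper name
    """Collect the data-* attributes of the first "show-more-section" link."""

    def __init__(self):
        super().__init__()
        self.data = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and "show-more-section" in classes and not self.data:
            self.data = {k: v for k, v in attrs.items() if k.startswith("data-")}


# Button markup copied from the question.
BUTTON_HTML = (
    '<a href="javascript:void(0);" '
    'class="yellow button more show-more-section track-click" '
    'data-track="venue" data-href=":moremenu" data-section-id="140216931" '
    'data-venue-id="47792" data-menu-id="38988361">Show More Beers</a>'
)

parser = ShowMoreButtonParser()
parser.feed(BUTTON_HTML)
print(parser.data["data-venue-id"], parser.data["data-section-id"])
# prints: 47792 140216931
```

Inside a Scrapy callback the same values should be reachable with response.css('a.show-more-section::attr(data-section-id)').get(), matching on a single class rather than the whole class string.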


2 Comments

Could you clarify how to parse each response using the URL pattern you provided?
Sorry, from the code you provided, I assumed you knew already how to do it... I'm now tired and need some rest. I'll spare some time later for updating my answer.
