I've pieced together a script that scrapes several pages of a product search and collects the title, price, and link to each product's full description. It was developed using a loop that increments the page number (www.example.com/search/laptops?page=(1+i)) until the server stops returning an HTTP 200 response.
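For reference, that pagination loop looks roughly like this (a minimal sketch; the URL pattern and the stop-on-non-200 condition are taken from the description above, and the `fetch` parameter is a hypothetical hook so the loop can be tested without hitting the network):

```python
import urllib.request
import urllib.error

def collect_pages(base_url, fetch=None):
    """Request page 1, 2, 3, ... and stop at the first non-200 response."""
    if fetch is None:
        # Default fetcher: returns (status_code, body) for a URL.
        def fetch(url):
            try:
                with urllib.request.urlopen(url) as resp:
                    return resp.status, resp.read().decode("utf-8", errors="replace")
            except urllib.error.HTTPError as e:
                return e.code, ""

    pages = []
    i = 1
    while True:
        status, body = fetch(f"{base_url}?page={i}")
        if status != 200:
            break
        pages.append(body)
        i += 1
    return pages
```

In real use `base_url` needs a full scheme (e.g. `https://www.example.com/search/laptops`); the injectable `fetch` just makes the stopping logic easy to exercise in isolation.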
The product title contains the link to the product's full description - I would now like to "visit" that link and do the main data scrape from within the full description page.
I have an array of the links extracted from the product search page - I'm guessing iterating over this would be a good starting point.
How would I go about fetching the HTML from the links in the array (i.e. visiting each individual product page and extracting the full product data, not just the summary from the product search page)?
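One way to do this with only the standard library is to loop over the array of links, fetch each page, and parse out the fields you want. This is a sketch, not a drop-in solution: the `<h1>`-as-title and `class="price"` selectors are assumptions - the real tags and class names depend on the product page's markup:

```python
import urllib.request
from html.parser import HTMLParser

class ProductPageParser(HTMLParser):
    """Collects the first <h1> text (assumed title) and the text of the
    first element whose class contains 'price' (assumed price)."""
    def __init__(self):
        super().__init__()
        self._in_title = False
        self._in_price = False
        self.title = None
        self.price = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "h1" and self.title is None:
            self._in_title = True
        if "price" in (attrs.get("class") or ""):
            self._in_price = True

    def handle_endtag(self, tag):
        self._in_title = False
        self._in_price = False

    def handle_data(self, data):
        text = data.strip()
        if self._in_title and text:
            self.title = text
            self._in_title = False
        if self._in_price and text and self.price is None:
            self.price = text
            self._in_price = False

def parse_product(html):
    """Extract the assumed title/price fields from one product page."""
    parser = ProductPageParser()
    parser.feed(html)
    return {"title": parser.title, "price": parser.price}

def scrape_products(links):
    """Visit each product URL in the array and parse its full page."""
    rows = []
    for url in links:
        with urllib.request.urlopen(url) as resp:  # network call
            html = resp.read().decode("utf-8", errors="replace")
        row = parse_product(html)
        row["link"] = url
        rows.append(row)
    return rows
```

If you're open to third-party packages, `requests` plus `BeautifulSoup` does the same job with much less parser boilerplate; the loop structure (fetch each link, parse, collect rows) stays identical. Consider adding a short `time.sleep()` between requests so you don't hammer the site.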
Here are the current results I'm getting in CSV format:
Link,Title,Price
example.com/laptop/product1,laptop,£400
example.com/laptop/product2,laptop,£400
example.com/laptop/product3,laptop,£400
example.com/laptop/product4,laptop,£400
example.com/laptop/product5,laptop,£400
wget can recursively scrape pages out of the box - it might also be worth making sure scraping is okay with the website's owner.