Trouble crawling/scraping webpages that use JavaScript with Perl

Question

I've been teaching myself how to crawl and scrape websites. I'm comfortable with it, but only on sites that are mostly plain HTML. Now I'm working with this page: https://intel.taleo.net/careersection/10000/jobsearch.ftl

I'm using Perl (with WWW::Mechanize) for the following task: I want to write a crawler/scraper that ticks the "United States" checkbox on the left (which filters the results) and then collects the titles of all the jobs. However, I couldn't find a way to reach this checkbox from Perl. Can someone get me started on this? Example code would be helpful.

Have you considered using a headless browser like PhantomJS? It takes more setup, but it fully supports JavaScript. You could then hook into the page's events and run JS code once the page has loaded, the form is displayed, or the results have been fetched.
– kba, Commented Feb 11, 2016 at 10:00

user3019319 · Accepted Answer · 2016-02-11 13:09:02Z

3

You need to analyze the page and see how this checkbox is implemented. If JavaScript is behind it, you can use WWW-Mechanize to emulate what that JavaScript does, for example by sending the same HTTP request yourself.
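As a rough illustration of the approach above, here is a minimal sketch that sends a filter request directly and reads the job titles from a JSON reply. Everything specific to the site is an assumption: the payload fields (`country`, `page`), the response shape (`jobs` with a `title` in each entry), and the endpoint URL are hypothetical placeholders, not taken from the Taleo page. You would replace them with whatever the page actually sends, as seen in your browser's network inspector.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use JSON::PP qw(encode_json decode_json);

# HYPOTHETICAL request body: the field names are placeholders for whatever
# the page's JavaScript really sends when the checkbox is ticked.
sub build_filter_payload {
    my ($country) = @_;
    return encode_json({ country => $country, page => 1 });
}

# HYPOTHETICAL response shape: { "jobs": [ { "title": "..." }, ... ] }
# Takes the raw UTF-8 JSON bytes and returns the list of titles.
sub extract_titles {
    my ($json_bytes) = @_;
    my $data = decode_json($json_bytes);
    return map { $_->{title} } @{ $data->{jobs} || [] };
}

# Only touch the network when an endpoint URL is given on the command line.
if (@ARGV) {
    require WWW::Mechanize;
    binmode(STDOUT, ':encoding(UTF-8)');

    my $endpoint = shift @ARGV;    # assumed: URL copied from the network inspector
    my $mech     = WWW::Mechanize->new(autocheck => 1);

    # Send the same kind of request the page's JavaScript would send.
    $mech->post(
        $endpoint,
        'Content-Type' => 'application/json',
        Content        => build_filter_payload('United States'),
    );

    # Undo any Content-Encoding but keep the raw bytes for decode_json.
    my $body = $mech->response->decoded_content(charset => 'none');
    print "$_\n" for extract_titles($body);
}
```

Run it as `perl scrape.pl <endpoint-url>` once you know the real endpoint. Keeping the payload builder and the title extractor as separate functions means only those two need to change when you adapt the sketch to the real request and response.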

Perl also offers easier options. The following crawling modules handle JavaScript out of the box:

1. WWW-Mechanize-Firefox, which automates Firefox
2. WWW-Mechanize-PhantomJS, which is based on the PhantomJS browser and can handle JavaScript
3. WWW::Selenium, which uses Selenium
4. WWW::HtmlUnit, which is based on the Java HtmlUnit library and can handle JavaScript

