I've done web scraping before but it was never this complex. I want to grab course information from a school website. However all the course information is displayed in a web scraper's nightmare.
First off, when you click the "Schedule of Classes" url, it directs you through several other pages first (I believe to set cookies and check other crap).
Then it finally loads a page with an iframe that apparently only likes to load when it's loaded from within the institution's webpage (ie arizona.edu).
From there the form submissions have to be made via buttons that don't actually reload the page but merely submit a AJAX query and I think it just manipulates the iframe.
This query is particularly hard for me to replicate. I've been using PHP and curl to simulate a browser visiting the initial page, gather's the proper cookies and such. But I think I have a problem with the headers that my curl function is sending because it never lets me execute any sort of query after the initial "search form" loads.
Any help would be awesome...
http://www.arizona.edu/students/registering-classes -> "Schedule of Classes"
Or just here: http://schedule.arizona.edu/