I'm trying to scrape the links from the careers page on a college website, and I am getting this error:
urllib.error.HTTPError: HTTP Error 302: The HTTP server returned a redirect error that would lead to an infinite loop. The last 30x error message was: Moved Temporarily
I think this is because the site has a session cookie. After doing a bit of reading, there seem to be many ways to get around this (Requests, http.cookiejar, Selenium/PhantomJS), but I don't know how to incorporate these solutions into my scraping program.
This is my scraping program. It's written in Python 3.6 with BeautifulSoup4.
from bs4 import BeautifulSoup
from urllib.request import urlopen
html = urlopen("https://jobs.fanshawec.ca/applicants/jsp/shared/search/SearchResults_css.jsp")
soup = BeautifulSoup(html, 'html.parser')
data = soup.select(".ft0 a")
ads = []
for i in data:
    link = i.get('href')
    ads.append(link)
for job in ads:
    print(job)
    print('')
When I clear the cookies in my browser and manually go to the page I'm trying to scrape (https://jobs.fanshawec.ca/applicants/jsp/shared/search/SearchResults_css.jsp), I'm taken to a different page. Once I have the cookie though, I can go directly to the SearchResults page that I want to scrape.
This is the cookie:
Any thoughts on how I can deal with this cookie?
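For reference, here is a minimal sketch of what I understand the http.cookiejar approach to look like, using only the standard library. The helper names `make_cookie_opener` and `fetch_with_cookies` are my own (hypothetical), and I am assuming that letting the opener store the cookie during the redirect is enough to break the loop; the site may still require visiting its entry page first.

```python
from http.cookiejar import CookieJar
from urllib.request import HTTPCookieProcessor, build_opener

SEARCH_URL = "https://jobs.fanshawec.ca/applicants/jsp/shared/search/SearchResults_css.jsp"


def make_cookie_opener():
    """Build an opener that stores cookies in a jar and resends them.

    Hypothetical helper: returns (opener, jar) so the jar can be inspected.
    """
    jar = CookieJar()
    opener = build_opener(HTTPCookieProcessor(jar))
    return opener, jar


def fetch_with_cookies(url, opener):
    """Fetch url through the opener and return the raw response body.

    Cookies set by the server (including on redirect responses) are kept
    in the opener's jar and sent back on the follow-up requests.
    """
    with opener.open(url) as response:
        return response.read()
```

The idea would be to call `opener, jar = make_cookie_opener()`, then `html = fetch_with_cookies(SEARCH_URL, opener)`, and pass `html` to `BeautifulSoup(html, 'html.parser')` in place of the `urlopen` result. The rest of the program would stay the same.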
