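from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen("http://www.animeplus.tv/anime-show-list/")
content = html.read()
# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(content, "html.parser")
print(soup.prettify())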

The script works fine with other webpages, but when I run it against my target website I get:

<meta .$_server["request_uri"]."'"="" content="0;URL='" http-equiv="refresh"/>

I do not really understand this HTML.

I assume it's some sort of redirect or way to prevent web scraping.

Is there a way for Python to access the page after the redirect, or otherwise get the same source code a browser would receive?

Thank you!

  • Looks like the source page borked their PHP code. Commented Jun 28, 2014 at 2:50
  • Getting the page via curl also returns the same response -- I tried following redirects/changing the user agents, but no luck :( Commented Jun 28, 2014 at 4:11

1 Answer

The trick here is that the page redirects to itself and sets a cookie on the first response. That cookie is important: without it, you will not get the HTML you see in the browser.

Here's a solution using requests that opens the same page twice in the same session:

import requests
from bs4 import BeautifulSoup

url = "http://www.animeplus.tv/anime-show-list/"
session = requests.Session()
session.get(url)  # first request: the session stores the cookie the site sets
response = session.get(url)  # second request: the cookie is sent back with it
soup = BeautifulSoup(response.content, "html.parser")
print(soup.title.text)  # prints: "Watch Anime | Anime Online | Free Anime | English Anime | Watch Anime Online - AnimePlus.tv"

Alternatively, you can use mechanize, although it did not support Python 3 at the time of writing, so the session below is Python 2. Here's how it works:

>>> import mechanize
>>> browser = mechanize.Browser()
>>> browser.open('http://www.animeplus.tv/anime-show-list/')
>>> print browser.response().read()
<!DOCTYPE html>
<html>
<head>
  <title>Watch Anime | Anime Online | Free Anime | English Anime | Watch Anime Online - AnimePlus.tv</title> 
...
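
The same idea can be sketched with only the standard library, for anyone who wants to stay with urllib as in the question. This is a minimal illustration, not the site's actual behaviour: `CookieGateHandler` is a hypothetical local server that imitates a page which returns a meta-refresh stub until a cookie is present, and the cookie name `seen`, `fetch_twice` and `run_demo` are made-up names for the example. The part that carries over to a real site is `fetch_twice`, which sends both requests through one cookie-aware opener.

```python
import http.cookiejar
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class CookieGateHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the site: stub without the cookie, real page with it."""

    def do_GET(self):
        self.send_response(200)
        if "seen=1" in self.headers.get("Cookie", ""):
            body = b"<html><head><title>Real page</title></head></html>"
        else:
            # First visit: set the cookie and ask the client to reload the page.
            body = b'<meta http-equiv="refresh" content="0;URL=/">'
            self.send_header("Set-Cookie", "seen=1")
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def fetch_twice(url):
    """Fetch url twice through one cookie-aware opener and return both bodies."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    first = opener.open(url).read()   # the cookie from this response is stored in jar
    second = opener.open(url).read()  # the stored cookie is sent with this request
    return first, second


def run_demo():
    """Start the stand-in server on a free local port and fetch its page twice."""
    server = HTTPServer(("127.0.0.1", 0), CookieGateHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch_twice("http://127.0.0.1:%d/" % server.server_port)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    first, second = run_demo()
    print(first.decode())
    print(second.decode())
```

Against the real site you would call `fetch_twice` with the page URL and pass the second body to BeautifulSoup. Whether two requests are enough there is an assumption based on the requests example above.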