
I'm trying to get the source code of a page by using:

import urllib2
url = "http://france.meteofrance.com/france/meteo?PREVISIONS_PORTLET.path=previsionsville/750560"
page = urllib2.urlopen(url)
data = page.read()
print data

I also tried sending a User-Agent header, but I still did not get the page's source code.

Do you have any ideas about what can be done? Thanks in advance.

  • What's the issue? That seems to work for me. Commented Jul 3, 2013 at 14:27
  • Works for me too. Is your internet on? Commented Jul 3, 2013 at 14:55
  • What you're getting is not the complete source code! Try opening the page in a browser and you will see the difference. Commented Jul 3, 2013 at 15:08
  • It seems there is a hidden input in the page: <input name="t:ac" type="hidden" value=... Commented Jul 3, 2013 at 15:12

3 Answers


I tried it and the request works, but the content you receive says (in French) that your browser must accept cookies. You could probably get around that with urllib2, but I think the easiest way is to use the requests library (if you don't mind an additional dependency).

To install requests:

pip install requests

And then in your script:

import requests

url = 'http://france.meteofrance.com/france/meteo?PREVISIONS_PORTLET.path=previsionsville/750560'

response = requests.get(url)
print(response.content)

I'm pretty sure the source code of the page will then be what you expect.
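
For anyone who prefers to stay with the standard library, here is a minimal sketch of the cookie workaround mentioned above. It assumes the site only needs cookies to be stored and sent back, which has not been verified against this page. The names make_cookie_opener and fetch_with_cookies and the default User-Agent string are made up for illustration. The import fallback lets the same code run with the Python 2 module names (urllib2, cookielib) used in the question.

```python
try:
    # Python 3 module names
    from http.cookiejar import CookieJar
    from urllib.request import HTTPCookieProcessor, build_opener
except ImportError:
    # Python 2 module names, matching the urllib2 code in the question
    from cookielib import CookieJar
    from urllib2 import HTTPCookieProcessor, build_opener


def make_cookie_opener(user_agent="Mozilla/5.0"):
    """Return (opener, jar). The opener stores received cookies in jar
    and sends them back on later requests, including redirects."""
    jar = CookieJar()
    opener = build_opener(HTTPCookieProcessor(jar))
    # Assumption: a generic browser-like User-Agent is acceptable to the site
    opener.addheaders = [("User-Agent", user_agent)]
    return opener, jar


def fetch_with_cookies(url, timeout=10):
    """Fetch url with cookie handling enabled and return the raw body."""
    opener, _jar = make_cookie_opener()
    response = opener.open(url, timeout=timeout)
    try:
        return response.read()
    finally:
        response.close()
```

Calling fetch_with_cookies(url) with the URL from the question would then replace the urllib2.urlopen(url).read() pair.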


The requests library worked for me, as Martin Maillard showed.

I also noticed this note by leoluk in another thread:

Edit: It's 2014 now, and most of the important libraries have been ported and you should definitely use Python 3 if you can. python-requests is a very nice high-level library which is easier to use than urllib2.

So I wrote this get_page procedure:

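import requests

def get_page(website_url):
    # Fetch the URL and return the raw response body
    response = requests.get(website_url)
    return response.content

# print() with parentheses works on both Python 2 and Python 3
print(get_page('http://example.com'))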

Cheers!


I tried a lot of things, including urllib and urllib2, but one library worked for everything I needed and solved every problem I faced: Mechanize. It simulates a real browser, so it handles a lot of issues in that area.
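
A minimal sketch of what that could look like, assuming the third-party mechanize package is installed (pip install mechanize). The function name fetch_with_mechanize and the User-Agent string are made up for illustration, and this has not been tested against the page in the question.

```python
try:
    import mechanize  # third-party package, not part of the standard library
except ImportError:
    mechanize = None


def fetch_with_mechanize(url, user_agent="Mozilla/5.0"):
    """Open url with a mechanize Browser, which keeps cookies
    between requests, and return the raw body."""
    if mechanize is None:
        raise RuntimeError("mechanize is not installed")
    browser = mechanize.Browser()
    # Assumption: a generic browser-like User-Agent is acceptable to the site
    browser.addheaders = [("User-agent", user_agent)]
    response = browser.open(url)
    return response.read()
```

Note that mechanize respects robots.txt by default, so a site that disallows crawlers may refuse the request.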
