Unable to get page source code in Python

Question

I'm trying to get the source code of a page by using:

import urllib2
url = "http://france.meteofrance.com/france/meteo?PREVISIONS_PORTLET.path=previsionsville/750560"
page = urllib2.urlopen(url)
data = page.read()
print data

I also tried sending a User-Agent header, but I still did not get the source code of the page.

Does anyone have any idea what can be done? Thanks in advance.

What you're getting is not the complete source code! Try opening the page in a browser and you will see the difference.
– user2546923, Commented Jul 3, 2013 at 15:08
It seems that there is a hidden input in the page: <input name="t:ac" type="hidden" value=...
– user2546923, Commented Jul 3, 2013 at 15:12
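
One way to check whether a fetched page contains hidden inputs like the one mentioned in the comment above is to parse the HTML and list them. The following is a minimal sketch using only the Python 3 standard library; the names HiddenInputCollector and find_hidden_inputs are made up for illustration, and the function expects already-decoded text (str), not bytes.

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.hidden_inputs = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "input":
            return
        attributes = dict(attrs)
        if (attributes.get("type") or "").lower() == "hidden":
            self.hidden_inputs[attributes.get("name")] = attributes.get("value")


def find_hidden_inputs(html_text):
    """Return a dict mapping hidden input names to their values."""
    collector = HiddenInputCollector()
    collector.feed(html_text)
    collector.close()
    return collector.hidden_inputs
```

Comparing the result for the page you downloaded with the result for the source saved from a browser can show which fields differ between the two.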

Martin Maillard · Accepted Answer · 2016-10-29 08:18:22Z

11

I tried it and the request works, but the content that you receive says (in French) that your browser must accept cookies. You could probably get around that with urllib2, but I think the easiest way would be to use the requests library (if you don't mind having an additional dependency).

To install requests:

pip install requests

And then in your script:

import requests

url = 'http://france.meteofrance.com/france/meteo?PREVISIONS_PORTLET.path=previsionsville/750560'

response = requests.get(url)
print(response.content)

I'm pretty sure the source code of the page will be what you expect then.
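
For the standard-library route, the cookie message described above suggests the server wants a client that stores cookies and sends them back. Here is a minimal sketch, assuming Python 3 (where urllib2 became urllib.request); the helper names build_cookie_opener and fetch and the default User-Agent string are illustrative choices, and I have not verified that this is enough for this particular site.

```python
import http.cookiejar
import urllib.request


def build_cookie_opener(user_agent="Mozilla/5.0"):
    """Return (opener, jar): an opener that stores and resends cookies."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    # Replace the default "Python-urllib/x.y" User-Agent header.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener, jar


def fetch(opener, url, timeout=10):
    """Fetch url with the given opener and return the raw body as bytes."""
    with opener.open(url, timeout=timeout) as response:
        return response.read()
```

Because the jar lives as long as the opener, calling fetch twice with the same opener sends any cookies set by the first response along with the second request. With requests, a requests.Session() object keeps cookies between calls in a similar way.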

edited Oct 29, 2016 at 8:18

answered Jul 3, 2013 at 15:47

Martin Maillard


Community · Accepted Answer · 2017-05-23 12:00:35Z

2

The requests library worked for me, as Martin Maillard showed.

Also, in another thread I noticed this note by leoluk:

Edit: It's 2014 now, and most of the important libraries have been ported and you should definitely use Python 3 if you can. python-requests is a very nice high-level library which is easier to use than urllib2.

So I wrote this get_page procedure:

import requests

def get_page(website_url):
    # Fetch the URL and return the raw response body as bytes.
    response = requests.get(website_url)
    return response.content

print(get_page('http://example.com'))

Cheers!
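
Note that response.content is bytes, so under Python 3 printing it shows a b'...' literal rather than readable text. With requests, response.text gives you a decoded string. If you only have the raw bytes and the Content-Type header value, the decoding can be sketched with the standard library as below; decode_body is a hypothetical helper, and falling back to UTF-8 is an assumption rather than something the server guarantees.

```python
from email.message import Message


def decode_body(body, content_type=None, default="utf-8"):
    """Decode response bytes using the charset named in Content-Type."""
    charset = None
    if content_type:
        # Reuse the stdlib header parser to extract the charset parameter.
        header = Message()
        header["Content-Type"] = content_type
        charset = header.get_content_charset()
    try:
        return body.decode(charset or default, errors="replace")
    except LookupError:
        # The server named an encoding Python does not know about.
        return body.decode(default, errors="replace")
```

With requests, the header value would come from response.headers.get("Content-Type").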

edited May 23, 2017 at 12:00

CommunityBot


answered Jan 11, 2015 at 16:04

Sergeus


Ibrahim Awad · Accepted Answer · 2013-07-03 17:00:20Z

0

I tried a lot of things, including urllib and urllib2, but one library worked for everything I needed and solved every problem I faced: Mechanize. This library simulates using a real browser, so it handles a lot of issues in that area.

answered Jul 3, 2013 at 17:00

Ibrahim Awad
