Downloading a PDF-based webpage as PDF using Python

Question

A method for downloading a webpage as a PDF is given here, and it works.

However, the website I am interested in serves a PDF directly rather than an HTML page, so this method does not work. This page is one example. Is there a specific approach for such URLs?

When I use the method from the post I shared above, I get the following error:

OSError: wkhtmltopdf reported an error:
Loading pages (1/6)
Error: Failed loading page http://curia.europa.eu/juris/showPdf.jsf;jsessionid=CAE85693A88870E357F61ED4344FD7E9?text=&docid=62809&pageIndex=0&doclang=EN&mode=lst&dir=&occ=first&part=1&cid=2878455 (sometimes it will work just to ignore this error with --load-error-handling ignore)
Exit with code 1, due to unknown error.

Matt VanEseltine · Accepted Answer · 2019-04-22 00:56:07Z


A more-or-less basic use of the requests package will help you out here. (The only slightly fancy part is streaming the response and writing it to disk in chunks.)

import requests

outpath = './out.pdf'
url = r"""http://curia.europa.eu/juris/showPdf.jsf;jsessionid=03B8AD93D8D1B1FBB33A15FDA3774709?text=&docid=62809&pageIndex=0&doclang=EN&mode=lst&dir=&occ=first&part=1&cid=2874259"""
# stream=True defers downloading the body so it can be read piece by piece
r = requests.get(url, stream=True)
if r.status_code == 200:
    with open(outpath, 'wb') as f:
        # Write 1024 bytes at a time instead of holding the whole file in memory
        for chunk in r.iter_content(1024):
            f.write(chunk)

For more fun with requests, see: https://2.python-requests.org//en/master/
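The snippet above does nothing at all when the status is not 200, and it would happily save an HTML error page under a .pdf name. As a rough sketch of a stricter variant, the helper below raises on HTTP errors and checks that the body starts with the PDF header bytes before writing anything. The name save_pdf_response and the idea that this server might return a non-PDF body (for instance once the jsessionid has expired) are my own assumptions, not something confirmed for this site.

```python
PDF_MAGIC = b"%PDF-"  # a well-formed PDF file begins with these bytes


def save_pdf_response(response, outpath, chunk_size=1024):
    """Write a streamed response body to outpath, refusing non-PDF content.

    Hypothetical helper: `response` is expected to behave like a
    requests.Response obtained with stream=True, i.e. it must offer
    raise_for_status() and iter_content(chunk_size).
    Returns the number of bytes written.
    """
    # Raise on 4xx/5xx instead of silently skipping the download.
    response.raise_for_status()

    chunks = response.iter_content(chunk_size)
    # Skip any empty chunks and take the first one that carries data.
    # Assumption: that chunk is at least as long as the 5-byte PDF header.
    first = next((c for c in chunks if c), b"")
    if not first.startswith(PDF_MAGIC):
        raise ValueError("response body does not look like a PDF")

    written = 0
    with open(outpath, "wb") as f:
        written += f.write(first)
        for chunk in chunks:
            written += f.write(chunk)
    return written
```

With requests, a call might look like save_pdf_response(requests.get(url, stream=True), './out.pdf'). Checking the header before opening the file means a failed download leaves no half-written out.pdf behind.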


1 Comment

independentvariable Over a year ago

Yes, and it works really fast! I iterated over 200 pages in about a minute. Thanks!
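For anyone wanting to loop over many documents in the same way, one possible way to produce the URLs is to swap a single query parameter in a known-good URL. This is only a sketch: the function names are hypothetical, and it is an assumption on my part that varying docid (rather than some other parameter) selects a different document and that the server accepts the reused jsessionid.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_query_param(url, name, value):
    """Return url with query parameter `name` set to `value`.

    Existing parameters keep their order; blank ones such as `text=`
    are preserved. If `name` is absent it is appended.
    """
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)

    new_pairs = []
    replaced = False
    for key, old in pairs:
        if key == name:
            new_pairs.append((key, str(value)))
            replaced = True
        else:
            new_pairs.append((key, old))
    if not replaced:
        new_pairs.append((name, str(value)))

    return urlunsplit(parts._replace(query=urlencode(new_pairs)))


def urls_for_docids(template_url, docids):
    """Build one URL per document id from a template URL (hypothetical helper)."""
    return [with_query_param(template_url, "docid", d) for d in docids]
```

Each resulting URL can then be fetched with the requests code from the answer, ideally through a single requests.Session so the connection is reused between downloads.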
