
Here, a method is given for downloading a webpage as a PDF, and it works.

However, the website I am interested in displays a PDF itself, so this method does not work for it (for example, this page). Is there anything specific for such URLs?

When I use the method from the post linked above, I get the following error:

```text
OSError: wkhtmltopdf reported an error:
Loading pages (1/6)
Error: Failed loading page http://curia.europa.eu/juris/showPdf.jsf;jsessionid=CAE85693A88870E357F61ED4344FD7E9?text=&docid=62809&pageIndex=0&doclang=EN&mode=lst&dir=&occ=first&part=1&cid=2878455 (sometimes it will work just to ignore this error with --load-error-handling ignore)
Exit with code 1, due to unknown error.
```

1 Answer


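A fairly basic use of the requests package will help you here. Because this URL already returns a PDF file rather than an HTML page, there is nothing for wkhtmltopdf to render; you only need to save the response body to disk. (The only slightly fancy part is streaming the result in chunks.)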

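```python
import requests

outpath = './out.pdf'
url = "http://curia.europa.eu/juris/showPdf.jsf;jsessionid=03B8AD93D8D1B1FBB33A15FDA3774709?text=&docid=62809&pageIndex=0&doclang=EN&mode=lst&dir=&occ=first&part=1&cid=2874259"

# stream=True fetches the body lazily instead of loading it all into memory
with requests.get(url, stream=True) as r:
    r.raise_for_status()  # raise on 4xx/5xx instead of silently writing nothing
    with open(outpath, 'wb') as f:
        # the response is already a PDF, so copy its bytes in 1 KiB chunks
        for chunk in r.iter_content(chunk_size=1024):
            f.write(chunk)
```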
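If you don't know in advance whether a URL serves an HTML page or a PDF file, one option is to inspect the response before deciding which tool to use. The sketch below assumes the server either sets a `Content-Type` of `application/pdf` or returns a body that starts with the PDF signature `%PDF-`; `looks_like_pdf` and `save_if_pdf` are hypothetical helper names, not part of requests.

```python
# Hypothetical helpers (not part of requests): decide whether a streamed
# response is a PDF and, if so, save it; otherwise leave it to wkhtmltopdf.

PDF_MAGIC = b"%PDF-"  # PDF files begin with this signature


def looks_like_pdf(content_type, first_bytes):
    """Heuristic: trust the Content-Type header, else sniff the first bytes."""
    # The header may carry parameters, e.g. "application/pdf; charset=binary"
    mime = content_type.split(";", 1)[0].strip().lower()
    return mime == "application/pdf" or first_bytes.startswith(PDF_MAGIC)


def save_if_pdf(response, outpath, chunk_size=1024):
    """Write a streamed requests.Response to outpath if it holds a PDF.

    Returns True if the file was written, False if the body looks like
    something else (e.g. an HTML page that still needs rendering).
    """
    chunks = response.iter_content(chunk_size=chunk_size)
    first = next(chunks, b"")  # peek at the first chunk for sniffing
    if not looks_like_pdf(response.headers.get("Content-Type", ""), first):
        return False
    with open(outpath, "wb") as f:
        f.write(first)  # don't lose the chunk we already consumed
        for chunk in chunks:
            f.write(chunk)
    return True
```

To use it, pass the response from `requests.get(url, stream=True)`, ideally inside a `with` block. A `False` result suggests the URL serves HTML, which is the case the wkhtmltopdf approach handles. The signature check is only a heuristic: a mislabeled response with a very short first chunk could fool it.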

For more fun with requests, see: https://2.python-requests.org//en/master/


1 Comment

Yes, and it works really fast! I iterated over 200 pages in about a minute. Thanks!
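For a batch like the 200 pages mentioned in the comment above, reusing one `requests.Session` lets connections be pooled between requests, whereas each bare `requests.get` call sets up a fresh session. A minimal sketch, assuming you already have the list of document URLs; `download_all` and the `doc_<index>.pdf` naming scheme are hypothetical.

```python
import os


def download_all(session, urls, outdir, chunk_size=1024):
    """Hypothetical helper: stream each URL to outdir/doc_<index>.pdf.

    `session` is expected to be a requests.Session (or anything with a
    compatible .get()); sharing it lets connections be reused.
    Returns the list of paths written.
    """
    os.makedirs(outdir, exist_ok=True)
    paths = []
    for i, url in enumerate(urls):
        with session.get(url, stream=True) as r:
            r.raise_for_status()  # stop on the first HTTP error
            path = os.path.join(outdir, f"doc_{i}.pdf")
            with open(path, "wb") as f:
                for chunk in r.iter_content(chunk_size=chunk_size):
                    f.write(chunk)
            paths.append(path)
    return paths
```

A typical call would be `with requests.Session() as s: download_all(s, urls, "./pdfs")`.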
