How to download a PDF file from the web using the Python requests library

Question

I'm trying to download some PDF files from a website using the requests module, but I keep getting the error listed below. I saw several posts that recommend using response.content for PDF files instead of response.text, but it still raises an error. I'm not sure how to fix this.

example link: https://corporate.exxonmobil.com/-/media/Global/Files/worldwide-giving/2018-Worldwide-Giving-Report.pdf

def scrape_website(link):
        
    try:
        print("getting content")
        cert = requests.certs.where()
        page = requests.get(link, verify=cert, headers={"User-Agent": "Mozilla/5.0 (X11; CrOS x86_64 12871.102.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.141 Safari/537.36"})
        
        print(page)
        if ".pdf" in link:
            print("the content is a pdf file. downloading..")
     

            return page.content
        
        return page.text

    except Exception as x:
        print(x)
        return ''

statement_page = scrape_website(link)


with open(filepath, 'w+', encoding="utf-8") as f: 
        print("writing page")
        f.write(statement_page)
        f.close()


    <ipython-input-42-1e4771d32073> in save_html_page(page, path, filename)
     13         with open(filepath, 'w+', encoding="utf-8") as f:
     14             print("writing page")
---> 15             f.write(page)
     16             f.close()
     17 

TypeError: write() argument must be str, not bytes

You read into statement_page but then try to write page to file.
– Grismar, Commented Oct 2, 2020 at 22:17
@RandomDavis, I had tried that as well but I still keep getting an error. Another one I get is: a bytes-like object is required, not 'str'
– Kuni, Commented Oct 3, 2020 at 1:57

score 6 · Accepted Answer · 2021-02-09 13:49:03Z

Sometimes I need to download things programmatically too. I just use this:

import requests

response = requests.get("https://link_to_thing.pdf")
file = open("myfile.pdf", "wb")
file.write(response.content)
file.close()

You can also use the os module to call wget (this requires wget to be installed):

import os

url = 'https://link_to_pdf.pdf'
name = 'myfile.pdf'

os.system('wget {} -O {}'.format(url,name))
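As a hedged alternative to the os.system call above: building the command as a string means a URL or filename containing spaces or shell characters is interpreted by the shell. A minimal sketch using subprocess.run with an argument list avoids that. The helper names build_wget_command and download_with_wget are made up for illustration, and the sketch assumes wget is installed and on the PATH.

```python
import subprocess


def build_wget_command(url, name):
    """Return the wget command as an argument list (no shell quoting needed)."""
    return ["wget", url, "-O", name]


def download_with_wget(url, name):
    """Run wget without a shell; raises CalledProcessError on a non-zero exit."""
    subprocess.run(build_wget_command(url, name), check=True)


# Example call (placeholder URL, as in the answer above):
# download_with_wget("https://link_to_pdf.pdf", "myfile.pdf")
```

With check=True a failed download raises an exception instead of passing silently, which os.system does not do on its own.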

edited Feb 9, 2021 at 13:49

answered Oct 2, 2020 at 22:20

user13372194

3 Comments

Kuni Over a year ago

That's what I did, and I get this error: a bytes-like object is required, not 'str'

user13372194 Over a year ago

try using this instead: file = open("myfile.pdf", "r")

Vinh Over a year ago

I would add a context manager to the code: with open("myfile.pdf", "wb") as file: file.write(response.content)
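Both errors quoted in this thread appear to come from a mismatch between the file mode and the type being written: a file opened in text mode ('w+' with an encoding) only accepts str, while a file opened in binary mode ('wb') only accepts bytes. Opening with "r" is read-only and cannot be written to at all. Here is a minimal sketch that picks the mode from the payload type. The helper name save_payload is hypothetical; it assumes the payload is either response.content (bytes) or response.text (str).

```python
def save_payload(path, payload):
    """Write payload to path, choosing the file mode from the payload type."""
    if isinstance(payload, (bytes, bytearray)):
        # Binary data such as response.content: binary mode, no encoding argument.
        with open(path, "wb") as f:
            f.write(payload)
    else:
        # Text such as response.text: text mode with an explicit encoding.
        with open(path, "w", encoding="utf-8") as f:
            f.write(payload)
```

In the question's code this would replace the open(filepath, 'w+', encoding="utf-8") block, since scrape_website returns bytes for PDF links and str otherwise.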

Pixel_teK · Accepted Answer · 2020-10-02 22:38:37Z

Here is an example I used once. It's pretty handy when you are trying to download a large PDF file:

import requests
import sys

url = 'url'            # placeholder: the PDF's URL
filename = 'filename'  # placeholder: where to save the file
chunk_size = 8192

# Open a streaming connection so the body is not loaded into memory at once
print("Creating the connection ...")
with requests.get(url, stream=True) as r:
    if r.status_code != 200:
        print("Could not download the file '{}'\nError Code : {}\nReason : {}\n\n".format(
            url, r.status_code, r.reason), file=sys.stderr)
    else:
        # Store the file as a PDF, chunk by chunk
        print("Saving the pdf file  :\n\"{}\" ...".format(filename))
        # Content-Length can be missing; 0 disables the percentage display
        total_size = int(r.headers.get('Content-Length', 0))
        saved_size = 0
        with open(filename, 'wb') as f:
            try:
                for chunk in r.iter_content(chunk_size=chunk_size):
                    if chunk:
                        f.write(chunk)
                        saved_size += len(chunk)
                        if total_size:
                            print("\r=>> %.2f%%" % min(
                                100.0, saved_size * 100 / total_size), end='')
                print(end='\n\n')
            except Exception:
                print("==> Couldn't save : {}".format(filename), file=sys.stderr)
# The with block closes the response, so no explicit r.close() is needed

This uses iter_content() to download the PDF and save it chunk by chunk, so the whole file never has to fit in memory.
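The chunk-by-chunk saving step can also be factored into a small helper that is easy to test without a network connection. This is only a sketch: save_chunks is a hypothetical name, chunks is assumed to be any iterable of bytes (for example r.iter_content(chunk_size=8192)), and total_size is assumed to come from the Content-Length header when the server sends one.

```python
def save_chunks(chunks, path, total_size=None):
    """Write an iterable of bytes chunks to path and return the bytes written."""
    written = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            if not chunk:
                continue  # skip empty chunks
            f.write(chunk)
            written += len(chunk)
            if total_size:
                # Cap at 100% in case the reported size is smaller than the data.
                percent = min(100.0, written * 100 / total_size)
                print("\r=>> %.2f%%" % percent, end="")
    if total_size:
        print()  # finish the progress line
    return written
```

Counting len(chunk) rather than assuming every chunk is exactly 8192 bytes keeps the percentage accurate for the final, shorter chunk.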
