I'm trying to download some PDF files from a website using the requests module, but I keep getting the error listed below. I saw several posts that recommend using response.content instead of response.text for PDF files, but it still raises an error. I'm not sure how to fix this.

example link: https://corporate.exxonmobil.com/-/media/Global/Files/worldwide-giving/2018-Worldwide-Giving-Report.pdf

def scrape_website(link):
        
    try:
        print("getting content")
        cert = requests.certs.where()
        page = requests.get(link, verify=cert, headers={"User-Agent": "Mozilla/5.0 (X11; CrOS x86_64 12871.102.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.141 Safari/537.36"})
        
        print(page)
        if ".pdf" in link:
            print("the content is a pdf file. downloading..")
     

            return page.content
        
        return page.text

    except Exception as x:
        print(x)
        return ''

statement_page = scrape_website(link)


with open(filepath, 'w+', encoding="utf-8") as f: 
        print("writing page")
        f.write(statement_page)
        f.close()


    <ipython-input-42-1e4771d32073> in save_html_page(page, path, filename)
     13         with open(filepath, 'w+', encoding="utf-8") as f:
     14             print("writing page")
---> 15             f.write(page)
     16             f.close()
     17 

TypeError: write() argument must be str, not bytes
  • Change 'w+' to 'wb' Commented Oct 2, 2020 at 21:57
  • You read into statement_page but then try to write page to file Commented Oct 2, 2020 at 22:17
  • @RandomDavis, I had tried that as well but I still keep getting error - another one I get is a bytes-like object is required, not 'str' Commented Oct 3, 2020 at 1:57
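
The two errors mentioned here mirror each other. A file opened in text mode ('w', 'w+') only accepts str, and a file opened in binary mode ('wb') only accepts bytes. Since scrape_website returns page.content (bytes) for PDFs and page.text (str) otherwise, one possible way to handle both is to pick the mode from the type of the value. This is only a minimal sketch; the helper name save_payload is made up for illustration and is not part of the original code.

```python
def save_payload(payload, filepath):
    """Write bytes in binary mode and str in text mode (hypothetical helper)."""
    if isinstance(payload, bytes):
        # e.g. page.content for a PDF: binary mode takes no encoding argument
        with open(filepath, "wb") as f:
            f.write(payload)
    else:
        # e.g. page.text for an HTML page: text mode with an explicit encoding
        with open(filepath, "w", encoding="utf-8") as f:
            f.write(payload)
```

With this, something like save_payload(statement_page, filepath) should work for either return type. The with block also closes the file, so no explicit f.close() is needed.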

2 Answers

Sometimes I need to download things programmatically too. I just use this:

import requests

response = requests.get("https://link_to_thing.pdf")
file = open("myfile.pdf", "wb")
file.write(response.content)
file.close()

You can also use the os module to shell out to wget (this requires wget to be installed):

import os

url = 'https://link_to_pdf.pdf'
name = 'myfile.pdf'

os.system('wget {} -O {}'.format(url,name))
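
One caveat: os.system passes the formatted string through a shell, so a URL containing characters such as & or spaces can break the command. A possible alternative is to pass the arguments as a list to subprocess.run, which does not go through the shell. This sketch still assumes wget is installed; build_wget_command and download_with_wget are illustrative names, not part of any library.

```python
import subprocess


def build_wget_command(url, name):
    # Each argument is its own list item, so no shell quoting is needed
    return ["wget", url, "-O", name]


def download_with_wget(url, name):
    # check=True raises CalledProcessError if wget exits with a non-zero status
    subprocess.run(build_wget_command(url, name), check=True)
```

Usage would look like download_with_wget('https://link_to_pdf.pdf', 'myfile.pdf').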

3 Comments

That's what I did, and I get this error: a bytes-like object is required, not 'str'
try using this instead: file = open("myfile.pdf", "r")
I would add context manager to the code: with open("myfile.pdf", "wb") as file: file.write(response.content)

Here is an example I used once; it's pretty handy when you are trying to download a large PDF file:

import requests
import sys

url = 'url'
filename = 'filename'
# creating a connection to the pdf
print("Creating the connection ...")
with requests.get(url, stream=True) as r:
    if r.status_code != 200:
        print("Could not download the file '{}'\nError Code : {}\nReason : {}\n\n".format(
            url, r.status_code, r.reason), file=sys.stderr)
    else:
        # Storing the file as a pdf
        print("Saving the pdf file  :\n\"{}\" ...".format(filename))
        # Content-Length may be missing (e.g. with chunked transfer encoding)
        total_size = int(r.headers.get('Content-Length', 0))
        saved_size = 0
        try:
            with open(filename, 'wb') as f:
                for chunk in r.iter_content(chunk_size=8192):
                    if chunk:
                        f.write(chunk)
                        saved_size += len(chunk)
                        if total_size:
                            # Report progress from the bytes actually written
                            print("\r=>> %.2f%%" % min(
                                100.0, saved_size * 100 / total_size), end='')
            print(end='\n\n')
        except (OSError, requests.RequestException):
            print("==> Couldn't save : {}".format(filename), file=sys.stderr)

This uses iter_content() with stream=True to download the PDF and save it chunk by chunk, so the whole file never has to be held in memory.
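
If the progress output is not needed, the same chunked approach can be condensed into a small function. This is only a sketch under my own assumptions: write_chunks and download are hypothetical names, and raise_for_status() is used in place of the manual status check.

```python
def write_chunks(chunks, path):
    """Write an iterable of bytes chunks to path and return the byte count."""
    written = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                f.write(chunk)
                written += len(chunk)
    return written


def download(url, path, chunk_size=8192):
    # Third-party dependency; imported here so write_chunks works without it
    import requests

    with requests.get(url, stream=True) as r:
        r.raise_for_status()  # raise on 4xx/5xx instead of saving an error page
        return write_chunks(r.iter_content(chunk_size=chunk_size), path)
```

Keeping the file writing separate from the HTTP call means the writing part can be checked with plain byte strings, without touching the network.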

1 Comment

Although I didn't try this, I love the concept.
