I am trying to convert a PDF into JPEG images using Python. The steps I have taken and my code are below, but first:

  1. Expected results: one JPEG file per page of the PDF is added to my "Output" folder.
  2. Actual results: the code appears to run indefinitely without any JPEGs being added to the "Output" folder.

Steps taken:

  • Installed pdf2image via CMD (pip install pdf2image)
  • Installed Poppler.

Note on Poppler: it has to be on PATH. I added it through the environment variables, but I kept getting the error pdf2image.exceptions.PDFInfoNotInstalledError: Unable to get page count. Is Poppler installed and in PATH? As a workaround, I now pass the path directly in the code, and I no longer receive this error.
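A quick way to check whether Poppler is visible to Python is to look for its `pdfinfo` executable, which is the tool named in the error above. This is only a minimal sketch using the standard library; `find_poppler_tool` is a hypothetical helper name, not part of pdf2image.

```python
import shutil


def find_poppler_tool(tool="pdfinfo", poppler_path=None):
    """Return the full path to a Poppler executable, or None if it is not found.

    With poppler_path=None the current process's PATH is searched;
    otherwise only the given directory is searched.
    """
    return shutil.which(tool, path=poppler_path)


# Example: None here means this Python process cannot see pdfinfo on PATH.
print(find_poppler_tool())
```

If this prints None right after editing the environment variables, it may simply be that the terminal or IDE was started before the change; a PATH edit only applies to processes started afterwards.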

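from pdf2image import convert_from_path

path = "D:/Users/<USERNAME>/Desktop/Python/DeratingTool/"
pdfname = path + "<PDFNAME>.pdf"

# Render every page at 500 DPI; poppler_path points at Poppler's bin folder.
images = convert_from_path(
    pdfname,
    500,
    poppler_path=r'C:\Program Files\Release-22.04.0-0\poppler-22.04.0\Library\bin',
)
output_folder_path = "D:/Users/<USERNAME>/Desktop/Python/DeratingTool/Output"

# Save each page as Output/1.jpg, Output/2.jpg, ...
# (the folder separator and the dot before "jpg" are needed in the file name)
for i, image in enumerate(images, start=1):
    image.save(output_folder_path + "/" + str(i) + ".jpg", "JPEG")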

Any ideas why this doesn't seem to be able to finish would be most welcome.

Thank you.

  • Does it write the images if you define an output_folder? E.g. images = convert_from_path(pdfname, 500, poppler_path=r'C:\Program Files\Release-22.04.0-0\poppler-22.04.0\Library\bin', output_folder="D:/Users/<USERNAME>/Desktop/Python/DeratingTool/Output") Commented May 30, 2022 at 19:51
  • Have you tried reading a single PDF and writing an image to the same location? Before you try mixing in the complexity of loops. Commented May 30, 2022 at 22:45
  • @RJAdriaansen: I actually found that in the definitions right after posting my question and just tried it now. It did the trick, or almost: it generated 81 files (which is correct, since the PDF has 81 pages), BUT it created them as PPM and not JPEG. That means they are not usable as they are, and they are huge (approx. 125 MB per picture). Thanks for helping with the first part! Commented Jun 1, 2022 at 8:56
  • @ZachYoung: Admittedly, no, I didn't, since the loop seemed rather straightforward. It seems the issue was that the output folder needed to be included in the parameters. Commented Jun 1, 2022 at 8:57
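The file sizes mentioned in the comments are plausible for uncompressed output. As a rough sketch, assuming binary PPM with 8-bit RGB (3 bytes per pixel, header ignored), the size of one page can be estimated from its physical dimensions and the DPI. `ppm_size_bytes` is a hypothetical helper, and the page dimensions below are illustrative, not taken from the PDF in question.

```python
def ppm_size_bytes(width_in, height_in, dpi):
    """Approximate size of an 8-bit RGB PPM for a page of the given size in inches."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * 3  # 3 bytes per pixel, no compression


# A US Letter page (8.5 x 11 in) at 500 DPI:
print(ppm_size_bytes(8.5, 11, 500) / 1_000_000, "MB")
```

For a Letter page this comes to roughly 70 MB, so about 125 MB per image would be consistent with pages somewhat larger than Letter at 500 DPI. Since the pixel count grows with the square of the DPI, lowering the DPI shrinks the files quickly.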

1 Answer

I found all of the information I needed for the desired result right in the function definitions (thank you, @RJAdriaansen, for pointing me back there). The default format is "PPM", and it can be changed to "jpeg" through the fmt parameter. Below is the code that works for my purposes:

from pdf2image import convert_from_path

path = "D:/Users/<USERNAME>/Desktop/Python/DeratingTool/"

pdfname = path + "<FILENAME>.pdf"

# output_folder makes pdf2image write one file per page into that folder;
# fmt="jpeg" replaces the default PPM output format.
images = convert_from_path(
    pdfname,
    dpi=500,
    poppler_path=r'C:\Program Files\Release-22.04.0-0\poppler-22.04.0\Library\bin',
    output_folder="D:/Users/<USERNAME>/Desktop/Python/DeratingTool/Output",
    fmt="jpeg",
    jpegopt=None,
)
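If you would rather control the file names yourself, another option for a large PDF is to convert a few pages at a time so that only one batch of rendered pages is held in memory. The following is a sketch under assumptions, not a tested recipe for this exact setup: it relies on pdf2image's `pdfinfo_from_path` function and on the `first_page` / `last_page` parameters of `convert_from_path`, while `page_chunks`, `convert_in_chunks`, the chunk size and the `page_001.jpg` naming scheme are hypothetical choices made for illustration.

```python
from pathlib import Path


def page_chunks(total_pages, chunk_size):
    """Yield (first_page, last_page) pairs, 1-based and inclusive."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for first in range(1, total_pages + 1, chunk_size):
        yield first, min(first + chunk_size - 1, total_pages)


def convert_in_chunks(pdf_path, output_dir, dpi=500, chunk_size=10, poppler_path=None):
    """Convert a PDF to JPEG files, rendering chunk_size pages at a time."""
    # Imported here so page_chunks stays usable without pdf2image installed.
    from pdf2image import convert_from_path, pdfinfo_from_path

    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    # pdfinfo reports the page count under the "Pages" key.
    total = pdfinfo_from_path(pdf_path, poppler_path=poppler_path)["Pages"]

    for first, last in page_chunks(total, chunk_size):
        pages = convert_from_path(
            pdf_path,
            dpi=dpi,
            first_page=first,
            last_page=last,
            poppler_path=poppler_path,
        )
        # Name each file after its page number in the PDF.
        for number, page in enumerate(pages, start=first):
            page.save(output_dir / f"page_{number:03d}.jpg", "JPEG")
```

For the 81-page PDF discussed here, `page_chunks(81, 10)` would give nine batches, the last one containing only page 81. A smaller chunk size lowers peak memory use at the cost of more Poppler invocations.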

Thank you

1 Comment

Ironically, I ended up here after GitHub Copilot generated incorrect code :) I passed it the error and it gave me substantially incorrect code again. So I did what I always do: looked up the docs and hit SO, and here's your answer! Upvoted. I'd be happy if we had AI, but it's neither.
