
I'm trying to read a PDF file whose name may change. However, I have a preliminary script that contains the file name. I successfully save that file name to a variable, but when I try to open a file using that variable I get an error: "ValueError: embedded null byte"

I have tried a couple of solutions. For example, I attempted using this solution, but I receive the same error. I have identified a workaround using glob, since I can predict the file name (I know there will always be one PDF). However, if possible I want to avoid that workaround in case we have multiple PDFs to handle in the future.
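A minimal sketch of the glob workaround described above, assuming the PDFs sit in a single folder; `find_pdfs` is a hypothetical helper name. Because `glob.glob` returns a list, the same call covers the one-PDF case and could simply be looped over if several PDFs show up later.

```python
import glob
import os
import tempfile

def find_pdfs(folder):
    """Return every PDF path in `folder`, sorted for a stable order."""
    # glob.glob returns a list, so zero, one, or many PDFs are handled alike
    return sorted(glob.glob(os.path.join(folder, "*.pdf")))

# Usage: a throwaway folder holding two PDFs and one non-PDF file
with tempfile.TemporaryDirectory() as folder:
    for name in ("a.pdf", "b.pdf", "notes.txt"):
        open(os.path.join(folder, name), "wb").close()
    found = [os.path.basename(p) for p in find_pdfs(folder)]

print(found)  # ['a.pdf', 'b.pdf']
```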

This is what I have:

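import PyPDF2

# pdfFileName was set earlier from the preliminary script's output
pdfFileName = pdfFileName[132:220] # File path is correct, I have confirmed
objectPDF = open(pdfFileName,'rb')  # line 48 in verify.py: raises the ValueError below
pdfReader = PyPDF2.PdfFileReader(objectPDF)
pageObj = pdfReader.getPage(0)
print(pageObj.extractText())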

My error is:

Traceback (most recent call last):
  File "verify.py", line 48, in <module>
    objectPDF = open(pdfFileName,'rb')
ValueError: embedded null byte
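The same ValueError can be reproduced without PyPDF2 in a minimal sketch (the file name `report\0.pdf` is made up): `open()` refuses any path that contains a NUL character. On Linux and macOS the message is `embedded null byte`, matching the traceback above; on Windows the wording may differ slightly.

```python
# Reproduce the error in isolation: the path contains a NUL character ('\0').
# No file needs to exist, because Python rejects the path before touching the disk.
message = None
try:
    open("report\0.pdf", "rb")
except ValueError as exc:
    message = str(exc)

print(message)
```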

What I would like is for the text of the PDF to be output to the console. The problem is certainly in the way I'm opening the file: if I hard-code the file path, it works as expected, but not when I use a variable that holds what looks like exactly the same string.
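One way to check whether the variable really holds the same string as the hard-coded path: most terminals display nothing for NUL characters, so `print()` can make two different strings look identical, while `repr()` and `==` expose the difference. This sketch uses a hypothetical stand-in for the sliced value, assuming it has a NUL after every character.

```python
# Hypothetical stand-in for the sliced value: "report.pdf" with a NUL
# character ('\0') after every character
pdfFileName = "r\0e\0p\0o\0r\0t\0.\0p\0d\0f\0"
typed = "report.pdf"

print(pdfFileName)           # most terminals show this just like "report.pdf"
print(repr(pdfFileName))     # repr() makes every '\x00' visible
print(pdfFileName == typed)  # False: the strings only look the same
```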

  • What error are you getting? Commented Jul 31, 2019 at 22:13
  • ValueError: embedded null byte. Sorry, that was in my question; I must have accidentally removed it when editing. I'll put it back in. @Axium Commented Jul 31, 2019 at 22:14
  • It seems that the file name has "nulls" (null characters) in it. This article: lucumr.pocoo.org/2010/12/24/common-mistakes-as-web-developer can help fix it. I'm not posting an answer because I don't know exactly how to fix it. Commented Jul 31, 2019 at 22:23
  • Also, try using this before objectPDF = open(pdfFileName,'rb'): pdfFileName = pdfFileName.replace('\0',''). I'm not sure it'll work, but it's worth a try. Commented Jul 31, 2019 at 22:26
  • I'll post it as an answer, then. Commented Jul 31, 2019 at 22:30

1 Answer


Add this line: pdfFileName = pdfFileName.replace('\0','') immediately before this one: objectPDF = open(pdfFileName,'rb')

It removes every null character ('\0') from the string, so open() receives the plain path and no longer raises "ValueError: embedded null byte".
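A minimal sketch of why the nulls might be there and what the fix does, built on an assumption: that the preliminary script produced UTF-16-LE text which was later decoded with a one-byte codec (latin-1 here), a common way to end up with a NUL after every character. The path `report.pdf` is hypothetical. Stripping NULs recovers the path when it contains only ASCII or Latin-1 characters; if the source really is UTF-16, decoding it with that codec avoids the NULs in the first place.

```python
# Hypothetical bytes as the preliminary script might have written them:
# UTF-16-LE text that was then decoded with a one-byte codec.
raw = "report.pdf".encode("utf-16-le")
pdfFileName = raw.decode("latin-1")   # 'r\x00e\x00p\x00...' with a NUL after each char

# The answer's fix: drop every NUL character before calling open()
cleaned = pdfFileName.replace("\0", "")
print(cleaned)  # report.pdf

# If the source really is UTF-16, decoding it as such never produces the NULs
decoded = raw.decode("utf-16-le")
print(decoded)  # report.pdf
```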
