I have a Python script that converts PDF content to a string.
text = list();
#npages is number of pages in the PDF file.
for n in range(npages):
    text[n] = os.system('pdftotext myfile.pdf -') #the "-" prints to stdout.
print(text)
However, when I print text, this is the output (for a PDF file with two pages):
{0: 0, 1: 0}
When running the script, I see the os.system output being sent to the command line:
text from myfile.pdf page 1
text from myfile.pdf page 2
How can I store the standard output from the pdftotext command in a list?
① If text were a list, you would receive an IndexError when you try to access the non-existing element text[0]. ② At every iteration you are receiving the whole text of the PDF file, not just the text of an individual page. Very sloppy question.
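For what it's worth, os.system returns the command's exit status rather than its output, which is why every stored value is 0. One possible way to capture the text instead is subprocess.run, sketched below under a few assumptions: Python 3.7 or later (for capture_output), pdftotext available on the PATH, and the -f/-l options used to restrict each call to a single page. The names build_command and extract_pages, and the run parameter, are made up for this illustration and are not part of the original script.

```python
import subprocess


def build_command(pdf_path, page):
    """Build a pdftotext call for one page (hypothetical helper).

    -f and -l select the first and last page to convert; pages are
    numbered from 1. The trailing "-" sends the text to stdout.
    """
    return ["pdftotext", "-f", str(page), "-l", str(page), pdf_path, "-"]


def extract_pages(pdf_path, npages, run=subprocess.run):
    """Return a list with one string of text per page.

    `run` is only a parameter so that the function can be exercised
    without pdftotext installed; by default it is subprocess.run.
    """
    text = []
    for page in range(1, npages + 1):
        result = run(
            build_command(pdf_path, page),
            capture_output=True,  # collect stdout/stderr instead of printing them
            text=True,            # decode the output to str
            check=True,           # raise CalledProcessError on a non-zero exit status
        )
        # append() grows the list; assigning to text[n] on an empty list
        # would raise IndexError
        text.append(result.stdout)
    return text


# Example call (needs pdftotext and a real file):
#     pages = extract_pages("myfile.pdf", npages)
#     print(pages)
```

This addresses both points from the comment: the list is built with append, and each call converts only one page. If one string for the whole document is enough, a single call without -f and -l would avoid starting one process per page.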