I have a PDF full of quotes:
https://www.pdf-archive.com/2017/03/22/test/
I can extract the text in Python using the following code:
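import PyPDF2

# Open the PDF in binary mode; the with block closes the file afterwards
with open('example.pdf', 'rb') as pdf_file:
    # PyPDF2 >= 3.0 names; older releases used PdfFileReader, getPage, extractText
    pdf_reader = PyPDF2.PdfReader(pdf_file)
    page = pdf_reader.pages[0]  # first page
    print(page.extract_text())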
This returns all the quotes as one paragraph. Is it possible to split the text at the horizontal separators in the PDF, so that each quote comes out separately?
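To illustrate the result I am after, here is a minimal sketch. It assumes the separator shows up in the extracted text as a line of three or more dashes or underscores, which may well not be true if the separator is a drawn line rather than text. The function name split_quotes, the separator pattern, and the sample string are all made up for illustration; they do not come from the actual PDF.

```python
import re

def split_quotes(text, separator_pattern=r"\n\s*[-_]{3,}\s*\n"):
    """Split text into quotes on a hypothetical textual separator line.

    Assumption: the separator survives extraction as a run of at least
    three '-' or '_' characters on its own line.
    """
    parts = re.split(separator_pattern, text)
    # Drop empty chunks and trim surrounding whitespace from each quote
    return [part.strip() for part in parts if part.strip()]

# Made-up sample standing in for the extracted page text
sample = (
    "First quote.\n"
    "-----\n"
    "Second quote,\nspanning two lines.\n"
    "_____\n"
    "Third quote."
)

for quote in split_quotes(sample):
    print(repr(quote))
```

If the separator never appears in the extracted text, this approach cannot work as written, and the split would have to rely on something else, such as the position of each text block on the page.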