I am trying to write code that first extracts the text from a PDF and then writes that text to a new text file. This is what I wrote:

import os
import pyPdf
import re

##function that extracts text from pdf
def pdfcontent(filename):
    ct = ""
    pdf = pyPdf.PdfFileReader(file(filename,"rb"))
    for i in range(0,pdf.getNumPages()):
        ct += pdf.getPage(i).extractText() + "\n"
    return ct

##function that generates a txt file from a pdf
def pdftotxt(filename):
    ##first, convert pdf to txt
    pdfct = pdfcontent(filename)
    ##fix filename problem
    newfn = re.sub(".pdf", "", filename)
    #now generate txt
    fo = open(r'C:\Users\xxx\PycharmProjects\untitled\decisiontxt\' + newfn + ".txt","wb")
    fo.write(pdfct)
    fo.close()

pdftotxt("PDFfromDocumentum.pdf")

EDIT: I fixed my previous problems and then another problem came up:

File "C:/Users/xxx/PycharmProjects/untitled/fdsa", line 22
fo = open(r'C:\Users\xxx\PycharmProjects\untitled\decisiontxt\' + newfn + ".txt","wb")
                                                                                      ^
SyntaxError: EOL while scanning string literal

It seems to me that Python took

fo = open(r'C:\Users\xxx\PycharmProjects\untitled\decisiontxt\' + newfn + ".txt","wb")

as a string instead of a command. What's the solution to this problem?

  • Which file/directory doesn't exist? Are you sure it's not the filename you feed to PdfFileReader? Please post the actual traceback. Commented Jul 15, 2014 at 19:32
  • Note that a newline is denoted by "\n", not "/n". Commented Jul 15, 2014 at 19:33
  • It seems you are able to solve your problems fairly quickly. That's very good, and good luck, but people on the internet are probably not interested in a live report of your programming struggle. Please consider posting a question when you are really stuck (and unable to find the solution on SO), instead of editing it every few minutes with your latest achievements... Commented Jul 15, 2014 at 19:40
  • Duplicate of stackoverflow.com/questions/2870730/… Commented Jul 15, 2014 at 19:46

2 Answers

0

If you want your script to create the file when it does not already exist, open it with "wb" as the mode. Note that "wb" also truncates the file if it does exist.

Refer to this for more information on using file modes.

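EDIT (based on your edit)

The reason you are getting "EOL while scanning string literal" is that the backslash before the closing apostrophe (\') stops that apostrophe from ending the string, even in a raw string, so the literal runs on to the end of the line. Escape that final backslash with another backslash, i.e. end the literal with \\'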
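
To illustrate the rule, here is a minimal sketch (the shortened path and the variable names are made up for the example). It uses compile() so that the failing literal can be shown without breaking the script itself. Note that in a raw string the doubled backslash is kept as two characters; Windows generally tolerates a doubled separator in a path, but if you want exactly one trailing backslash you can append it from a normal string instead.

```python
# A raw string keeps backslashes literally, but a backslash still stops the
# quote that follows it from closing the literal.

# Valid: the literal ends in TWO backslash characters.
doubled = r'C:\decisiontxt\\'

# Valid: raw prefix for the body, then one backslash from a normal string.
single = r'C:\decisiontxt' + '\\'

# Invalid: the source text below is  path = r'C:\decisiontxt\'
# compile() lets us observe the SyntaxError instead of crashing.
bad_source = "path = r'C:\\decisiontxt\\'"
try:
    compile(bad_source, "<example>", "exec")
    raw_ok = True
except SyntaxError:
    raw_ok = False

print(doubled)
print(single)
print(raw_ok)
```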

0

Even though you are using a raw string, you still have to escape the last backslash:

open(r'C:\Users\xxx\PycharmProjects\untitled\decisiontxt\\' + newfn + ".txt","wb")

See "Python raw strings and trailing backslash" for details.
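
An alternative that avoids the trailing backslash altogether is to let the path functions insert the separator. The sketch below is only one way to do it: txt_path_for and OUT_DIR are hypothetical names, and ntpath is used so the example gives Windows-style results on any machine (on Windows, os.path is the same module). It also uses splitext rather than re.sub(".pdf", "", ...), because in a regular expression the unescaped "." matches any character.

```python
import ntpath  # Windows flavour of os.path; on Windows, os.path is this module

# Output directory from the question, written WITHOUT a trailing backslash.
OUT_DIR = r'C:\Users\xxx\PycharmProjects\untitled\decisiontxt'


def txt_path_for(pdf_name, out_dir=OUT_DIR):
    """Hypothetical helper: build the .txt path for a given .pdf file name."""
    # splitext drops only the final extension, unlike a loose regex.
    stem, _ext = ntpath.splitext(ntpath.basename(pdf_name))
    # join inserts the single separator between directory and file name.
    return ntpath.join(out_dir, stem + ".txt")


print(txt_path_for("PDFfromDocumentum.pdf"))
```

The returned path can then be passed to open(..., "wb") as in the original pdftotxt function.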
