TypeError: expected a character buffer object

Question

I have been trying to write the output to a new text file, but I get the error

TypeError: expected a character buffer object

What I'm trying to do is convert a PDF to text and write the extracted text to a new file.

import pyPdf

def getPDFContent():
  content = ""
  # Load PDF into pyPDF
  pdf = pyPdf.PdfFileReader(file("D:\output.pdf", "rb"))
  # Iterate pages
  for i in range(0, pdf.getNumPages()):
    # Extract text from page and add to content
    #content += pdf.getPage(i).extractText() + "\n"
    print pdf.getPage(i).extractText().encode("ascii", "ignore")

  # Collapse whitespace
  #content = " ".join(content.replace(u"\xa0", " ").strip().split())
  #return content

  #getPDFContent().encode("ascii", "ignore")
  getPDFContent()

  s =getPDFContent()
  with open('D:\pdftxt.txt', 'w') as pdftxt:
      pdftxt.write(s)

I did try to initialize s as str, but then I get the error "can't assign to function call".

Your getPDFContent() function doesn't return anything. print is not the same thing as return.
– Martijn Pieters, Commented Jun 7, 2014 at 19:02
@Martijn plus I don't think there's meant to be a couple of recursive calls in there... So I'm guessing the indentation is not exactly reliable either.
– Jon Clements, Commented Jun 7, 2014 at 19:04
Your code sample is a bit of a mess. Can you clean it up (fix the indentation, remove obsolete comments, etc.)? Include the actual attempt; I suspect the print version posted here is not the only version you tried.
– Martijn Pieters, Commented Jun 7, 2014 at 19:04
I had even tried return before, but the only thing I got was page 1; the rest of the pages never appeared in my text file. print was the only one that worked, in that the interpreter displayed the complete output, but it didn't copy it to a new text file.
– Aaron Misquith, Commented Jun 7, 2014 at 19:18
possible duplicate of TypeError: expected a character buffer object - while trying to save integer to textfile
– Florian Brucker, Commented May 6, 2015 at 20:42

Padraic Cunningham · Accepted Answer · 2014-06-07 19:11:37Z

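You are not returning anything from getPDFContent(), so s is None and you are effectively calling write(None), which raises the TypeError. Collect the text of each page in a list, return it, and write the list: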

result = []
for i in range(0, pdf.getNumPages()):
    result.append(pdf.getPage(i).extractText().encode("ascii", "ignore"))  # store all in a list
return result

s = getPDFContent()
with open('D:\pdftxt.txt', 'w') as pdftxt:
    pdftxt.writelines(s)  # use writelines to write list content
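
To see why printing inside the function leads to the TypeError, here is a minimal sketch that needs no PDF at all. The names show_pages and collect_pages and the sample page strings are hypothetical stand-ins for the pyPdf extraction, and a temporary file is assumed in place of D:\pdftxt.txt.

import os
import tempfile


def show_pages(pages):
    # Prints each page but has no return statement,
    # so calling it evaluates to None.
    for page in pages:
        print(page)


def collect_pages(pages):
    # Builds a list of page strings and returns it to the caller.
    result = []
    for page in pages:
        result.append(page)
    return result


pages = ["page one", "page two"]  # hypothetical stand-in for extracted text

printed = show_pages(pages)       # None: nothing was returned
returned = collect_pages(pages)   # ["page one", "page two"]

path = os.path.join(tempfile.mkdtemp(), "pdftxt.txt")
with open(path, "w") as out:
    try:
        out.write(printed)        # writing None raises TypeError
        error = None
    except TypeError as exc:
        error = exc
    out.writelines(returned)      # a list of strings is accepted

The text still appears in the interpreter in the first case because print sends it to standard output; it never reaches the variable that is later passed to write().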

How your code should look:

import pyPdf

def getPDFContent():
    # Load PDF into pyPDF
    pdf = pyPdf.PdfFileReader(file(r"D:\output.pdf", "rb"))
    # Collect the ASCII-encoded text of every page
    result = []
    for i in range(0, pdf.getNumPages()):
        result.append(pdf.getPage(i).extractText().encode("ascii", "ignore"))
    # Return the list so the caller has something to write
    return result

s = getPDFContent()
with open(r'D:\pdftxt.txt', 'w') as pdftxt:
    # writelines accepts a list of strings
    pdftxt.writelines(s)
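
One thing to keep in mind: writelines() does not add any separator between items, so the pages run together in the output file. A possible variation, sketched here under the assumption that one page per line is wanted, joins the list with "\n" before writing, much like the commented-out lines in the question. The helper names join_pages and collapse_whitespace and the sample strings are hypothetical.

import os
import tempfile


def collapse_whitespace(text):
    # Replace non-breaking spaces, then squeeze runs of whitespace
    # into single spaces (mirrors the commented-out line in the question).
    return u" ".join(text.replace(u"\xa0", u" ").strip().split())


def join_pages(pages):
    # Put each cleaned page on its own line, since writelines()
    # would otherwise write the pages back to back.
    return u"\n".join(collapse_whitespace(p) for p in pages) + u"\n"


pages = [u"first  page\xa0text ", u" second page text"]  # hypothetical sample
text = join_pages(pages)

path = os.path.join(tempfile.mkdtemp(), "pdftxt.txt")
with open(path, "w") as out:
    out.write(text)  # a single string, so write() is sufficient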

edited Jun 7, 2014 at 19:11

answered Jun 7, 2014 at 19:03


1 Comment

Martijn Pieters Over a year ago

The comments in the function suggest more was tried. But as it stands currently it is a mess.
