I am using Python to convert an RTF file to plain text. I am using pyth for the conversion, but the output ends up in a format that I don't recognize.

Here is my Python script:

from pyth.plugins.rtf15.reader import Rtf15Reader
from pyth.plugins.plaintext.writer import PlaintextWriter
import sys

if len(sys.argv) > 1:
    filename = sys.argv[1]
else:
    filename = r"C:\localdata\logbook.rtf"  # raw string keeps the backslashes literal

doc = Rtf15Reader.read(open(filename, "rb"))

y = [x.content for x in doc.content]
for j in y:
    print j

Here is what the output looks like:

[Text('[AJAJ]' {})]
[Text('[07:30 - Setup IP address]' {})]
[Text('[copied DM Queue and recipies from AYT404]' {})]
[Text('[07:50 - Backed up system pre SP7]' {})]
[Text('[08:00 - Installing SP7]' {})]
[Text('[08:15 - Startup Drivers -> OK]' {})]

Does anyone know what format this is and how I can convert it to something more readable?

  • This is perfectly readable for me. What exactly did you expect the output to look like? Commented Oct 9, 2015 at 0:56
  • After reading the doc, try: print PlaintextWriter.write(doc).getvalue() Commented Oct 9, 2015 at 0:58
  • I am trying to extract only the text from the RTF file to reduce the file size. When I use the PlaintextWriter, it seems to convert the images into a lot of unreadable data and does not reduce the size. Commented Oct 9, 2015 at 1:00
  • @WorldSEnder You are correct, it is readable, but I want [Text('[AJAJ]' {})] to be turned into simply AJAJ Commented Oct 9, 2015 at 1:01
  • Why not just write a quick filter to get what you want? From the sample you've posted, it looks like you just have to drop the first 8 characters and the last 7 characters (if I've counted correctly). So instead of print j you'd have print j[8:-7] Commented Oct 9, 2015 at 1:09

1 Answer

It might be easier just to write a simple filter to get what you want. From the sample you've posted, it looks like you just have to drop the first 8 characters and the last 7 characters (if I've counted correctly). So instead of print j you'd have

print str(j)[8:-7]

The reason you need the str is that the objects in the list y are apparently not strings. I'm not familiar with these modules, so I can't say what kind of objects they are, but their string representation is what we see printed. (Every Python object has a string representation of some sort, which is why you can call print on anything.) So, whatever kind of object j actually is, str(j) is its string representation, and we can slice it.
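
To make the slicing idea concrete, here is a minimal sketch that runs without pyth installed. StandInText is a hypothetical class, not part of pyth; its repr only imitates the output shown in the question, so the offsets 8 and 7 hold only if the real objects print exactly that way. The names strip_by_slicing, strip_by_pattern and WRAPPER are also made up for illustration. The pattern-based variant is an alternative to fixed offsets that leaves a line unchanged when it does not have the expected wrapper.

```python
import re


class StandInText(object):
    """Hypothetical stand-in for the objects pyth returns; not part of pyth.

    Its repr only imitates the output shown in the question.
    """

    def __init__(self, text):
        self.text = text

    def __repr__(self):
        return "Text('[%s]' {})" % self.text


def strip_by_slicing(item):
    # str() of a list uses the repr of each element, so a one-element list
    # prints as "[Text('[...]' {})]": 8 leading and 7 trailing characters.
    return str(item)[8:-7]


# Assumed pattern: matches only the exact wrapper shown in the question.
WRAPPER = re.compile(r"^\[Text\('\[(.*)\]' \{\}\)\]$")


def strip_by_pattern(item):
    # Return the inner text, or the unchanged string if the shape differs.
    match = WRAPPER.match(str(item))
    return match.group(1) if match else str(item)


y = [[StandInText("AJAJ")], [StandInText("08:15 - Startup Drivers -> OK")]]
for j in y:
    print(strip_by_slicing(j))
```

Fixed offsets are the shortest fix, but they silently cut the wrong characters if a paragraph contains more than one element or carries properties; the pattern version trades a little complexity for failing visibly in those cases.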