python, string.replace() and \n

Question

(Edit: the script seems to work for others here trying to help. Is it because I'm running python 2.7? I'm really at a loss...)

I have a raw text file of a book I am trying to tag with pages.

Say the text file is:

some words on this line,
1
DOCUMENT TITLE some more words here too.
2
DOCUMENT TITLE and finally still more words.

I am trying to use python to modify the example text to read:

some words on this line,
</pg>
<pg n=2>some more words here too,
</pg>
<pg n=3>and finally still more words.

My strategy is to load the text file as a string, build search-for and replace-with strings for each number in a list, replace all instances in the string, and write the result to a new file.

Here is the code I've written:

from sys import argv
script, input, output = argv

textin = open(input,'r')
bookstring = textin.read()
textin.close()

pages = []
x = 1
while x<400:
    pages.append(x)
    x = x + 1

pagedel = "DOCUMENT TITLE"

for i in pages:
    pgdel = "%d\n%s" % (i, pagedel)
    nplus = i + 1
    htmlpg = "</p>\n<p n=%d>" % nplus
    bookstring = bookstring.replace(pgdel, htmlpg)

textout = open(output, 'w')
textout.write(bookstring)
textout.close()

print "Updates to %s printed to %s" % (input, output)

The script runs without error, but it also makes no changes whatsoever to the input text. It simply reprints it character for character.

Does my mistake have to do with the hard return? \n? Any help greatly appreciated.

Edited to include correction to the bookstring replace command, but still the problem persists.
– user1893148, Commented Jul 4, 2013 at 3:11
hmm... if I run that script, it does write the changes in the output file. What exactly are you trying to do? I mean, it is working for me.
– Oscar Mederos, Commented Jul 4, 2013 at 3:13
also, it should be textin.close(), otherwise you're not calling the function. The same for textout.close.
– Oscar Mederos, Commented Jul 4, 2013 at 3:16
Thanks, now reflected in question. It still is not working for me. I'm using .txt files on a mac as input and output files. I tried the test example from my question, and it still simply copies the input text without edits to the output.
– user1893148, Commented Jul 4, 2013 at 3:26
Try adding print bookstring to see. It's working for me, are you sure it isn't a problem with the given arguments?
– kirbyfan64sos, Commented Jul 4, 2013 at 3:34

TerryA · Accepted Answer · 2013-07-04 03:37:08Z

4

In Python, strings are immutable, so replace returns a new string with the replacements applied instead of changing the string in place.

You must assign the result back:

bookstring = bookstring.replace(pgdel, htmlpg)
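A minimal sketch of that behaviour, using a made-up sample string (the sample text and the names unchanged and changed are illustrative, not from the question):

```python
# Hypothetical sample text; only str.replace itself is the real API here.
bookstring = "1\nDOCUMENT TITLE some words"

# Calling replace without using the result leaves the original untouched.
bookstring.replace("1\nDOCUMENT TITLE", "</pg>\n<pg n=2>")
unchanged = bookstring

# Rebinding the name to the returned string is what keeps the change.
bookstring = bookstring.replace("1\nDOCUMENT TITLE", "</pg>\n<pg n=2>")
changed = bookstring
```

After the first call bookstring still holds the original text; only the reassignment makes the replacement stick.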

You've also forgotten to call close(). Your code has textin.close, which only refers to the method without running it. You have to call it with parentheses, like open:

textin.close()
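To see the difference, here is a small sketch that uses a throwaway temporary file (the file and the names still_open and now_closed are assumptions made for the example):

```python
import os
import tempfile

# Throwaway file so the example is self-contained.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)

textin = open(path, "r")
textin.close                      # only looks up the method; nothing is called
still_open = not textin.closed    # the file is still open at this point

textin.close()                    # the parentheses actually call the method
now_closed = textin.closed

os.remove(path)
```

The bare textin.close is a valid expression, which is why Python raises no error for it.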

Your code works for me, but here are a few more tips:

- input is the name of a built-in function, so consider renaming that variable. Although shadowing it works normally, it might not for you.
- When running the script, don't forget to include the .txt extension:
  $ python myscript.py file1.txt file2.txt
- When testing your script, make sure to clear the contents of file2 first.
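Putting these tips together, one possible shape for the script is sketched below. The names tag_pages, main, in_path and out_path are hypothetical, and the sketch makes three assumptions: it uses the </pg> and <pg n=...> tags from the desired output in the question, it assumes a single space follows DOCUMENT TITLE, and it assumes the file uses plain \n line endings. Pages are processed from the highest number down so that the marker for page 1 cannot match inside the marker for page 11.

```python
def tag_pages(text, title="DOCUMENT TITLE", last_page=399):
    """Replace each 'N<newline>TITLE ' marker with a </pg>/<pg n=N+1> pair."""
    # Highest page first, so "1\nTITLE " is never matched inside "11\nTITLE ".
    for n in range(last_page, 0, -1):
        marker = "%d\n%s " % (n, title)
        tag = "</pg>\n<pg n=%d>" % (n + 1)
        text = text.replace(marker, tag)  # keep the returned string
    return text


def main(in_path, out_path):
    """Read in_path, tag it, write the result to out_path."""
    # 'with' closes each file automatically, even if an error occurs.
    with open(in_path, "r") as src:
        text = src.read()
    with open(out_path, "w") as dst:
        dst.write(tag_pages(text))


# Sample text from the question.
sample = (
    "some words on this line,\n"
    "1\n"
    "DOCUMENT TITLE some more words here too.\n"
    "2\n"
    "DOCUMENT TITLE and finally still more words.\n"
)
tagged = tag_pages(sample)
print(tagged)
```

To run it on real files, call main("file1.txt", "file2.txt") or pass the two paths in from sys.argv.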

I hope these help!

edited Jul 4, 2013 at 3:37

answered Jul 4, 2013 at 2:55


4 Comments

user1893148 Over a year ago

That's a crucial error to fix, but still the same problem persists. Going to edit my question to include this edit. Thanks!

user1893148 Over a year ago

Called the close functions, and edited my question to reflect that. It's still not working.

TerryA Over a year ago

@user1893148 I have added more info

user1893148 Over a year ago

Thanks for your help. Crazy that no one can reproduce it.

kirbyfan64sos · Answer · 2013-07-04 04:12:32Z

0

Here's an entirely different approach that uses re (import the re module for this to work):

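import re

# Assumes bookstring already holds the full text of the input file.
doctitle = False  # True when the previous line was a bare page number
newstr = ''
page = 1

for line in bookstring.splitlines():
    res = re.match('^\\d+', line)
    if doctitle:
        # Line after a page number: open the new page and drop the running title.
        newstr += '<pg n=' + str(page) + '>' + re.sub('^DOCUMENT TITLE ', '', line)
        doctitle = False
    elif res:
        # Line starting with digits: treat it as a page number and close the page.
        doctitle = True
        page += 1
        newstr += '\n</pg>\n'
    else:
        newstr += line

print newstr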

Since no one knows what's going on, it's worth a try.
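One way to check whether the hard return is involved, offered as an assumption since nothing in this thread confirms it: if the file uses \r\n or bare \r line endings, a search string containing \n will never match, whereas splitlines() splits on all three. The sample text and the helper normalize_newlines below are hypothetical.

```python
# Hypothetical sample in three newline conventions; not the asker's actual file.
text_lf = "some words,\n1\nDOCUMENT TITLE more words.\n"
text_crlf = text_lf.replace("\n", "\r\n")   # Windows-style line endings
text_cr = text_lf.replace("\n", "\r")       # classic Mac-style line endings

marker = "1\nDOCUMENT TITLE"


def normalize_newlines(text):
    """Convert \\r\\n and bare \\r line endings to \\n."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


# repr() makes the otherwise invisible line-ending characters visible.
print(repr(text_cr))

samples = (text_lf, text_crlf, text_cr)
found_raw = [marker in t for t in samples]
found_normalized = [marker in normalize_newlines(t) for t in samples]

# splitlines() treats \n, \r\n and \r alike, so it still sees three lines.
lines_cr = text_cr.splitlines()
```

If repr(bookstring) on the real file shows \r characters, normalizing the text before the replace loop, or opening the file with mode 'rU' on Python 2, would be the thing to try.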

answered Jul 4, 2013 at 4:12

