2

I'm new to Python scripting, so please forgive me in advance if the answer to this question seems inherently obvious.

I'm trying to put together a large-scale find-and-replace script using Python. I'm using code similar to the following:

import sys

infile = sys.argv[1]   # path of the file to process
charenc = sys.argv[2]  # character encoding of that file
outFile = infile + '.output'

findreplace = [
('term1', 'term2'),
]

inF = open(infile,'rb')
s=unicode(inF.read(),charenc)
inF.close()

for couple in findreplace:
    outtext=s.replace(couple[0],couple[1])
    s=outtext

outF = open(outFile,'wb')
outF.write(outtext.encode('utf-8'))
outF.close()

How would I go about having the script do a find and replace using regular expressions?

Specifically, I want it to find some information (metadata) specified at the top of a text file. E.g.:

Title: This is the title
Author: This is the author
Date: This is the date

and convert it into LaTeX format. E.g.:

\title{This is the title}
\author{This is the author}
\date{This is the date}

Maybe I'm tackling this the wrong way. If there's a better way than regular expressions please let me know!

Thanks!

Update: Thanks for posting some example code in your answers! The regex code works as long as it replaces the findreplace loop entirely, but I can't get both to work together. The problem now is that I can't integrate it properly into the code I've got. How would I go about having the script perform multiple actions on 'outtext' in the snippet below?

for couple in findreplace:
    outtext=s.replace(couple[0],couple[1])
    s=outtext
  • Thanks for the links. I've looked at re.sub() but haven't been able to work out how to plug it into my code. Commented Jun 14, 2010 at 11:20
  • Your find-and-replace code has little to do with the actual task at hand. The regex you need cannot be reduced to a simple find-and-replace. Commented Jun 14, 2010 at 11:41

4 Answers

5
>>> import re
>>> s = """Title: This is the title
... Author: This is the author
... Date: This is the date"""
>>> p = re.compile(r'^(\w+):\s*(.+)$', re.M)
>>> print(p.sub(r'\\\1{\2}', s))
\Title{This is the title}
\Author{This is the author}
\Date{This is the date}

To lowercase the command names, pass a function as the replacement argument:

def repl_cb(m):
    return "\\%s{%s}" %(m.group(1).lower(), m.group(2))

p = re.compile(r'^(\w+):\s*(.+)$', re.M)
print(p.sub(repl_cb, s))

\title{This is the title}
\author{This is the author}
\date{This is the date}
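
Regarding the update in the question (running several actions on the same text): one possible way is to keep the literal pairs and the regex rules in separate lists and feed the result of each step into the next. This is only a sketch, written for Python 3 (where `str` is already Unicode); the names `literal_rules`, `regex_rules`, `to_latex_command` and `apply_rules` are made up for illustration and are not from the question.

```python
import re

# Literal (plain-string) replacements, like the question's findreplace list.
literal_rules = [
    ("term1", "term2"),
]


def to_latex_command(match):
    # Lowercase the key so "Title" becomes "\title".
    return "\\%s{%s}" % (match.group(1).lower(), match.group(2))


# Regex replacements: (compiled pattern, replacement string or function).
regex_rules = [
    (re.compile(r"^(\w+):\s*(.+)$", re.M), to_latex_command),
]


def apply_rules(text):
    # Every step works on the output of the previous one.
    for old, new in literal_rules:
        text = text.replace(old, new)
    for pattern, repl in regex_rules:
        text = pattern.sub(repl, text)
    return text


sample = "Title: This is the title\nAuthor: This is the author\nterm1 appears here"
result = apply_rules(sample)
print(result)
```

Note that this pattern rewrites every line of the form `Word: text` anywhere in the file, not only the header, so it may need tightening if the body contains such lines.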


1

See re.sub()


0

The regular expression you want would probably be along the lines of this one:

^([^:]+): (.*)

and the replacement expression would be

\\\1{\2}
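
In Python, that pattern and replacement could be plugged into `re.sub()` roughly as follows. This is a minimal sketch assuming Python 3; the sample text is taken from the question, and the variable names `text` and `converted` are just placeholders.

```python
import re

text = "Title: This is the title\nAuthor: This is the author\nDate: This is the date"

# re.MULTILINE makes ^ match at the start of every line, not only at the
# start of the whole string.
converted = re.sub(r"^([^:]+): (.*)", r"\\\1{\2}", text, flags=re.MULTILINE)
print(converted)
```

The keys keep their original capitalisation here (`\Title`, not `\title`). Also be aware that `[^:]+` can match across line breaks, so if a line without a colon precedes a metadata line, `[^:\n]+` is probably the safer character class.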


0
>>> import re
>>> m = 'title', 'author', 'date'
>>> s = """Title: This is the title
... Author: This is the author
... Date: This is the date"""
>>> for i in m:
...     s = re.compile(i+': (.*)', re.I).sub(r'\\' + i + r'{\1}', s)
...
>>> print(s)
\title{This is the title}
\author{This is the author}
\date{This is the date}
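
One caveat with the loop above: the patterns are not anchored, so a key that appears later in a line (for example `date: ` inside ordinary text) would be rewritten too. A possible variant is a single anchored, case-insensitive pattern limited to the known keys. This is a sketch assuming Python 3; `KEYS`, `pattern` and `to_command` are hypothetical names, and the last line of the sample is invented to show the difference.

```python
import re

# Hypothetical whitelist of metadata keys; extend as needed.
KEYS = ("title", "author", "date")

# One pattern for all keys, anchored to the start of a line.
pattern = re.compile(r"^(%s):[ \t]*(.*)$" % "|".join(KEYS), re.I | re.M)


def to_command(match):
    # Lowercase the key and wrap the value in braces.
    return "\\%s{%s}" % (match.group(1).lower(), match.group(2))


s = (
    "Title: This is the title\n"
    "Author: This is the author\n"
    "Date: This is the date\n"
    "Note: the date: unchanged"
)
result = pattern.sub(to_command, s)
print(result)
```

Only lines that start with one of the listed keys are converted; the final `Note:` line is left as it is.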
