I have to replace multiple patterns, and the code is the same for each of them. But when I put the replacements in the same .py file, not all of the patterns are parsed and replaced. Is there a way to achieve this without creating multiple .py files? Two samples are below, but I have 7-8 conditions like this. I am using Python 3. Can anyone help, please?

import glob
for filepath in glob.iglob('C:/Users/sh001/Desktop/tag/**/*.xml', recursive=True):
    with open(filepath) as file:
        s = file.read()
    s = s.replace('</em>', '</i>')
    with open(filepath, "w") as file:
        file.write(s)

import glob
for filepath in glob.iglob('C:/Users/sh001/Desktop/tag/**/*.xml', recursive=True):
    with open(filepath) as file:
        s = file.read()
    s = s.replace('</em>', '</i>')
    with open(filepath, "w") as file:
        file.write(s)

  • In addition, I also get this error: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 1282: character maps to <undefined>. Including encoding="utf-8" did not help. Commented May 8, 2020 at 12:34

1 Answer

What you can do is move the replacement logic into a function that takes the string to replace and its replacement as arguments.

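import glob


def replacer(filepath, to_replace, value):
    """Replace every occurrence of to_replace with value in one file."""
    # Read and write explicitly as UTF-8 instead of the platform default.
    with open(filepath, "r", encoding="utf-8") as file:
        s = file.read()
    s = s.replace(to_replace, value)
    with open(filepath, "w", encoding="utf-8") as file:
        file.write(s)


# "**" together with recursive=True also matches files in subfolders.
for filepath in glob.iglob("./testFolder/**/*.xml", recursive=True):
    # Each call rewrites the file once, so all four replacements are applied.
    replacer(filepath=filepath, to_replace="<em>", value="<i>")
    replacer(filepath=filepath, to_replace="</em>", value="</i>")
    replacer(filepath=filepath, to_replace="<h1>", value="<h2>")
    replacer(filepath=filepath, to_replace="</h1>", value="</h2>")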

While looping over the files, I call the replacements one after another, so every file gets the same set of replacements.
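
With 7-8 patterns, calling the function once per pattern reopens and rewrites each file that many times. A minimal sketch of a single-pass variant follows, assuming the patterns are plain strings whose order of application does not matter. The names REPLACEMENTS, replace_all, and rewrite_file are hypothetical and only serve the illustration.

```python
import glob

# Hypothetical mapping of old -> new strings; extend it with the other patterns.
REPLACEMENTS = {
    "<em>": "<i>",
    "</em>": "</i>",
    "<h1>": "<h2>",
    "</h1>": "</h2>",
}


def replace_all(text, replacements):
    """Apply every old -> new pair to text, in the mapping's insertion order."""
    for old, new in replacements.items():
        text = text.replace(old, new)
    return text


def rewrite_file(filepath, replacements, encoding="utf-8"):
    """Read the file once, apply all replacements, and write it back once."""
    with open(filepath, "r", encoding=encoding) as file:
        original = file.read()
    updated = replace_all(original, replacements)
    if updated != original:  # skip the write when nothing changed
        with open(filepath, "w", encoding=encoding) as file:
            file.write(updated)


if __name__ == "__main__":
    for filepath in glob.iglob("./testFolder/**/*.xml", recursive=True):
        rewrite_file(filepath, REPLACEMENTS)
```

Keeping the patterns in one dictionary means a new condition is a one-line change, and each file is touched at most once per run.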

I have two files in the folder testFolder, which are test1.xml:

<em>
    Testabc
</em>

test2.xml:

<em>
    ”Rock'n'Roll”
</em>
<em>
    totally a new tag!    
</em>
<h1>
    headers are hard!
</h1>

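See the ” (U+201D, right double quotation mark) around Rock'n'Roll in test2.xml above? That is the character charmap can't find a translation for: its UTF-8 encoding ends in the byte 0x9d, which a Windows default code page such as cp1252 leaves undefined. Inside replacer I set the encoding for both reading and writing to utf-8, which stops the error from being raised.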
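
The failure can be reproduced without any files. The sketch below assumes the platform default encoding is cp1252, which is consistent with the 'charmap' wording and the byte 0x9d in the error message but is not confirmed by the question.

```python
# U+201D RIGHT DOUBLE QUOTATION MARK, the character wrapping Rock'n'Roll.
quote = "\u201d"

# Its UTF-8 form is three bytes; the last one is 0x9d.
raw = quote.encode("utf-8")
print(raw)  # b'\xe2\x80\x9d'

# Assumption: cp1252 stands in for the Windows default used by open().
try:
    raw.decode("cp1252")
except UnicodeDecodeError as err:
    print("cp1252 failed:", err)

# Decoding with the encoding the file was written in round-trips cleanly.
print(raw.decode("utf-8") == quote)  # True
```

If a file still raises an error with encoding="utf-8", it was probably saved in a different encoding, and that encoding is the one to pass to open().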

1 Comment

Just tested the code you shared and it worked beautifully. Thanks.
