How to search and replace text in an XML file using Python?

Question

How do I search an entire xml file for a specific text pattern and then replace each occurrence of that text with new text pattern in Python 3.5?

Everything else (format, attributes, comments, etc.) needs to remain as it is in the original xml file.

I am running Python 3.5.1 on Windows (win32).

Specifically, I would like to replace each occurrence of "FEATURE NAME" with "THIS WORKED" and replace each occurrence of "FEATURE NUMBER" with "12345".

I have been trying to learn Python and xml.etree.ElementTree but cannot figure this out. I already looked at "Search and replace a line in a .xml file in Python", "Search and replace a line in a file in Python", and "How to search and replace text in a file using Python?" and other existing Q/A's on this site but cannot figure this out - I'm not an experienced programmer, so please let me know if more input is needed . Your help is greatly appreciated!!!

Here is a copy of what the xml code looks like when I open it in Notepad (except I added spaces to indent each line and hit return for some lines when I pasted it into this question):

<description-topic>
    <access-info>
        <index-term-set>
            <index-term>
                <primary>FID FEATURE NUMBER</primary>
            </index-term>
            <index-term>
                <primary>FEATURE NAME</primary>
            </index-term>
            <index-term>
                <primary>Common features</primary>
                <secondary>FID FEATURE NUMBER</secondary>
            </index-term>
        </index-term-set>
    </access-info>
    <title>FEATURE NUMBER - FEATURE NAME</title>
    <block>
        <label>Platform</label>
        <comment>REVIEWERS: I guessed at the FEATURE NAME</comment>
        <para>
            This feature applies to the following platforms: FEATURE NAME<!--Check the values--></para>
    </block>
    <block branch="no">
        <label>Feature Benefits</label>
        <para>
            <comment>REVIEWERS: What do we put here? See template (link given in review email) for more information.</comment>
        </para>
    </block>
    <block branch="no">
        <label>Dependencies</label>
        <para/>
        <subblock>
            <label>Features</label>
            <comment>What FEATURE NAME do we put here?</comment>
        </subblock>
        <subblock>
            <label>Hardware</label>
            <comment>What FEATURE NAME do we put here?</comment>
            <para>This feature applies to the following: FEATURE NUMBER and text.</para><?Pub Caret -1?>
        </subblock>
        <subblock>
            <label>Dependencies outside the eNodeB</label>
            <comment>What FEATURE NAME do we put here?</comment>
        </subblock>
    </block>
    <block branch="no">
        <label>Impacts</label>
        <comment>REVIEWERS: What FEATURE NUMBER do we put here?</comment>
        <para>
            <comment/>
        </para>
    </block>
</description-topic>

Here is the latest code I am trying to get to work:

from xml.etree import ElementTree as et
tree = et.parse('Atemplate2.xml')
tree.find('description-topic/access-info/index-term-set/index-term/primary/').text = '12345'
tree.write('Atemplate2.xml')

I get the following error: Traceback (most recent call last): File "ajktest18.py", line 15, in tree.find('description-topic/access-info/index-term-set/index-term/primary/').text = '12345'

AttributeError: 'NoneType' object has no attribute 'text'

I would prefer to be able to search and modify any occurrences in the entire file, but I can't figure out how to get to even one specific occurrence of the text I am searching for.

Here is the code I tried to use to find the path:

import xml.etree.ElementTree as ET
tree = ET.parse('Atemplate.xml')
root = tree.getroot()

print(root.tag, root.attrib, root.text)

for child in root:
    print(child.tag, child.attrib, child.text)
for label in root.iter('label'):
    print(label.tag, label.attrib, label.text)
for title in root.iter('title'):
    print(title.attrib)

I also tried the following code:

with open('Atemplate2.xml') as f:
    tree = ET.parse(f)
    root = tree.getroot()

for elem in root.getiterator():
    try:
        elem.text = elem.text.replace('FEATURE NAME', 'THIS WORKED')
        elem.text = elem.text.replace('FEATURE NUMBER', '12345')
    except AttributeError:
        pass

tree.write('output.xml')

but that gives the following error:

File "<pyshell#40>", line 2, in <module>
    tree = ET.parse(f)
File "C:\MyPath\Python35-32\lib\xml\etree\ElementTree.py", line 1182, in parse
    tree.parse(source, parser)
File "C:\ MyPath \Python35-32\lib\xml\etree\ElementTree.py", line 594, in parse
    self._root = parser._parse_whole(source)
File "C:\ MyPath \Python35-32\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]

UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 1119: character maps to

# #

FINAL UPDATE - Here is the code that worked for me in the end (thank u, Jarad!):

import lxml.etree as ET
#using lxml instead of xml preserved the comments

#adding the encoding when the file is opened and written is needed to avoid a charmap error
with open('filename.xml', encoding="utf8") as f:
  tree = ET.parse(f)
  root = tree.getroot()


  for elem in root.getiterator():
    try:
      elem.text = elem.text.replace('FEATURE NAME', 'THIS WORKED')
      elem.text = elem.text.replace('FEATURE NUMBER', '123456')
    except AttributeError:
      pass

#tree.write('output.xml', encoding="utf8")
# Adding the xml_declaration and method helped keep the header info at the top of the file.
tree.write('output.xml', xml_declaration=True, method='xml', encoding="utf8")

I am trying to learn how to do this and automate something. That is why I am here. I have Python training books and created programs and modified them. I can't figure out how to do this part. My point with my comment is that I might be missing something very easy and obvious...or just let me know if there is additional input that is needed to help answer my question. Thank you for anyone who can help. — AJK
– AJK, Commented Jun 16, 2016 at 20:41
Fair enough, then you should post the code you have written, and check how to ask. — spectras
– spectras, Commented Jun 16, 2016 at 21:23
@spectras - I posted more information including code and deleted the generic abc text reference but kept the more specific text. Thanks — AJK
– AJK, Commented Jun 17, 2016 at 21:14

Community · Accepted Answer · 2017-05-23 12:32:58Z

12

Caveats:

I have never worked with the xml.etree.ElementTree library
I have never worked with it because I never find myself manipulating XML
I don't know if this is the "best" way compared to someone that knows the library in and out
Commentors seem set on judging you instead of helping you out

This is a modification from this excellent answer. The thing is, you need to read the XML file in and parse it.

import xml.etree.ElementTree as ET

with open('xmlfile.xml', encoding='latin-1') as f:
  tree = ET.parse(f)
  root = tree.getroot()

  for elem in root.getiterator():
    try:
      elem.text = elem.text.replace('FEATURE NAME', 'THIS WORKED')
      elem.text = elem.text.replace('FEATURE NUMBER', '123456')
    except AttributeError:
      pass

tree.write('output.xml', encoding='latin-1')

Note that you can change the encoding parameter to something else such as: utf-8, cp1252, ISO-8859-1, etc. Really depends on your system and file.

edited May 23, 2017 at 12:32

CommunityBot

11 silver badge

answered Jun 17, 2016 at 12:29

Jarad

19.2k20 gold badges105 silver badges165 bronze badges

Sign up to request clarification or add additional context in comments.

8 Comments

AJK Over a year ago

Thank you. The "excellent answer" post you provided looks just like what I am trying to do.

AJK Over a year ago

I tried the code you provided and get the following error: Traceback (most recent call last): File "<pyshell#40>", line 2, in <module> tree = ET.parse(f) File "C:\MyPath\Python35-32\lib\xml\etree\ElementTree.py", line 1182, in parse tree.parse(source, parser) File "C:\ MyPath \Python35-32\lib\xml\etree\ElementTree.py", line 594, in parse self._root = parser._parse_whole(source)

Jarad Over a year ago

It's hard to say without seeing your code. Can you append your post with an edit showing your new approach?

AJK Over a year ago

I edited my post and included my code for the example you provided. Thanks for your help.

Jarad Over a year ago

It seems to me like it might be an encoding issue. Change with open('xmlfile.xml') as f: to have an encoding parameter like this with open('xmlfile.xml', encoding='utf-8') as f:. If utf-8 doesn't work, try any of these: cp1252, latin-1. Also, when you write in tree.write('output.xml'), be sure to add the same encoding parameter so the output file preserves the encoding. I updated my answer.

|

user3349907 · Accepted Answer · 2022-04-22 06:18:29Z

0

Only this solution worked for me.

import lxml.etree as ET
tree = ET.parse("input.xml")
root = tree.getroot()
tree = root.getroottree()
for elem in root.getiterator():
    try:
        elem.text = elem.text.replace('abc', 'xyz')
    except Exception:
        pass
tree.write('output.xml', xml_declaration=True, method='xml', encoding="utf-16")

answered Apr 22, 2022 at 6:18

user3349907

3573 silver badges5 bronze badges

Comments

Zayn Patel · Accepted Answer · 2023-09-29 01:05:59Z

0

Just a note: Python deprecated the .getiterator() a few versions ago so you'll likely see an AttributeError if you use it now.

.iter() is the correct method to call with the xml code referenced in above snippets.

edited Sep 29, 2023 at 1:05

answered Sep 29, 2023 at 1:05

Zayn Patel

314 bronze badges

Collectives™ on Stack Overflow

How to search and replace text in an XML file using Python?

3 Answers 3

8 Comments

Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

8 Comments

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related