46

I am looking to edit XML files using Python. I want to find and replace keywords in the tags. In the past, a co-worker set up template XML files and used a "find and replace" program to replace these keywords. I want to use Python to find and replace these keywords with values. I have been teaching myself the ElementTree module, but I am having trouble doing a find and replace. I have attached a snippet of my XML file. You will see some variables surrounded by % (e.g. %SITEDESCR%). These are the words I want to replace before saving the XML to a new file. Any help or suggestions would be great.

Thanks, Mike

<metadata>
<idinfo>
<citation>
<citeinfo>
 <origin>My Company</origin>
 <pubdate>05/04/2009</pubdate>
 <title>POLYGONS</title>
 <geoform>vector digital data</geoform>
 <onlink>\\C$\ArcGISDevelopment\Geodatabase\PDA_STD_05_25_2009.gdb</onlink>
</citeinfo>
</citation>
 <descript>
 <abstract>This dataset represents the mapped polygons developed from the field data for the %SITEDESCR%.</abstract>
 <purpose>This dataset was created to accompany some stuff.</purpose>
 </descript>
<timeperd>
<timeinfo>
<rngdates>
 <begdate>%begdate%</begdate>
 <begtime>unknown</begtime>
 <enddate>%enddate%</enddate>
 <endtime>unknown</endtime>
 </rngdates>
 </timeinfo>
 <current>ground condition</current>
 </timeperd>
3
  • 1
    +1 to xml parser. I like lxml. easy_install lxml Commented Jun 29, 2011 at 16:21
  • 9
    Thanks for the response. I am fairly new to Python, brand new to XML, and this is my first post on Stack. Sometimes asking the right question is more difficult than finding the right answer. Commented Jun 29, 2011 at 16:47
  • You'll also be wanting to familiarise yourself with xpath, it's a query language for XML. Don't worry, you can get plenty of help with it here on SO. Commented Jun 29, 2011 at 20:55

4 Answers

80

The basics:

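from xml.etree import ElementTree as et

# datafile is the path to the XML file
tree = et.parse(datafile)
# Paths are relative to the root <metadata> element
tree.find('idinfo/timeperd/timeinfo/rngdates/begdate').text = '1/1/2011'
tree.find('idinfo/timeperd/timeinfo/rngdates/enddate').text = '1/1/2011'
# Note: this overwrites the file that was parsed
tree.write(datafile)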

You can shorten the path if the tag name is unique. This syntax finds the first node at any depth level in the tree.

tree.find('.//begdate').text = '1/1/2011'
tree.find('.//enddate').text = '1/1/2011'

Also, read the documentation, esp. the XPath support for locating nodes.
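
To tie this back to the %PLACEHOLDER% templates in the question, here is a minimal sketch that walks every element and substitutes placeholders in its text. The sample XML, the `values` dictionary, and the `fill_placeholders` helper are illustrative assumptions, not part of the original answer.

```python
from xml.etree import ElementTree as et

# Hypothetical template fragment modelled on the question's XML
TEMPLATE = """<metadata>
  <abstract>Field data for the %SITEDESCR%.</abstract>
  <begdate>%begdate%</begdate>
  <enddate>%enddate%</enddate>
  <begtime>unknown</begtime>
</metadata>"""

# Hypothetical placeholder values
values = {
    '%SITEDESCR%': 'North Site',
    '%begdate%': '1/1/2011',
    '%enddate%': '12/31/2011',
}


def fill_placeholders(root, values):
    """Replace every known placeholder found in element text, in place."""
    for elem in root.iter():
        if elem.text:
            for placeholder, value in values.items():
                elem.text = elem.text.replace(placeholder, value)


root = et.fromstring(TEMPLATE)
fill_placeholders(root, values)
result = et.tostring(root, encoding='unicode')
print(result)
```

This sketch only touches element text, not attributes or tail text. To save the result to a new file instead of printing it, something like `et.ElementTree(root).write('output.xml')` should work, where `output.xml` is a made-up name.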

1 Comment

Hey, thanks Mark. This is exactly what I was looking for. This works with my existing Python program.
8

If you just want to replace the bits enclosed with %, then this isn't really an XML problem. You can easily do it with regex:

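import re

# Read the whole template into memory
with open('myxmldocument.xml', 'r') as f:
    xmlstring = f.read()

substitutions = {'SITEDESCR': 'myvalue'}  # ... plus one entry per placeholder
pattern = re.compile(r'%([^%]+)%')
# Replace each %NAME% with its value; an unknown name raises KeyError
xmlstring = pattern.sub(lambda m: substitutions[m.group(1)], xmlstring)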
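
A direct dictionary lookup fails with `KeyError` as soon as the text contains a placeholder that has no entry, or a stray pair of percent signs. The sketch below is one possible way to make the substitution more forgiving: it leaves unknown names untouched. The stricter pattern (no whitespace or angle brackets inside a placeholder), the sample string, and the `replace_known` helper are my own assumptions.

```python
import re

# Hypothetical values; keys are placeholder names without the % signs
substitutions = {'SITEDESCR': 'North Site', 'begdate': '1/1/2011'}

# Assumption: placeholder names contain no whitespace or angle brackets
pattern = re.compile(r'%([^%\s<>]+)%')


def replace_known(match):
    """Return the mapped value, or the original text if the name is unknown."""
    return substitutions.get(match.group(1), match.group(0))


xmlstring = '<a>%SITEDESCR%</a><b>%begdate%</b><c>%enddate%</c><d>50% of 10%</d>'
filled = pattern.sub(replace_known, xmlstring)
print(filled)
```

Here `%enddate%` has no entry, so it survives unchanged, and the literal percent signs in `50% of 10%` are not mistaken for a placeholder.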

1 Comment

I tested this in a standalone script and this works well. I will add this to my python library for future reference. Thanks for responding.
0

To replace the placeholders all you need is to read the file line by line and replace:

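with open(template_file_name, 'r') as template:
    for line in template:
        # str.replace returns a new string with the placeholder substituted
        output_line = line.replace(placeholder, value)
        # each line already ends with a newline, so do not add another
        print(output_line, end='')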

4 Comments

This can be fragile - XML files are not just text. Whitespace is generally not important in XML, so a change to the input file can result in identical XML which your code doesn't recognise.
There is no whitespace in the %something% placeholders. I'd only add encoding of special XML characters, especially < and >, if necessary.
I suppose I just don't see how this solution would work when the value of the XML node is unknown.
As long as 'unknown' is not taken for a valid placeholder, it will be left as it is. It's just a string.
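
The point about special characters raised in these comments can be sketched with the standard library's `xml.sax.saxutils.escape`, which converts `&`, `<` and `>` into entities before the value goes into the text. The template line and the value below are made up for illustration.

```python
from xml.sax.saxutils import escape

# Hypothetical template line and replacement value
line = '<abstract>Polygons for the %SITEDESCR%.</abstract>'
value = 'Smith & Sons <east> parcel'

# escape() converts &, < and > to &amp;, &lt; and &gt;
safe_line = line.replace('%SITEDESCR%', escape(value))
print(safe_line)
```

Without the `escape` call, the `&` and `<east>` in the value would end up as raw markup and the output would no longer be well-formed XML.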
0

You can modify the tree in place, and do so safely, with XPath rather than full paths or, worse, regex. See below and check out the docs on etree.

from lxml import etree
raw = """
<node>
<begdate>%begdate%</begdate>
<begtime>unknown</begtime>
<enddate>%enddate%</enddate>
<endtime>unknown</endtime>
</node>"""
nodes = etree.fromstring(raw.strip())
# Set the text of every element whose content is exactly the placeholder
for x in nodes.xpath(".//*[.='%begdate%']"):
    x.text = "DATE: 2021-01-01"
print(nodes.xpath(".//begdate//text()"))  # ['DATE: 2021-01-01']
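
If installing lxml is not an option, a similar lookup appears to be possible with the standard library, since ElementTree's limited XPath subset accepts the `[.='text']` predicate from Python 3.7 onward. This is a sketch under that assumption, reusing the sample fragment above with a made-up end date.

```python
from xml.etree import ElementTree as et

raw = """<node>
<begdate>%begdate%</begdate>
<begtime>unknown</begtime>
<enddate>%enddate%</enddate>
<endtime>unknown</endtime>
</node>"""

root = et.fromstring(raw)

# The [.='text'] predicate needs Python 3.7 or newer
for elem in root.findall(".//*[.='%enddate%']"):
    elem.text = '2021-12-31'

end_text = root.find('enddate').text
print(end_text)
```

Only elements whose full text equals the placeholder are matched, so this does not handle a placeholder embedded in a longer sentence such as the `<abstract>` in the question.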
