How to edit multiple xml tag data for XML files in a specific directory using python

Question

i am trying to update a number of XML files in a specific directory.

I want to update the data between the tags for <author> Smith, Paul </author> and <price> 79.99 </price>. this needs to be applied for all the files within the same directory.

My code in python does make the update but inserts a new data of author and price, it should over write this data.

Any help will be much appreciated, see below xml and python coding.

See below is one of many XML files i am trying to update update:

<?xml version="1.0"?>
<catalog>
   <book id="bk103">
      <author>Corets, Eva</author>
      <title>Maeve Ascendant</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2000-11-17</publish_date>
      <description>After the collapse of a nanotechnology 
      society in England, the young survivors lay the 
      foundation for a new society.</description>
   </book>
</catalog>

Current Python code, trying to loop through all files in a specific directory:

import os
import re
my_dir = r'C:\Users\MyDocuments\Desktop\XML Test data1'

replace_what_author = '(?<=<author>)((((.)*,)*)(.)*)(?=</)'
replace_with_author = 'Bloggs, Joe'

replace_what_price = '(?<=<price>)(\d+\.\d\d(?!\d))(?=</)'
replace_with_price = '10.99' 

for fn in os.listdir(my_dir):
    #print(fn)
    pathfn = os.path.join(my_dir,fn)

    if os.path.isfile(pathfn):
        file = open(pathfn, 'r+')

        new_file_content=''


        for line in file:
            auth = re.compile(replace_what_author)
            pri = re.compile(replace_what_price) 

            new_file_content += auth.sub(replace_with_author,line)
            new_file_content += pri.sub(replace_with_price,line)       

        file.seek(0)
        file.truncate()
        file.write(new_file_content)
        file.close()

you should use the xml module and not regex for a better solution — gold_cy
– gold_cy, Commented Jan 4, 2020 at 19:49
Can you show the output xml after executing your code and what is the expected output? — amanb
– amanb, Commented Jan 4, 2020 at 20:04

gold_cy · Accepted Answer · 2020-01-05 12:40:21Z

1

You should use the built-in xml module to modify your XML files instead of regex. First on top declare this constant and import the module.

import xml.etree.ElementTree as ET

map_ = {'author': 'Bloggs, Joe', 'price': '10.99'}

Then after the following line

if os.path.isfile(pathfn):

I would change the logic to the following, so it would look like this

if os.path.isfile(pathfn):
    tree = ET.parse(pathfn)
    root = tree.getroot()

    for key in map_:
        for child in root.iter(key):
            child.text = map_[key]
    tree.write(pathfn, xml_declaration=True)

Then the file ends up looking like this

<catalog>
   <book id="bk103">
      <author>Bloggs, Joe</author>
      <title>Maeve Ascendant</title>
      <genre>Fantasy</genre>
      <price>10.99</price>
      <publish_date>2000-11-17</publish_date>
      <description>After the collapse of a nanotechnology 
      society in England, the young survivors lay the 
      foundation for a new society.</description>
   </book>
</catalog>

edited Jan 5, 2020 at 12:40

answered Jan 4, 2020 at 20:06

gold_cy

14.2k4 gold badges27 silver badges55 bronze badges

Sign up to request clarification or add additional context in comments.

4 Comments

Rio05UK Over a year ago

Hi @aws_apprentice, thank you that's really helpful and is alot more simpler. I have noticed one change to the XML file(s). Each XML file loses the following from the very top of the file <?xml version="1.0"?>. The file loses the XML tag is there another step i need to add? apologies if i am not making an sense and happy to clarify.

Rio05UK Over a year ago

see below what i have tried taking on your suggestion, see revised comment

Rio05UK Over a year ago

import xml.etree.ElementTree as ET import os  my_dir = r'C:\Users\riati\Desktop\01. Data Training\04. Python Spyder testing\XML Test data1'  map_ = {'author': 'islam, Riat',          'price': '14.99'}  for fn in os.listdir(my_dir):     #print(fn)     pathfn = os.path.join(my_dir,fn)               if os.path.isfile(pathfn):                  tree = ET.parse(pathfn)         root = tree.getroot()                  for key in map_:             for child in root.iter(key):                 child.text = map_[key]         tree.write(pathfn)

gold_cy Over a year ago

I updated the answer, we just need to add a flag to tree.write, so the line becomes tree.write(pathfn, xml_declaration=True) that should preserve the declaration up top

Parfait · Accepted Answer · 2020-01-06 17:55:54Z

0

Consider a parameterized XSLT script using Python's third-party module, lxml.

XSLT (save as .xsl file, a special .xml file)

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes" omit_xml_declaration="no"/>
  <xsl:strip-space elements="*"/>

  <!-- INITIALIZE PARAMETERS -->
  <xsl:param name="new_author" /> 
  <xsl:param name="new_price" /> 

  <!-- IDENTITY TRANSFORM -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- REWRITE NODE TEXT -->
  <xsl:template match="author">
    <xsl:copy>
      <xsl:value-of select="$new_author"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="price">
    <xsl:copy>
      <xsl:value-of select="$new_price"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

Python (no for loops or if/else logic)

import lxml.etree as et

# LOAD XSL SCRIPT
xml = et.parse('Input.xml')
xsl = et.parse('Script.xsl')

# INITIALIZE TRANSFORMER
transform = et.XSLT(xsl)

# PASS PARAMETER TO XSLT
result = transform(xml, 
                   new_author = et.XSLT.strparam('Smith, Paul'),
                   new_price = et.XSLT.strparam('79.99'))

print(result)
# <?xml version="1.0"?>
# <catalog>
#   <book id="bk103">
#     <author>Smith, Paul</author>
#     <title>Maeve Ascendant</title>
#     <genre>Fantasy</genre>
#     <price>79.99</price>
#     <publish_date>2000-11-17</publish_date>
#     <description>After the collapse of a nanotechnology 
#               society in England, the young survivors lay the 
#               foundation for a new society.</description>
#   </book>
# </catalog>

# SAVE XML TO FILE
with open('Output.xml', 'wb') as f:
    f.write(result)

Online Demo

edited Jan 6, 2020 at 17:55

answered Jan 4, 2020 at 20:42

Parfait

108k19 gold badges103 silver badges138 bronze badges

2 Comments

Rio05UK Over a year ago

Hi @Parfait, thank you this looks interesting, i used "my_dir" from initial post within xml = et.parse(my_dir). unfortunately this did'nt work. Any help on what i should be doing?

Parfait Over a year ago

et.parse is used on specific file not a directory. You will need to integrate a loop of this solution passing each XML of directory into this solution.

Collectives™ on Stack Overflow

How to edit multiple xml tag data for XML files in a specific directory using python

2 Answers 2

4 Comments

2 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

4 Comments

2 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related