Python Conditional XML Writing

Question

I am using Python to convert CSV files to XML format. The CSV files have a varying amount of rows ranging anywhere from 2 (including headers) to infinity. (realistically 10-15 but unless there's some major performance issue, I'd like to cover my bases) In order to convert the files I have the following code:

for row in csvData:
    if rowNum == 0:
        xmlData.write('    <'+csvFile[:-4]+'-1>' + "\n")
        tags = row
        # replace spaces w/ underscores in tag names
        for i in range(len(tags)):
            tags[i] = tags[i].replace(' ', '_')
    if rowNum == 1: 
        for i in range(len(tags)):
            xmlData.write('        ' + '<' + tags[i] + '>' \
                          + row[i] + '</' + tags[i] + '>' + "\n")
        xmlData.write('    </'+csvFile[:-4]+'-1>' + "\n" + '    <' +csvFile[:-4]+'-2>' + "\n")
    if rowNum == 2:
        for i in range(len(tags)):
            xmlData.write('        ' + '<' + tags[i] + '>' \
                          + row[i] + '</' + tags[i] + '>' + "\n")
        xmlData.write('    </'+csvFile[:-4]+'-2>' + "\n")
    if rowNum == 3:
        for i in range(len(tags)):
            xmlData.write('<'+csvFile[:-4]+'-3>' + "\n" + '        ' + '<' + tags[i] + '>' \
                          + row[i] + '</' + tags[i] + '>' + "\n")
        xmlData.write('    </'+csvFile[:-4]+'-3>' + "\n")

    rowNum +=1
xmlData.write('</csv_data>' + "\n")
xmlData.close()

As you can see, I have the upper-level tags set to be created manually if the row exists. Is there a more efficient way to achieve my goal of creating the <csvFile-*></csvFile-*> tags rather than repeating my code 15+ times? Thanks!

unutbu · Accepted Answer · 2012-10-25 16:54:28Z

I would use xml.etree.ElementTree or lxml.etree to write the XML. xml.etree.ElementTree is in the standard library, but does not have built-in pretty-printing. (You could use the indent function from here, however).

lxml.etree is a third-party module, but it has built-in pretty-printing in its tostring method.

Using lxml.etree, you could do something like this:

import lxml.etree as ET

csvData = [['foo bar', 'baz quux'],['bing bang', 'bim bop', 'bip burp'],]
csvFile = 'rowboat'
name = csvFile[:-4]
root = ET.Element('csv_data')
for num, tags in enumerate(csvData):
    row = ET.SubElement(root, '{f}-{n}'.format(f = name, n = num))
    for text in tags:
        text = text.replace(' ', '_')
        tag = ET.SubElement(row, text)
        tag.text = text

print(ET.tostring(root, pretty_print = True))

yields

<csv_data>
  <row-0>
    <foo_bar>foo_bar</foo_bar>
    <baz_quux>baz_quux</baz_quux>
  </row-0>
  <row-1>
    <bing_bang>bing_bang</bing_bang>
    <bim_bop>bim_bop</bim_bop>
    <bip_burp>bip_burp</bip_burp>
  </row-1>
</csv_data>

Some suggestions:

In Python, almost never do you need to say
```
for i in range(len(tags)):
    # do stuff with tags[i]
```
Instead say
```
for tag in tags:
```
to loop over all the items in tags.
Also instead of manually counting the times through a loop with
```
num = 0
for tags in csvData:
    num += 1
```
instead use the enumerate function:
```
for num, tags in enumerate(csvData):
```
Strings like
```
'        ' + '<' + tags[i] + '>' \
                         + row[i] + '</' + tags[i] + '>' + "\n"
```
are incredibly difficult to read. It mixes together logic of indentation, with the XML syntax of tags, with the minutia of end of line characters. That's where xml.etree.ElementTree or lxml.etree will help you. It will take care of the serialization of the XML for you; all you need to provide is the relationship between the XML elements. The code will be much more readable and easier to maintain.

Great response, thank you! My code is getting cleaner by the minute!

Collectives™ on Stack Overflow

Python Conditional XML Writing

1 Answer 1

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related