I am using Python to convert CSV files to XML. The CSV files have a varying number of rows, anywhere from 2 (including the header row) upward. Realistically that means 10-15 rows, but unless there's a major performance issue I'd like to cover my bases. To convert the files I have the following code:

for row in csvData:
    if rowNum == 0:
        xmlData.write('    <'+csvFile[:-4]+'-1>' + "\n")
        tags = row
        # replace spaces w/ underscores in tag names
        for i in range(len(tags)):
            tags[i] = tags[i].replace(' ', '_')
    if rowNum == 1: 
        for i in range(len(tags)):
            xmlData.write('        ' + '<' + tags[i] + '>' \
                          + row[i] + '</' + tags[i] + '>' + "\n")
        xmlData.write('    </'+csvFile[:-4]+'-1>' + "\n" + '    <' +csvFile[:-4]+'-2>' + "\n")
    if rowNum == 2:
        for i in range(len(tags)):
            xmlData.write('        ' + '<' + tags[i] + '>' \
                          + row[i] + '</' + tags[i] + '>' + "\n")
        xmlData.write('    </'+csvFile[:-4]+'-2>' + "\n")
    if rowNum == 3:
        for i in range(len(tags)):
            xmlData.write('<'+csvFile[:-4]+'-3>' + "\n" + '        ' + '<' + tags[i] + '>' \
                          + row[i] + '</' + tags[i] + '>' + "\n")
        xmlData.write('    </'+csvFile[:-4]+'-3>' + "\n")

    rowNum +=1
xmlData.write('</csv_data>' + "\n")
xmlData.close()

As you can see, I have the upper-level tags set to be created manually if the row exists. Is there a more efficient way to achieve my goal of creating the <csvFile-*></csvFile-*> tags rather than repeating my code 15+ times? Thanks!

1 Answer

I would use xml.etree.ElementTree or lxml.etree to write the XML. xml.etree.ElementTree is in the standard library, but before Python 3.9 it had no built-in pretty-printing; 3.9 added an indent function. (On older versions you could use the indent function from here, however.)

lxml.etree is a third-party module, but it has built-in pretty-printing in its tostring method.

Using lxml.etree, you could do something like this:

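import lxml.etree as ET

# Sample data standing in for the rows read from a CSV file
csvData = [['foo bar', 'baz quux'], ['bing bang', 'bim bop', 'bip burp']]
csvFile = 'rowboat'
name = csvFile[:-4]  # drop the last four characters (e.g. '.csv'); 'rowboat' -> 'row'
root = ET.Element('csv_data')
for num, tags in enumerate(csvData):
    # one <name-N> element per row
    row = ET.SubElement(root, '{f}-{n}'.format(f=name, n=num))
    for text in tags:
        # spaces are not allowed in XML tag names
        text = text.replace(' ', '_')
        tag = ET.SubElement(row, text)
        tag.text = text

# encoding='unicode' returns str rather than bytes, so this prints cleanly on Python 3
print(ET.tostring(root, pretty_print=True, encoding='unicode'))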

yields

<csv_data>
  <row-0>
    <foo_bar>foo_bar</foo_bar>
    <baz_quux>baz_quux</baz_quux>
  </row-0>
  <row-1>
    <bing_bang>bing_bang</bing_bang>
    <bim_bop>bim_bop</bim_bop>
    <bip_burp>bip_burp</bip_burp>
  </row-1>
</csv_data>
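
To connect this back to the question, here is a minimal sketch of the whole conversion, with the header row supplying the tag names and each later row becoming one numbered element. It swaps in the standard library's xml.etree.ElementTree for lxml so that it runs without a third-party install. The csv_to_xml function, the 'people' name, and the sample data are all hypothetical, and the sketch assumes the headers become valid XML names once spaces are replaced.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(lines, name):
    """Build a <csv_data> tree with one <name-N> element per data row."""
    reader = csv.reader(lines)
    # The first row holds the headers; spaces are not legal in XML tag names.
    tags = [header.replace(' ', '_') for header in next(reader)]
    root = ET.Element('csv_data')
    # Number the data rows from 1, like the question's <csvFile-1>, <csvFile-2>, ...
    for num, row in enumerate(reader, start=1):
        row_elem = ET.SubElement(root, '{f}-{n}'.format(f=name, n=num))
        # Pair each header with the value in the same column.
        for tag, value in zip(tags, row):
            ET.SubElement(row_elem, tag).text = value
    return root


# Hypothetical sample data standing in for a real CSV file.
sample = io.StringIO('first name,last name\nAda,Lovelace\nAlan,Turing\n')
root = csv_to_xml(sample, 'people')
if hasattr(ET, 'indent'):  # ET.indent exists only in Python 3.9+
    ET.indent(root)
print(ET.tostring(root, encoding='unicode'))
```

Because the loop runs over whatever rows the reader yields, the same code handles 2 rows or 2,000 without any per-row branches. Note that ElementTree does not check tag names, so a header such as "2nd column" would still need cleaning up by hand.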

Some suggestions:

  • In Python, you almost never need to write

    for i in range(len(tags)):
        # do stuff with tags[i]
    

    Instead say

    for tag in tags:
    

    to loop over all the items in tags.

  • Also, rather than manually counting the iterations of a loop with

    num = 0
    for tags in csvData:
        num += 1
    

    use the enumerate function:

    for num, tags in enumerate(csvData):
    
  • Strings like

    '        ' + '<' + tags[i] + '>' \
                             + row[i] + '</' + tags[i] + '>' + "\n"
    

    are incredibly difficult to read. They mix the logic of indentation with the XML syntax of tags and the minutiae of end-of-line characters. That's where xml.etree.ElementTree or lxml.etree will help you: the library takes care of serializing the XML, and all you need to provide is the relationship between the XML elements. The code will be much more readable and easier to maintain.
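
The first two suggestions combine naturally when, as in the question, each value has to be matched with the header in the same column. A small sketch, using made-up tags and rows lists as stand-ins for the parsed CSV data: enumerate numbers the rows and zip pairs each tag with its value, so no index variable is needed.

```python
tags = ['first_name', 'last_name']                # hypothetical header row
rows = [['Ada', 'Lovelace'], ['Alan', 'Turing']]  # hypothetical data rows

cells = []
# start=1 makes the first data row number 1 instead of 0
for num, row in enumerate(rows, start=1):
    # zip walks both lists in step, replacing tags[i] / row[i]
    for tag, value in zip(tags, row):
        cells.append((num, tag, value))

for cell in cells:
    print(cell)
```

One thing to keep in mind is that zip stops at the shorter of its inputs, so a row with fewer values than there are headers is silently truncated rather than raising an IndexError.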

1 Comment

Great response, thank you! My code is getting cleaner by the minute!
