I have a CSV file that contains a header row followed by a potentially unlimited number of rows with values. For example:

FieldA,FieldB,FieldC,FieldD
1,asdf,2,ghjk
3,qwer,4,yuio
5,slslkd,,aldkjslkj

For each row, I need to create a quasi-XML string in which every element is named after its column and contains that cell's value; empty cells are left out. Using the above as an example, iterating through the three rows should produce these three strings:

<FieldA>1</FieldA><FieldB>asdf</FieldB><FieldC>2</FieldC><FieldD>ghjk</FieldD>

<FieldA>3</FieldA><FieldB>qwer</FieldB><FieldC>4</FieldC><FieldD>yuio</FieldD>

<FieldA>5</FieldA><FieldB>slslkd</FieldB><FieldD>aldkjslkj</FieldD>

The way I am currently doing it is:

for row in r:
    if row['FieldA']:
        fielda = '<FieldA>{0}</FieldA>'.format(row['FieldA'])
    else:
        fielda = ''

    if row['FieldB']:
        fieldb = '<FieldB>{0}</FieldB>'.format(row['FieldB'])
    else:
        fieldb = ''

    if row['FieldC']:
        fieldc = '<FieldC>{0}</FieldC>'.format(row['FieldC'])
    else:
        fieldc = ''

    if row['FieldD']:
        fieldd = '<FieldD>{0}</FieldD>'.format(row['FieldD'])
    else:
        fieldd = ''

    # Compile the string
    final_string = fielda + fieldb + fieldc + fieldd

    # Process further
    do_something(final_string)

As it iterates through each row, this creates the appropriate string and then I can pass it on for further processing.

Is there a better way to achieve what I want, or is my approach the best way? My guess is there is a better, more Pythonic, and more efficient way, but I'm new-ish to Python.

Thanks.

3 Answers

2

Slightly modified code that fixed the issue I was having. Turned out to be pretty trivial:

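import csv
from xml.etree.ElementTree import Element, SubElement, tostring

with open(csv_file) as f:
    for row in csv.DictReader(f):
        # A fresh wrapper element per row, so each row yields its own string
        top = Element('event')
        for k, v in row.items():
            child = SubElement(top, k)
            child.text = v
        # Parenthesized print works in both Python 2 and Python 3
        print(tostring(top))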

Thanks for the help!
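
Note that the loop above still wraps each row in an `event` element and emits an element even for an empty cell. If the goal is exactly the strings shown in the question (no wrapper, empty cells skipped), one possible sketch is plain string formatting over the column names. This assumes Python 3; the helper name `row_to_xml` and the in-memory sample standing in for the CSV file are made up for illustration, and `escape` is used so that characters such as `&` or `<` in a cell don't break the markup.

```python
import csv
import io
from xml.sax.saxutils import escape


def row_to_xml(row, fieldnames):
    """Hypothetical helper: '<Name>value</Name>' per non-empty cell, in column order."""
    return ''.join(
        '<{0}>{1}</{0}>'.format(name, escape(row[name]))
        for name in fieldnames
        if row[name]  # skips empty strings (and None for short rows)
    )


# In-memory stand-in for the CSV file from the question (assumption).
sample = io.StringIO(
    "FieldA,FieldB,FieldC,FieldD\n"
    "1,asdf,2,ghjk\n"
    "3,qwer,4,yuio\n"
    "5,slslkd,,aldkjslkj\n"
)

reader = csv.DictReader(sample)
results = [row_to_xml(row, reader.fieldnames) for row in reader]

for final_string in results:
    print(final_string)
```

Each entry in `results` is a separate string for one row, so it could be handed to `do_something()` the same way `final_string` is in the question.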

1

Python is Batteries Included.

In this case, you can use the csv module and the xml module, with code that looks like this:

# CSV module
import csv
# Stuff from the XML module
from xml.etree.ElementTree import Element, SubElement, tostring

# Topmost XML element
top = Element('top')
# Open a file
with open('stuff.csv') as csvfile:
    # And use a dictionary-reader
    for d in csv.DictReader(csvfile):
        # For each mapping in the dictionary
        # (items() works in Python 2 and 3; iteritems() is Python 2 only)
        for (k, v) in d.items():
            # Create an XML node
            child = SubElement(top, k)
            child.text = v
print(tostring(top))

1 Comment

Thanks. I feel simple-minded because I'm not sure I would have thought of that, and obviously I didn't. Is a top element required? When I send this on for processing, I need it to start with <FieldA>, not <top>. Also, this concatenates the results of all the rows into one string, but I need a separate string for each row. I thought it was the indentation of the print statement, but that doesn't seem to be the case. Ideally, row 1 would be its own XML string, row 2 would be its own, etc., in the same format.
-1

'Top' is just the highest level node -- you could use whatever text you want to wrap the whole document.
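
If the wrapper should not appear in the output at all, one option is to serialize only the children of the top node and join them. This is a rough sketch assuming Python 3 (where `tostring(..., encoding='unicode')` returns a `str`); the hard-coded `row` dictionary is a stand-in for one row from `csv.DictReader`.

```python
from xml.etree.ElementTree import Element, SubElement, tostring

# Stand-in for a single row from csv.DictReader (assumption for illustration).
row = {'FieldA': '5', 'FieldB': 'slslkd', 'FieldC': '', 'FieldD': 'aldkjslkj'}

# The wrapper is still needed to hold the children, but it is never printed.
top = Element('top')
for name in ('FieldA', 'FieldB', 'FieldC', 'FieldD'):
    if row[name]:  # leave out empty cells
        SubElement(top, name).text = row[name]

# Serialize each child on its own and join, which drops the <top> wrapper.
inner = ''.join(tostring(child, encoding='unicode') for child in top)
print(inner)
```

Building a new `top` inside the per-row loop, rather than once before it, is what keeps each row in its own string.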

You can pretty-print it pretty simply as well: http://pymotw.com/2/xml/etree/ElementTree/create.html#pretty-printing-xml
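
For reference, a minimal pretty-printing sketch using only the standard library might look like the following. It assumes Python 3, and the `prettify` name and the sample element are just for illustration; it round-trips the serialized tree through `xml.dom.minidom`.

```python
from xml.dom import minidom
from xml.etree.ElementTree import Element, SubElement, tostring


def prettify(elem):
    """Return an indented XML string for the given Element (illustrative helper)."""
    rough = tostring(elem, encoding='unicode')
    return minidom.parseString(rough).toprettyxml(indent='  ')


# Small sample tree (assumption, mirrors the column names in the question).
top = Element('top')
SubElement(top, 'FieldA').text = '1'
SubElement(top, 'FieldB').text = 'asdf'

pretty = prettify(top)
print(pretty)
```

The result includes an XML declaration line and one indented line per child, which is handy for debugging but probably not what you want to pass on for further processing.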
