How to convert xml file to csv output in python?

Question

I have a basic XML file that is being pulled from a database outside of my control.

<?xml version="1.0" encoding="utf-8"?>
<data>
<Job1Start><Time>20200202055415725</Time></Job1Start>
<Job1End><Time>20200202055423951</Time></Job1End>
<Job2Start><Time>20200202055810390</Time></Job2Start>
<Job3Start><Time>20200202055814687</Time></Job3Start>
<Job2End><Time>20200202055819000</Time></Job2End>
<Job3End><Time>20200202055816708</Time></Job3End>
</data>

I'm looking to get the following output in a CSV file:

Task    Start               Finish
Job1    20200202055415725   20200202055423951
Job2    20200202055810390   20200202055819000
Job3    20200202055814687   20200202055816708

I have tried a few methods, the below seems to be the closest I have gotten to a correct output but even this isn't working correctly:

import xml.etree.ElementTree as ET
import csv

tree = ET.parse('Jobs.xml')
root = tree.getroot()

with open('Output.csv', 'w') as csv_file:
        writer = csv.writer(csv_file, delimiter=',')
        for TaskName in root.findall('Job1Start'):
            starttime = TaskName.find('Time').text
            task = "Job1"
            writer.writerows(zip(task, starttime))
            print("Job1", starttime)

The output I get from this is shown below. Its formatting is incorrect and I've only been able to search for the start time on Job1:

Output CSV

Anyone have experience with a similar problem?

I've edited the post with an image of the current output. The formatting is wrong and this code only allows me to search for the start time. I haven't been able to join the start and end times. — Buckzer
– Buckzer, Commented Feb 3, 2020 at 17:14
It looks like that zip caused your problems... Always first try printing out intermediate results. — Jongware
– Jongware, Commented Feb 3, 2020 at 17:21
@usr2564301 No, writerows instead of writerow causes the problem. The latter expects a list of lists (or more accurately an iterable of iterables) and strings are iterable, so a list of strings meets the requirement, but the inner "list" item is a single character. — Mark Tolonen
– Mark Tolonen, Commented Feb 3, 2020 at 17:23
Yeah, I was printing without the zip and it looked fine thats where part of the confusion came in. Thats great though, thanks for the help! — Buckzer
– Buckzer, Commented Feb 3, 2020 at 17:25

Mark Tolonen · Accepted Answer · 2020-02-03 17:26:34Z

writerows instead of writerow causes the single character problem and csv.writer. writerows expects a list of lists (or more accurately an iterable of iterables) and strings are iterable, so a list of strings meets the requirement, but the inner "list" item is a single character.

csv.writer also requires newline='' per the documentation, and on Windows lack of this parameter shows up as extra blank lines between rows when a CSV is opened in Excel.

Here's a solution:

import xml.etree.ElementTree as ET
import csv

tree = ET.parse('Jobs.xml')
root = tree.getroot()

# Use newline='' per csv docs.  This fixes the blanks lines in your output
with open('Output.csv', 'w', newline='') as csv_file:
        writer = csv.writer(csv_file)
        writer.writerow('Task Start Finish'.split())
        for job in range(1,4):
            start = root.find(f'Job{job}Start/Time').text
            end = root.find(f'Job{job}End/Time').text
            # Use writerow not writerows...latter expects list of lists.
            writer.writerow([f'Job{job}',start,end])

Output:

Task,Start,Finish
Job1,20200202055415725,20200202055423951
Job2,20200202055810390,20200202055819000
Job3,20200202055814687,20200202055816708

Collectives™ on Stack Overflow

How to convert xml file to csv output in python?

1 Answer 1

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related