2

I have a basic XML file that is being pulled from a database outside of my control.

<?xml version="1.0" encoding="utf-8"?>
<data>
<Job1Start><Time>20200202055415725</Time></Job1Start>
<Job1End><Time>20200202055423951</Time></Job1End>
<Job2Start><Time>20200202055810390</Time></Job2Start>
<Job3Start><Time>20200202055814687</Time></Job3Start>
<Job2End><Time>20200202055819000</Time></Job2End>
<Job3End><Time>20200202055816708</Time></Job3End>
</data>

I'm looking to get the following output in a CSV file:

Task    Start               Finish
Job1    20200202055415725   20200202055423951
Job2    20200202055810390   20200202055819000
Job3    20200202055814687   20200202055816708

I have tried a few methods, the below seems to be the closest I have gotten to a correct output but even this isn't working correctly:

import xml.etree.ElementTree as ET
import csv

tree = ET.parse('Jobs.xml')
root = tree.getroot()

with open('Output.csv', 'w') as csv_file:
        writer = csv.writer(csv_file, delimiter=',')
        for TaskName in root.findall('Job1Start'):
            starttime = TaskName.find('Time').text
            task = "Job1"
            writer.writerows(zip(task, starttime))
            print("Job1", starttime)

The output I get from this is shown below. Its formatting is incorrect and I've only been able to search for the start time on Job1:

Output CSV

Anyone have experience with a similar problem?

4
  • I've edited the post with an image of the current output. The formatting is wrong and this code only allows me to search for the start time. I haven't been able to join the start and end times. Commented Feb 3, 2020 at 17:14
  • 1
    It looks like that zip caused your problems... Always first try printing out intermediate results. Commented Feb 3, 2020 at 17:21
  • 1
    @usr2564301 No, writerows instead of writerow causes the problem. The latter expects a list of lists (or more accurately an iterable of iterables) and strings are iterable, so a list of strings meets the requirement, but the inner "list" item is a single character. Commented Feb 3, 2020 at 17:23
  • Yeah, I was printing without the zip and it looked fine thats where part of the confusion came in. Thats great though, thanks for the help! Commented Feb 3, 2020 at 17:25

1 Answer 1

2

writerows instead of writerow causes the single character problem and csv.writer. writerows expects a list of lists (or more accurately an iterable of iterables) and strings are iterable, so a list of strings meets the requirement, but the inner "list" item is a single character.

csv.writer also requires newline='' per the documentation, and on Windows lack of this parameter shows up as extra blank lines between rows when a CSV is opened in Excel.

Here's a solution:

import xml.etree.ElementTree as ET
import csv

tree = ET.parse('Jobs.xml')
root = tree.getroot()

# Use newline='' per csv docs.  This fixes the blanks lines in your output
with open('Output.csv', 'w', newline='') as csv_file:
        writer = csv.writer(csv_file)
        writer.writerow('Task Start Finish'.split())
        for job in range(1,4):
            start = root.find(f'Job{job}Start/Time').text
            end = root.find(f'Job{job}End/Time').text
            # Use writerow not writerows...latter expects list of lists.
            writer.writerow([f'Job{job}',start,end])

Output:

Task,Start,Finish
Job1,20200202055415725,20200202055423951
Job2,20200202055810390,20200202055819000
Job3,20200202055814687,20200202055816708
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.