
This question is a follow-up to my previous post: How to convert xml file to csv output in python?

Again, I have a basic XML file that is being pulled from a database outside of my control.

<?xml version="1.0" encoding="utf-8"?>
<data>
<Job1Start><Time>20200202055415725</Time></Job1Start>
<Job1End><Time>20200202055423951</Time></Job1End>
<Job2Start><Time>20200202055810390</Time></Job2Start>
<Job3Start><Time>20200202055814687</Time></Job3Start>
<Job2End><Time>20200202055819000</Time></Job2End>
<Job3End><Time>20200202055816708</Time></Job3End>
<Job1Start><Time>20200203053415725</Time></Job1Start>
<Job1End><Time>20200203056423951</Time></Job1End>
</data>

My current code is shown below:

import xml.etree.ElementTree as ET
import csv

tree = ET.parse('StackedExample.xml')
root = tree.getroot()

with open('Output.csv', 'w', newline='') as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow('Task Start Finish'.split())
    tasklist = ['Job1', 'Job2', 'Job3']
    for Task in tasklist:
        start = root.find(f'.//{Task}Start/Time').text
        end = root.find(f'.//{Task}End/Time').text
        writer.writerow([f'{Task}', start, end])
        print(f'{Task}', start, end)

This produces the output below, but it only gives one row for "Job1":

Task    Start               Finish
Job1    20200202055415725   20200202055423951
Job2    20200202055810390   20200202055819000
Job3    20200202055814687   20200202055816708

I'm looking to get something like this:

Task    Start               Finish
Job1    20200202055415725   20200202055423951
Job1    20200203053415725   20200203056423951
Job2    20200202055810390   20200202055819000
Job3    20200202055814687   20200202055816708

Any ideas?

1 Answer

find only returns the first occurrence of the tag. Use findall for the start times and findall for the end times, then zip() the two lists together:

for Task in tasklist:
    # findall returns every matching element, not just the first one
    start = root.findall(f'.//{Task}Start/Time')
    start_txt = []
    for s in start:
        start_txt.append(s.text)
    end = root.findall(f'.//{Task}End/Time')
    end_txt = []
    for e in end:
        end_txt.append(e.text)
    # pair each start time with the end time at the same position
    row_list = list(zip(start_txt, end_txt))
    for row in row_list:
        writer.writerow([f'{Task}', row[0], row[1]])
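
For reference, here is a minimal self-contained sketch of the same findall + zip idea. It inlines the sample XML from the question and writes to an in-memory buffer rather than Output.csv so it can be run as-is; the helper name job_rows is made up for illustration. Note that zip pairs by position, so this assumes every start has a matching end and that both appear in the same order.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Sample data copied from the question, inlined so the sketch runs on its own.
XML = """<data>
<Job1Start><Time>20200202055415725</Time></Job1Start>
<Job1End><Time>20200202055423951</Time></Job1End>
<Job2Start><Time>20200202055810390</Time></Job2Start>
<Job3Start><Time>20200202055814687</Time></Job3Start>
<Job2End><Time>20200202055819000</Time></Job2End>
<Job3End><Time>20200202055816708</Time></Job3End>
<Job1Start><Time>20200203053415725</Time></Job1Start>
<Job1End><Time>20200203056423951</Time></Job1End>
</data>"""


def job_rows(root, tasks):
    """Return one [task, start, end] row per start/end pair (hypothetical helper)."""
    rows = []
    for task in tasks:
        # findall returns all matches in document order; find would stop at the first.
        starts = [el.text for el in root.findall(f'.//{task}Start/Time')]
        ends = [el.text for el in root.findall(f'.//{task}End/Time')]
        # zip pairs by position and silently drops any unmatched trailing entry.
        for start, end in zip(starts, ends):
            rows.append([task, start, end])
    return rows


root = ET.fromstring(XML)
rows = job_rows(root, ['Job1', 'Job2', 'Job3'])

# Write to an in-memory buffer; swap in open('Output.csv', 'w', newline='') for a file.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(['Task', 'Start', 'Finish'])
writer.writerows(rows)
print(buffer.getvalue())
```

With the sample data this should give two rows for Job1 followed by one each for Job2 and Job3.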

Not very elegant, but this also works:

import xml.etree.ElementTree as ET
import pandas as pd

path = r"D:\t.xml"
tree = ET.parse(path)
root = tree.getroot()
m1 = {"Task": "Job1"}
m2 = {"Task": "Job2"}
m3 = {"Task": "Job3"}
out = []
# walk the children in document order and emit a row each time an End tag is seen
for t in root:
    time = t.find(".//Time")
    txt = time.text
    if "1Start" in t.tag:
        m1["Start"] = txt
    if "1End" in t.tag:
        m1["End"] = txt
        # append a copy so a later Job1 pair does not overwrite this row
        out.append(dict(m1))
    if "2Start" in t.tag:
        m2["Start"] = txt
    if "2End" in t.tag:
        m2["End"] = txt
        out.append(dict(m2))
    if "3Start" in t.tag:
        m3["Start"] = txt
    if "3End" in t.tag:
        m3["End"] = txt
        out.append(dict(m3))
df = pd.DataFrame(out)
# raw string so the backslash in the Windows path is taken literally
df.to_excel(r"D:\out.xlsx")
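
The same single-pass idea can be sketched without hard-coding one dictionary per job. This is only an illustration under a few assumptions: tag names always end in Start or End, a Start always precedes its matching End for the same job, and the names TAG_PATTERN and pair_in_document_order are invented here. It uses only the standard library and returns plain dictionaries, which could be passed to pd.DataFrame if needed.

```python
import re
import xml.etree.ElementTree as ET

# Sample data copied from the question, inlined so the sketch runs on its own.
XML = """<data>
<Job1Start><Time>20200202055415725</Time></Job1Start>
<Job1End><Time>20200202055423951</Time></Job1End>
<Job2Start><Time>20200202055810390</Time></Job2Start>
<Job3Start><Time>20200202055814687</Time></Job3Start>
<Job2End><Time>20200202055819000</Time></Job2End>
<Job3End><Time>20200202055816708</Time></Job3End>
<Job1Start><Time>20200203053415725</Time></Job1Start>
<Job1End><Time>20200203056423951</Time></Job1End>
</data>"""

# Splits a tag such as "Job1Start" into the job name and the Start/End suffix.
TAG_PATTERN = re.compile(r'^(?P<task>.+?)(?P<kind>Start|End)$')


def pair_in_document_order(root):
    """Pair each End with the most recent unmatched Start of the same job."""
    pending = {}  # job name -> start time still waiting for its End
    rows = []
    for child in root:
        match = TAG_PATTERN.match(child.tag)
        if match is None:
            continue  # ignore tags that do not follow the <Job>Start/<Job>End pattern
        task, kind = match['task'], match['kind']
        time_text = child.findtext('Time')
        if kind == 'Start':
            pending[task] = time_text
        elif task in pending:
            # build a fresh dict per row so earlier rows are never overwritten
            rows.append({'Task': task, 'Start': pending.pop(task), 'End': time_text})
    return rows


rows = pair_in_document_order(ET.fromstring(XML))
for row in rows:
    print(row['Task'], row['Start'], row['End'])
```

Rows come out in the order the End tags appear in the file, so the second Job1 pair is listed last; sort the list by the 'Task' key if grouped output is wanted.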

4 Comments

With this I'm getting: AttributeError: 'list' object has no attribute 'text'
When I remove the ".text" I get: ['Job1', <Element 'Time' at 0x033D0DC0>, <Element 'Time' at 0x033D0E10>] ['Job2', <Element 'Time' at 0x033D0E88>, <Element 'Time' at 0x033D0F50>] ['Job3', <Element 'Time' at 0x033D0F00>, <Element 'Time' at 0x033D0FA0>]
my bad, try it now
Now it's: Job3 [<Element 'Time' at 0x05BD90C0>] [<Element 'Time' at 0x05BD91E0>]
