Creating dynamic variables for XML Parsing

Question

I'm incredibly new at this, and I've tried searching but nothing I've found has been able to work for me.

I have xml data that looks like this

<datainfo>
   <data>
       <info State="1" Reason="x" Start="01/01/2016 00:00:00.000" End="01/01/2016 02:00:00.000"></info>
       <info State="1" Reason="y" Start="01/01/2016 02:00:00.000" End="01/01/2016 02:01:00.000">
            <moreinfo Start="01/01/2016 02:00:00.000" End="01/01/2016 02:00:30.000"/>
            <moreinfo Start="01/01/2016 02:00:30.000" End="01/01/2016 02:01:00.000"/>
       </info>
       <info State="2" Start="01/01/2016 02:01:00.000" End="01/01/2016 02:10:00.000"></info>
       ...
   </data>
</datainfo>

I want to find how much time was spent in State {1,2,...} for reason {x,y,...} on a specific day and write that out in .csv format, to be read later in Excel.

The issue I'm having is I can't use static variables because there are hundreds of different states for hundreds of different reasons, and they change constantly.

If I'm not clear please tell me, I am brand new to this and really appreciate any and all help.

Edit: Here is what I currently have, hopefully this will clear up what I'm trying to do.

from datetime import datetime
from lxml import etree as ET

def parseXML(file):
    handler = open(file, "r") 
    tree = ET.parse(handler)  
    info_list = tree.xpath('//info')
    root = tree.getroot()
    dictionary = {}
    info_len = len(info_list)

    for i in range(info_len):
         info=root[0][0][i]
         info_attribs = info.attrib
         end = info_attribs[u'End']
         start = info_attribs[u'Start']
         FMT = '%m/%d/%Y %H:%M:%S.%f'
         tdelta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
         t_dif = (tdelta.total_seconds()) / 60
         try:
             dictionary[info_attribs[u'State'] + status_attribs[u'Reason']] = t_dif
         except:
             continue

I'm trying to iterate through each line, find the State and the reason, then add them to a dictionary. If the entry already exists for that state and reason, I want to add it to the current value.

Let me know if I should provide more info!

Edit #2:

The output I'm looking for would be in the form of a .csv, structured like this:

State - Reason, [Total time spent in State 1 for x reason]

You could be a bit more clear. Variables in Python are not static and can be re-used with whatever type you want. Are you using classes to parse the XML etc.? A bit of example code would go a long way to understanding your issue.
– Don, Commented Oct 11, 2016 at 21:35
The update helps. Could you edit your question again and also show the desired output (or the names and values of the created variables)?
– martineau, Commented Oct 11, 2016 at 22:04
@martineau I've edited it again. I would be looking to get this data into Excel to then make graphs and other visualizations out of it, if that information helps at all. Thank you for all the help!
– TMarks, Commented Oct 11, 2016 at 23:16

martineau · Accepted Answer · 2016-10-12 01:59:48Z

3

You can use a defaultdict with lists as values to collect the durations for recurring keys. You can also filter the info nodes with an XPath expression so that only nodes with both of the attributes you want are matched, which removes the need for any try/except:

x = """<datainfo>
   <data>
       <info State="1" Reason="x" Start="01/01/2016 00:00:00.000" End="01/01/2016 02:00:00.000"></info>
       <info State="1" Reason="y" Start="01/01/2016 02:00:00.000" End="01/01/2016 02:01:00.000">
            <moreinfo Start="01/01/2016 02:00:00.000" End="01/01/2016 02:00:30.000"/>
            <moreinfo Start="01/01/2016 02:00:30.000" End="01/01/2016 02:01:00.000"/>
       </info>
       <info State="2" Start="01/01/2016 02:01:00.000" End="01/01/2016 02:10:00.000"></info>
   </data>
</datainfo>"""

from collections import defaultdict
import lxml.etree as et
from datetime import datetime

FMT = '%m/%d/%Y %H:%M:%S.%f'
tree = et.fromstring(x)
d = defaultdict(list)

for node in tree.xpath("//data/info[@Reason and @State]"):
    state = node.attrib["State"]
    reason = node.attrib["Reason"]
    end = node.attrib["End"]
    start = node.attrib[u'Start']
    tdelta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    d[state, reason].append(tdelta.total_seconds() / 60)

print(d)
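If lxml is not installed, roughly the same grouping can be sketched with the standard library's xml.etree.ElementTree. This is only an illustration and not part of the original answer: the helper name minutes_by_state_reason is made up here, and the attribute check is done in Python instead of with the XPath predicate, since ElementTree supports only a subset of XPath.

```python
from collections import defaultdict
from datetime import datetime
import xml.etree.ElementTree as ET

FMT = "%m/%d/%Y %H:%M:%S.%f"

SAMPLE = """<datainfo>
   <data>
       <info State="1" Reason="x" Start="01/01/2016 00:00:00.000" End="01/01/2016 02:00:00.000"></info>
       <info State="1" Reason="y" Start="01/01/2016 02:00:00.000" End="01/01/2016 02:01:00.000">
            <moreinfo Start="01/01/2016 02:00:00.000" End="01/01/2016 02:00:30.000"/>
            <moreinfo Start="01/01/2016 02:00:30.000" End="01/01/2016 02:01:00.000"/>
       </info>
       <info State="2" Start="01/01/2016 02:01:00.000" End="01/01/2016 02:10:00.000"></info>
   </data>
</datainfo>"""


def minutes_by_state_reason(xml_text):
    """Group durations in minutes by (State, Reason). Hypothetical helper."""
    root = ET.fromstring(xml_text)
    durations = defaultdict(list)
    # iter("info") visits every <info> element anywhere under the root.
    for node in root.iter("info"):
        attrs = node.attrib
        # Skip nodes missing either attribute, as the XPath filter would.
        if "State" not in attrs or "Reason" not in attrs:
            continue
        start = datetime.strptime(attrs["Start"], FMT)
        end = datetime.strptime(attrs["End"], FMT)
        durations[attrs["State"], attrs["Reason"]].append((end - start).total_seconds() / 60)
    return durations


result = minutes_by_state_reason(SAMPLE)
print(dict(result))  # {('1', 'x'): [120.0], ('1', 'y'): [1.0]}
```

The third info node has no Reason attribute, so it is skipped, just as it would be by the XPath filter.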

How you write the csv depends on how you want recurring keys to appear. If you want one row per value:

import csv
with open("out.csv", "w") as f:
    wr = csv.writer(f)
    for k,v in d.items():
        for val in v:
            wr.writerow(list(k) + [val])  # k is the (state, reason) tuple, val is a float

If you actually want to sum:

d = defaultdict(float)

for node in tree.xpath("//data/info[@Reason and @State]"):
    state = node.attrib["State"]
    reason = node.attrib["Reason"]
    end = node.attrib["End"]
    start = node.attrib[u'Start']
    tdelta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    d[state, reason] += (tdelta.total_seconds()) / 60

Then:

import csv
with open("out.csv", "w") as f:
    wr = csv.writer(f)
    wr.writerows(d.items())
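The question also asks for totals on a specific day and for rows shaped like "State - Reason, total". Neither is covered above, so here is a minimal sketch under some assumptions: the records list, the two extra sample rows in it, and the helper names totals_for_day and write_totals are all hypothetical, and an interval is attributed to the day on which it starts (intervals that cross midnight are not split).

```python
import csv
import io
from collections import defaultdict
from datetime import date, datetime

FMT = "%m/%d/%Y %H:%M:%S.%f"

# Hypothetical pre-extracted records: (state, reason, start, end).
records = [
    ("1", "x", "01/01/2016 00:00:00.000", "01/01/2016 02:00:00.000"),
    ("1", "y", "01/01/2016 02:00:00.000", "01/01/2016 02:01:00.000"),
    ("1", "x", "01/01/2016 03:00:00.000", "01/01/2016 03:30:00.000"),
    ("1", "x", "01/02/2016 00:00:00.000", "01/02/2016 01:00:00.000"),
]


def totals_for_day(rows, day):
    """Sum minutes per (state, reason) for intervals starting on `day`."""
    totals = defaultdict(float)
    for state, reason, start, end in rows:
        started = datetime.strptime(start, FMT)
        if started.date() != day:
            continue  # assumption: an interval belongs to its start day
        ended = datetime.strptime(end, FMT)
        totals[state, reason] += (ended - started).total_seconds() / 60
    return totals


def write_totals(totals, out):
    """Write one 'State - Reason', total-minutes row per key."""
    writer = csv.writer(out)
    for (state, reason), minutes in sorted(totals.items()):
        writer.writerow(["{} - {}".format(state, reason), minutes])


totals = totals_for_day(records, date(2016, 1, 1))
buffer = io.StringIO()
write_totals(totals, buffer)
print(buffer.getvalue())
```

To write to a real file instead of an in-memory buffer, pass a file object to write_totals; on Python 3 the csv documentation recommends opening it with newline="" so that no extra blank lines appear on Windows.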

edited Oct 12, 2016 at 1:59

martineau


answered Oct 11, 2016 at 22:55

Padraic Cunningham


3 Comments

TMarks Over a year ago

Thank you! This solution works, I'm just having to sum the values for them before outputting to the .csv.

Padraic Cunningham Over a year ago

@TylerMarques, ah so you want to sum?

TMarks Over a year ago

@Padraic_Cunningham Yes I'm looking to get the sum of all the values, but that I know how to do and was able to do by iterating over the list and summing the result. Thank you!

azillion · Answer · 2016-10-11 21:59:54Z

0

This is assuming you have your xml parsed into an array of arrays

import csv

# This is assuming you have your xml parsed into an array of arrays  [['state', 'reason'], ['state', 'reason']]
# example of array format
data = [['1', 'x'], ['1', 'y'], ['2', 'z']]

with open("output.csv", "w") as f:
    writer = csv.writer(f)
    writer.writerows(data)

answered Oct 11, 2016 at 21:59

azillion

