2

I'm incredibly new at this, and I've tried searching but nothing I've found has been able to work for me.

I have xml data that looks like this

<datainfo>
   <data>
       <info State="1" Reason="x" Start="01/01/2016 00:00:00.000" End="01/01/2016 02:00:00.000"></info>
       <info State="1" Reason="y" Start="01/01/2016 02:00:00.000" End="01/01/2016 02:01:00.000">
            <moreinfo Start="01/01/2016 02:00:00.000" End="01/01/2016 02:00:30.000"/>
            <moreinfo Start="01/01/2016 02:00:30.000" End="01/01/2016 02:01:00.000"/>
       </info>
       <info State="2" Start="01/01/2016 02:01:00.000" End="01/01/2016 02:10:00.000"></info>
       ...
   </data>
</datainfo>

I want find how much time was spent in State {1,2,...} for reason {x,y,...} on a specific day and have that print to a .csv format to be latter read in excel.

The issue I'm having is I can't use static variables because there are hundreds of different states for hundreds of different reasons, and they change constantly.

If I'm not clear please tell me, I am brand new to this and really appreciate any and all help.

Edit: Here is what I currently have, hopefully this will clear up what I'm trying to do.

from datetime import datetime
from lxml import etree as ET

def parseXML(file):
    handler = open(file, "r") 
    tree = ET.parse(handler)  
    info_list = tree.xpath('//info')
    root = tree.getroot()
    dictionary = {}
    info_len = len(info_list)

    for i in range(info_len):
         info=root[0][0][i]
         info_attribs = info.attrib
         end = info_attribs[u'End']
         start = info_attribs[u'Start']
         FMT = '%m/%d/%Y %H:%M:%S.%f'
         tdelta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
         t_dif = (tdelta.total_seconds()) / 60
         try:
             dictionary[info_attribs[u'State'] + status_attribs[u'Reason']] = t_dif
         except:
             continue

I'm trying to iterate through each line, find the State and the reason, then add them to a dictionary. If the entry already exists for that state and reason, I want to add it to the current value.

Let me know if I should provide more info!

Edit #2:

The output I'm looking for would be in the form of a .csv, stuctured like this:

State - Reason, [Total time spent in State 1 for x reason]
4
  • 1
    You could be a bit more clear. Variables in Python are not static and can be re-used with whatever type you want. Are you using classes to parse the XML etc? A bit of example code would go a long way to understanding your issue Commented Oct 11, 2016 at 21:35
  • @Don I edited my post, please let me know if that helps! Commented Oct 11, 2016 at 22:00
  • The update helps. Could you edit your question again and also show the desired output (or the names and values of the created variables)? Commented Oct 11, 2016 at 22:04
  • 1
    @martineau I've edited it again. I would be looking to get this data into excel to then make graphs and other visualizations out of it, if that information helps at all. Thank you for all the help! Commented Oct 11, 2016 at 23:16

2 Answers 2

3

You can use a defaultdict for recurring keys using lists as value, you can also filter the info nodes using an xpath to only find the nodes that have both of the attributes you want so no need for any except:

x = """<datainfo>
   <data>
       <info State="1" Reason="x" Start="01/01/2016 00:00:00.000" End="01/01/2016 02:00:00.000"></info>
       <info State="1" Reason="y" Start="01/01/2016 02:00:00.000" End="01/01/2016 02:01:00.000">
            <moreinfo Start="01/01/2016 02:00:00.000" End="01/01/2016 02:00:30.000"/>
            <moreinfo Start="01/01/2016 02:00:30.000" End="01/01/2016 02:01:00.000"/>
       </info>
       <info State="2" Start="01/01/2016 02:01:00.000" End="01/01/2016 02:10:00.000"></info>
   </data>
</datainfo>"""

from collections import defaultdict
import lxml.etree as et
from datetime import datetime

FMT = '%m/%d/%Y %H:%M:%S.%f'
tree = et.fromstring(x)
d = defaultdict(list)

for node in tree.xpath("//data/info[@Reason and @State]"):
    state = node.attrib["State"]
    reason = node.attrib["Reason"]
    end = node.attrib["End"]
    start = node.attrib[u'Start']
    tdelta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    d[state, reason].append((tdelta.total_seconds()) / 60))

print(d)

Depending on how you want the data to look for recurring keys would determine how you wrote to the csv, if you wanted one row each:

import csv
with open("out.csv", "w") as f:
    wr = csv.writer(f)
    for k,v in d.items():
        for val in v:
            wr.writerow([k] + val)

If you actually want to sum:

d = defaultdict(float)

for node in tree.xpath("//data/info[@Reason and @State]"):
    state = node.attrib["State"]
    reason = node.attrib["Reason"]
    end = node.attrib["End"]
    start = node.attrib[u'Start']
    tdelta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    d[state, reason] += (tdelta.total_seconds()) / 60

Then:

import csv
with open("out.csv", "w") as f:
    wr = csv.writer(f)
    wr.writerows(d.items())
Sign up to request clarification or add additional context in comments.

3 Comments

Thank you! This solution works, I'm just having to sum the values for them before outputting to the .csv.
@TylerMarques, ah so you want to sum?
@Padraic_Cunningham Yes I'm looking to get the sum of all the values, but that I know how to do and was able to do by iterating over the list and summing the result. Thank you!
0

This is assuming you have your xml parsed into an array of arrays

import csv

# This is assuming you have your xml parsed into an array of arrays  [['state', 'reason'], ['state', 'reason']]
# example of array format
data = [['1', 'x'], ['1', 'y'], ['2', 'z']]

with open("output.csv", "w") as f:
    writer = csv.writer(f)
    writer.writerows(data)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.