
I have XML in this format:

<event timestamp="0.447463" bustype="LIN" channel="LIN 1">  
 <col name="Time"/>  
 <col name="Start of Frame">0.440708</col>  
 <col name="Channel">LIN 1</col>  
 <col name="Dir">Tx</col>  
 <col name="Event Type">LIN Frame (Diagnostic Request)</col>  
 <col name="Frame Name">MasterReq_DB</col>  
 <col name="Id">3C</col>  
 <col name="Data">81 06 04 04 FF FF 50 4C</col>  
 <col name="Publisher">TestMaster (simulated)</col>  
 <col name="Checksum">D3 &quot;Classic&quot;</col>  
 <col name="Header Duration">2.090 ms (40.1 bits)</col>  
 <col name="Resp. Duration">4.688 ms (90.0 bits)</col>  
 <col name="Time difference">0.049987</col>  
 <empty/>  
</event>  

In the XML above, I need to extract the data associated with the attribute 'name'.
I am able to get all the names, but I am unable to fetch the >MasterReq_DB< field.
Please help.
Thanks in advance.

My Python code is:

import sys 
import array
import string
from xml.dom.minidom import parse,parseString
from xml.dom import minidom                                              
input_file = open("test_input.txt",'r')                                                
alines = input_file.read()
word_lst = alines.split("'")
filename = word_lst[1]
pathname=word_lst[3]                                               
f = open(pathname,'r')
doc = minidom.parse(f)
node = doc.documentElement
events = doc.getElementsByTagName('event')
for event in events:
    #print (event)
    columns =  event.getElementsByTagName('col')
    for column in columns:
        #print (column)
        head = column.getAttribute('name')
        if (head == ('Frame Name')):
           print (head)
           request = head.firstChild.wholeText
           print (request)
print ("DOne")
  • What code have you tried? Have you looked at elementtree and lxml (the latter being a more powerful extension overlapping in functionality with the former). Commented Jun 16, 2012 at 9:14
  • Please see my Python code above. Commented Jun 16, 2012 at 9:31
  • And print (request) outputs what exactly? Have you tried print (repr(request))? I'd strongly advise switching to elementtree as a vastly superior XML API for python. Commented Jun 16, 2012 at 9:36
  • I get this error. The script prints "Frame Name" and then: Traceback (most recent call last): File "C:\Users\rshirurm\Desktop\AD7180_aut\AD7180_auto.py", line 25, in <module> request = head.firstChild.wholeText AttributeError: 'str' object has no attribute 'firstChild' Commented Jun 16, 2012 at 9:51
  • There is your hint: head is a string (the value of the column's name attribute). Use column.firstChild, perhaps? :-P Commented Jun 16, 2012 at 9:54
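
Following the hint in the last comment, a minimal sketch of the minidom fix might look like this. The helper name `column_text` and the trimmed `SAMPLE` string are made up for illustration; the key change is reading the text from the `col` element (`column.firstChild`) rather than from the attribute string, and guarding against empty elements such as `<col name="Time"/>`, which have no child node.

```python
from xml.dom import minidom

# Trimmed copy of the <event> element from the question (illustration only).
SAMPLE = """<event timestamp="0.447463" bustype="LIN" channel="LIN 1">
 <col name="Time"/>
 <col name="Frame Name">MasterReq_DB</col>
 <col name="Id">3C</col>
 <empty/>
</event>"""


def column_text(event, wanted):
    """Return the text of the first <col> whose name attribute equals `wanted`.

    Hypothetical helper: returns None if no such column exists or if the
    column is empty.
    """
    for column in event.getElementsByTagName('col'):
        if column.getAttribute('name') == wanted:
            # The text lives in a child Text node of the element,
            # not on the attribute value (which is a plain str).
            child = column.firstChild
            return child.data if child is not None else None
    return None


doc = minidom.parseString(SAMPLE)
for event in doc.getElementsByTagName('event'):
    print(column_text(event, 'Frame Name'))
```

To run this against the real log, `minidom.parse(pathname)` would presumably replace `minidom.parseString(SAMPLE)`.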

1 Answer

Here's a primer to get you started with lxml, if you wish to use it:

In [1]: x = '''<event timestamp="0.447463" bustype="LIN" channel="LIN 1">  
   ...:  <col name="Time"/>  
   ...:  <col name="Start of Frame">0.440708</col>  
   ...:  <col name="Channel">LIN 1</col>  
   ...:  <col name="Dir">Tx</col>  
   ...:  <col name="Event Type">LIN Frame (Diagnostic Request)</col>  
   ...:  <col name="Frame Name">MasterReq_DB</col>  
   ...:  <col name="Id">3C</col>  
   ...:  <col name="Data">81 06 04 04 FF FF 50 4C</col>  
   ...:  <col name="Publisher">TestMaster (simulated)</col>  
   ...:  <col name="Checksum">D3 &quot;Classic&quot;</col>  
   ...:  <col name="Header Duration">2.090 ms (40.1 bits)</col>  
   ...:  <col name="Resp. Duration">4.688 ms (90.0 bits)</col>  
   ...:  <col name="Time difference">0.049987</col>  
   ...:  <empty/>  
   ...: </event> '''

In [2]: from lxml import etree

In [3]: tree = etree.fromstring(x)

In [4]: [elem.text for elem in tree.xpath('//*[@name]')]
Out[4]: 
[None,
 '0.440708',
 'LIN 1',
 'Tx',
 'LIN Frame (Diagnostic Request)',
 'MasterReq_DB',
 '3C',
 '81 06 04 04 FF FF 50 4C',
 'TestMaster (simulated)',
 'D3 "Classic"',
 '2.090 ms (40.1 bits)',
 '4.688 ms (90.0 bits)',
 '0.049987']

In [5]: [name for name in tree.xpath('//@name')]
Out[5]: 
['Time',
 'Start of Frame',
 'Channel',
 'Dir',
 'Event Type',
 'Frame Name',
 'Id',
 'Data',
 'Publisher',
 'Checksum',
 'Header Duration',
 'Resp. Duration',
 'Time difference']

To read from a file instead of a string, use the lxml.etree.parse function.
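
As a rough sketch of the file-based variant: the snippet below writes a trimmed copy of the sample to a temporary file and parses it by path. The helper name `read_columns` and the temporary file are assumptions for illustration, and the import falls back to the standard library's `xml.etree.ElementTree` (which also provides `parse`) in case lxml is not installed.

```python
import os
import tempfile

try:
    from lxml import etree
except ImportError:
    # Assumption: fall back to the stdlib module, which offers the same
    # parse()/getroot()/iter() calls used below.
    import xml.etree.ElementTree as etree

# Trimmed copy of the <event> element from the question (illustration only).
SAMPLE = b"""<event timestamp="0.447463" bustype="LIN" channel="LIN 1">
 <col name="Time"/>
 <col name="Frame Name">MasterReq_DB</col>
 <col name="Checksum">D3 &quot;Classic&quot;</col>
 <empty/>
</event>"""


def read_columns(path):
    """Parse the XML file at `path` and map each col name to its text."""
    root = etree.parse(path).getroot()
    return {col.get('name'): col.text for col in root.iter('col')}


# Write the sample to a temporary file so there is something to parse.
with tempfile.NamedTemporaryFile(suffix='.xml', delete=False) as handle:
    handle.write(SAMPLE)
    path = handle.name

try:
    columns = read_columns(path)
    print(columns['Frame Name'])
finally:
    os.remove(path)
```

Empty elements such as `<col name="Time"/>` come back with `None` as their text, matching the first entry of `Out[4]` above.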

Here's a link to the lxml tutorial. This one is a reference for XPath syntax.


4 Comments

Hey, what do you suggest? Should I use lxml or DOM? This is just the start of my work, and I need to parse XML files that are megabytes in size.
I haven't got any experience with DOM at all, to be honest. lxml is quite good for parsing. For files several GB in size I use lxml's iterparse method, which works great. For smaller files, something like the example in my answer is what I normally do.
Thanks for the suggestion. How can I write the output to Excel 2007?
@Rohit Take a look at this question.
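
For the large-file case mentioned in the comments, an incremental parse with iterparse might look roughly like the sketch below. It is an illustration under several assumptions: the `<log>` wrapper element, the second event with its `Example_Frame` value, and the helper name `frame_names` are all made up, and the import falls back to the standard library's `xml.etree.ElementTree`, which also provides `iterparse`, if lxml is not available.

```python
import io

try:
    from lxml import etree
except ImportError:
    # Assumption: the stdlib module is an acceptable stand-in here.
    import xml.etree.ElementTree as etree

# Made-up input: two events inside a hypothetical <log> root element.
SAMPLE = b"""<log>
 <event>
  <col name="Frame Name">MasterReq_DB</col>
  <col name="Id">3C</col>
 </event>
 <event>
  <col name="Frame Name">Example_Frame</col>
  <col name="Id">3D</col>
 </event>
</log>"""


def frame_names(source):
    """Yield the 'Frame Name' text of each <event> as it finishes parsing."""
    for _, elem in etree.iterparse(source, events=('end',)):
        if elem.tag != 'event':
            continue
        for col in elem.iter('col'):
            if col.get('name') == 'Frame Name':
                yield col.text
        # Discard the event's children once it has been processed.
        elem.clear()


names = list(frame_names(io.BytesIO(SAMPLE)))
print(names)
```

`source` can also be a file path or an open binary file, so the same generator should work on a log file on disk.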
