XML scraping with Python with odd XML structure

Question

<attribute>
  <name>Index</name>
  <values>
    <zip>
      <value>323800</value>
    </zip>
    <nation>
      <value>195300</value>
    </nation>
  </values>
</attribute>
<attribute>
  <name>Value_1</name>
  <values>
    <nation>
      <value>193800</value>
    </nation>
  </values>
</attribute>
<attribute>
  <name>Value_2</name>
  <values>
    <zip>
      <value>1000</value>
    </zip>
    <nation>
      <value>2000</value>
    </nation>
  </values>
</attribute>

Above is an extract from a larger XML tree I am working with. I want to build a dictionary in which the text of each name tag is the key and the corresponding zip/value text is the value. How can I write code that picks up only the attributes that have a zip value and skips the ones that have only a nation value?

My code:

   import urllib2
   import xml.etree.ElementTree as ET

   tree = ET.parse(urllib2.urlopen("http://www.sample_xml.com"))
   # Collect every attribute name and every zip value into separate lists
   names = [node.text for node in tree.findall('.//attribute/name')]
   zip_values = [node.text for node in tree.findall('.//zip/value')]

From here I would combine the two lists into a dictionary. But because Value_1 has no zip value, the two lists have different lengths and the keys no longer line up with the values:

   names = ['Index', 'Value_1', 'Value_2']
   zip_values = ['323800', '1000']

What I really need is

   my_dict = {'Index': '323800', 'Value_2': '1000'}

But what I get with my code is below. Is there a way to work around this?

   my_dict = {'Index': '323800', 'Value_1': '1000', 'Value_2': 'Na'}

eLRuLL · Accepted Answer · 2017-03-01 00:41:01Z

   import urllib2
   from lxml import etree

   root = etree.fromstring(urllib2.urlopen("http://www.sample_xml.com").read())

   # Select only the <attribute> elements that contain a values/zip/value
   # descendant, then map each one's <name> text to its zip value.
   d = {}
   for attribute_node in root.xpath('//attribute[./values/zip/value]'):
       name = attribute_node.xpath('./name')[0].text
       d[name] = attribute_node.xpath('./values/zip/value')[0].text

   print d  # {'Index': '323800', 'Value_2': '1000'}

2 Comments

Planet_Z Over a year ago

Could you please explain how exactly this code works and what is the logic behind it? Would be very helpful. Thanks!

eLRuLL Over a year ago

Everything is in the XPath query: the predicate in square brackets makes it select only the attribute elements that contain a ./values/zip/value path, so attributes without a zip value are never visited.
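
To make the filtering step more concrete, here is a minimal sketch of the same logic written as an explicit loop, using only the standard library's xml.etree.ElementTree under Python 3 instead of lxml. It is not the code from the answer: the predicate is replaced by a plain `None` check, the sample data from the question is inlined rather than downloaded, and the `<root>` wrapper element and the function name `zip_values_by_name` are assumptions introduced for illustration.

```python
import xml.etree.ElementTree as ET

# Sample data copied from the question. The <root> wrapper is an assumption,
# added only so the extract parses as a single well-formed document.
SAMPLE_XML = """
<root>
  <attribute>
    <name>Index</name>
    <values>
      <zip><value>323800</value></zip>
      <nation><value>195300</value></nation>
    </values>
  </attribute>
  <attribute>
    <name>Value_1</name>
    <values>
      <nation><value>193800</value></nation>
    </values>
  </attribute>
  <attribute>
    <name>Value_2</name>
    <values>
      <zip><value>1000</value></zip>
      <nation><value>2000</value></nation>
    </values>
  </attribute>
</root>
"""


def zip_values_by_name(root):
    """Map each attribute name to its zip value, skipping attributes without one."""
    result = {}
    for attribute in root.iter("attribute"):
        # find() returns None when the path does not exist under this element;
        # this check plays the role of the [./values/zip/value] predicate.
        zip_value = attribute.find("values/zip/value")
        if zip_value is None:
            continue
        result[attribute.findtext("name")] = zip_value.text
    return result


root = ET.fromstring(SAMPLE_XML)
print(zip_values_by_name(root))  # {'Index': '323800', 'Value_2': '1000'}
```

Because the name and the zip value are read from the same attribute element on each pass, the keys and values cannot drift out of step the way two independently built lists can.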
