Missing attributes in xml when using python

Question

I am trying to read this URL and extract the information inside the "identificationInfo" tag.

However, when I use this code:

import requests
import xml.etree.ElementTree as ET

url = "http://qldspatial.information.qld.gov.au/catalogue/rest/document?id={96BD66CE-2207-4D35-815B-0E5648C0185F}&f=xml"

response = requests.get(url)
xml_content = response.content
tree = ET.fromstring(xml_content)

for child in tree:
    print(child.tag, child.attrib)

but the results I get back don't contain any attributes for the tags:

('{http://www.isotc211.org/2005/gmd}fileIdentifier', {})
('{http://www.isotc211.org/2005/gmd}language', {})
('{http://www.isotc211.org/2005/gmd}characterSet', {})
('{http://www.isotc211.org/2005/gmd}parentIdentifier', {})
('{http://www.isotc211.org/2005/gmd}hierarchyLevel', {})
('{http://www.isotc211.org/2005/gmd}contact', {})
('{http://www.isotc211.org/2005/gmd}dateStamp', {})
('{http://www.isotc211.org/2005/gmd}metadataStandardName', {})
('{http://www.isotc211.org/2005/gmd}metadataStandardVersion', {})
('{http://www.isotc211.org/2005/gmd}referenceSystemInfo', {})
('{http://www.isotc211.org/2005/gmd}identificationInfo', {})
('{http://www.isotc211.org/2005/gmd}distributionInfo', {})
('{http://www.isotc211.org/2005/gmd}dataQualityInfo', {})
('{http://www.isotc211.org/2005/gmd}metadataConstraints', {})

I am not familiar with XML, and I can't work out why I can't see any more information. Am I missing a step? If someone could assist, it would be greatly appreciated.

What exactly do you want to get? Just the text ("The 72 Fish Habitat Areas in this dataset are declared under Section 120 - Fisheries Act-1994 and Schedule 3-Queensland Fisheries Regulations 2008, effective 30 September 2016. This is a composite of ALL Fish Habitat Area boundary...")? Or the XML tree of identificationInfo?
– Andersson, Commented Feb 7, 2017 at 8:46
I am really only after the text from the tags in the <identificationInfo></identificationInfo> tree, but I would also be happy to print out the XML tree of identificationInfo. At the moment I am not getting very far with either.
– TsvGis, Commented Feb 7, 2017 at 20:58
Actually, the only information that I need from the <identificationInfo></identificationInfo> tree is: <date> <gco:Date>2014-09-05</gco:Date> </date> and <CI_ResponsibleParty id="resourceOwner"> <organisationName> <gco:CharacterString>Department of National Parks, Sport and Racing</gco:CharacterString> </organisationName>. I need to pull this information from other XML files of the same structure, so I am trying to automate this.
– TsvGis, Commented Feb 7, 2017 at 21:07

Andersson · Accepted Answer · 2017-02-09 09:34:37Z


I'm using minidom instead of ElementTree. The code to get the required values is:

from xml.dom import minidom
import requests

url = "http://qldspatial.information.qld.gov.au/catalogue/rest/document?id={96BD66CE-2207-4D35-815B-0E5648C0185F}&f=xml"

response = requests.get(url)
xml_content = response.content
doc = minidom.parseString(xml_content)

# Restrict the search to the first <identificationInfo> element
identification = doc.getElementsByTagName("identificationInfo")[0]

# Text of the first <gco:Date> inside it
date = identification.getElementsByTagName('gco:Date')[0].firstChild.nodeValue # "2014-09-05"

# Organisation name of the first <CI_ResponsibleParty> inside it
responsible_party = identification.getElementsByTagName('CI_ResponsibleParty')[0]
department = responsible_party.getElementsByTagName('gco:CharacterString')[0].firstChild.nodeValue # "Department of National Parks, Sport and Racing"
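
For completeness, here is a rough sketch of how the same values could probably be pulled out with ElementTree, which the question started with. The loop in the question only visits the direct children of the root, and none of those carry attributes, which is why every dict came back empty; iter() walks the whole tree instead. Some assumptions on my side: SAMPLE is a hypothetical, heavily trimmed stand-in for the real record (the nesting is guessed from the comments above), the gco namespace URI is assumed to follow the same pattern as the gmd one shown in the question's output, and extract_date_and_owner is just an illustrative helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for the real metadata record.
# The nesting is an assumption; only the tag names come from the question.
SAMPLE = """<MD_Metadata xmlns="http://www.isotc211.org/2005/gmd"
             xmlns:gco="http://www.isotc211.org/2005/gco">
  <identificationInfo>
    <MD_DataIdentification>
      <citation>
        <CI_Citation>
          <date>
            <CI_Date>
              <date><gco:Date>2014-09-05</gco:Date></date>
            </CI_Date>
          </date>
        </CI_Citation>
      </citation>
      <pointOfContact>
        <CI_ResponsibleParty id="resourceOwner">
          <organisationName>
            <gco:CharacterString>Department of National Parks, Sport and Racing</gco:CharacterString>
          </organisationName>
        </CI_ResponsibleParty>
      </pointOfContact>
    </MD_DataIdentification>
  </identificationInfo>
</MD_Metadata>"""

# Prefix-to-URI map used in the search paths below.
# The gco URI is an assumption; the gmd URI appears in the question's output.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}


def extract_date_and_owner(xml_content):
    """Return (date, organisation) from the identificationInfo subtree."""
    root = ET.fromstring(xml_content)
    ident = root.find("gmd:identificationInfo", NS)
    # ".//" searches all descendants, not just direct children
    date = ident.find(".//gco:Date", NS).text
    # Select the responsible party by its id attribute
    party = ident.find(".//gmd:CI_ResponsibleParty[@id='resourceOwner']", NS)
    organisation = party.find("gmd:organisationName/gco:CharacterString", NS).text
    return date, organisation


# iter() visits every element in the tree, so nested attributes show up here
root = ET.fromstring(SAMPLE)
with_attributes = [(el.tag, el.attrib) for el in root.iter() if el.attrib]

date, organisation = extract_date_and_owner(SAMPLE)
print(date)
print(organisation)
print(with_attributes)
```

To run this over several records of the same structure, you could pass response.content from each request to extract_date_and_owner instead of SAMPLE. If a record lacks one of the elements, find() returns None and the .text access will raise, so real data would need a check there.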


1 Comment

TsvGis Over a year ago

You're awesome!! That works great!! I was about to use Beautiful Soup for this, as I had success with it last night, but this saves me from having to introduce a third-party package.