I am trying to read this URL and extract the information inside the "identificationInfo" tag.

However, when I use this code:

import requests
import xml.etree.ElementTree as ET

url = "http://qldspatial.information.qld.gov.au/catalogue/rest/document?id={96BD66CE-2207-4D35-815B-0E5648C0185F}&f=xml"

response = requests.get(url)
xml_content = response.content

tree = ET.fromstring(xml_content)

for child in tree:
    print(child.tag, child.attrib)

the results I get back don't contain any attributes for the tags:

('{http://www.isotc211.org/2005/gmd}fileIdentifier', {})
('{http://www.isotc211.org/2005/gmd}language', {})
('{http://www.isotc211.org/2005/gmd}characterSet', {})
('{http://www.isotc211.org/2005/gmd}parentIdentifier', {})
('{http://www.isotc211.org/2005/gmd}hierarchyLevel', {})
('{http://www.isotc211.org/2005/gmd}contact', {})
('{http://www.isotc211.org/2005/gmd}dateStamp', {})
('{http://www.isotc211.org/2005/gmd}metadataStandardName', {})
('{http://www.isotc211.org/2005/gmd}metadataStandardVersion', {})
('{http://www.isotc211.org/2005/gmd}referenceSystemInfo', {})
('{http://www.isotc211.org/2005/gmd}identificationInfo', {})
('{http://www.isotc211.org/2005/gmd}distributionInfo', {})
('{http://www.isotc211.org/2005/gmd}dataQualityInfo', {})
('{http://www.isotc211.org/2005/gmd}metadataConstraints', {})

I am not familiar with XML, and I can't work out why I can't see any more information. Am I missing a step? Any help would be greatly appreciated.

  • What exactly do you want to get? Just the text ("The 72 Fish Habitat Areas in this dataset are declared under Section 120 - Fisheries Act-1994 and Schedule 3-Queensland Fisheries Regulations 2008, effective 30 September 2016. This is a composite of ALL Fish Habitat Area boundary...")? Or the XML tree of identificationInfo? Commented Feb 7, 2017 at 8:46
  • I am really only after the text from the tags in the <identificationInfo></identificationInfo> tree, but I would also be happy to print out the XML tree of identificationInfo. At the moment I am not getting very far with either. Commented Feb 7, 2017 at 20:58
  • Actually, the only information that I need from the <identificationInfo></identificationInfo> tree is: <date> <gco:Date>2014-09-05</gco:Date> </date> and <CI_ResponsibleParty id="resourceOwner"> <organisationName> <gco:CharacterString>Department of National Parks, Sport and Racing</gco:CharacterString> </organisationName>. I need to pull this information from other XML files with the same structure, so I am trying to automate this. Commented Feb 7, 2017 at 21:07

1 Answer

I used minidom instead of ElementTree. The code to get the required values is:

from xml.dom import minidom
import requests

url = "http://qldspatial.information.qld.gov.au/catalogue/rest/document?id={96BD66CE-2207-4D35-815B-0E5648C0185F}&f=xml"

response = requests.get(url)
xml_content = response.content
doc = minidom.parseString(xml_content)

# getElementsByTagName matches the tag name as written in the file, prefix included
identification = doc.getElementsByTagName("identificationInfo")[0]

# Text of the first gco:Date element under identificationInfo
date = identification.getElementsByTagName("gco:Date")[0].firstChild.nodeValue  # "2014-09-05"

# Text of the first gco:CharacterString inside the first CI_ResponsibleParty
responsible_party = identification.getElementsByTagName("CI_ResponsibleParty")[0]
department = responsible_party.getElementsByTagName("gco:CharacterString")[0].firstChild.nodeValue  # "Department of National Parks, Sport and Racing"
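
As for why the original loop printed empty dictionaries: `child.attrib` only holds XML attributes (such as `id="resourceOwner"`), not the text or the nested elements, and iterating over the root only visits its direct children. If you would rather stay with ElementTree, a minimal sketch along these lines should work. It assumes the `gco` prefix maps to `http://www.isotc211.org/2005/gco` (the `gmd` URI is the one shown in the question's output), it runs against a trimmed, made-up sample built from the fragments quoted in the comments rather than the real document, and the function name `extract_date_and_owner` is just an illustrative choice.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI map for the path expressions below.
# The gmd URI comes from the question's output; the gco URI is an assumption.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Trimmed, hypothetical sample shaped like the fragments quoted in the comments.
SAMPLE = """<MD_Metadata xmlns="http://www.isotc211.org/2005/gmd"
             xmlns:gco="http://www.isotc211.org/2005/gco">
  <identificationInfo>
    <MD_DataIdentification>
      <citation>
        <CI_Citation>
          <date>
            <CI_Date>
              <date><gco:Date>2014-09-05</gco:Date></date>
            </CI_Date>
          </date>
          <citedResponsibleParty>
            <CI_ResponsibleParty id="resourceOwner">
              <organisationName>
                <gco:CharacterString>Department of National Parks, Sport and Racing</gco:CharacterString>
              </organisationName>
            </CI_ResponsibleParty>
          </citedResponsibleParty>
        </CI_Citation>
      </citation>
    </MD_DataIdentification>
  </identificationInfo>
</MD_Metadata>"""


def extract_date_and_owner(xml_content):
    """Return (date, organisation name) found under identificationInfo."""
    root = ET.fromstring(xml_content)

    # identificationInfo is a direct child of the root element.
    identification = root.find("gmd:identificationInfo", NS)
    if identification is None:
        return None, None

    # ".//" searches all descendants; findtext returns the first match's text.
    date = identification.findtext(".//gco:Date", namespaces=NS)

    # Select the responsible party by its id attribute, then read its name.
    owner = identification.findtext(
        ".//gmd:CI_ResponsibleParty[@id='resourceOwner']"
        "/gmd:organisationName/gco:CharacterString",
        namespaces=NS,
    )
    return date, owner


date, owner = extract_date_and_owner(SAMPLE)
print(date, owner)
```

For the real data you would pass `response.content` from the `requests.get(url)` call instead of `SAMPLE`. Selecting on `@id='resourceOwner'` rather than taking the first `CI_ResponsibleParty` may be more robust if other documents list several parties, though that depends on the files actually using the same id.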

1 Comment

You're awesome!! That works great!! I was about to use Beautiful Soup for this, as I had success with it last night, but this saves me from having to introduce a third-party package.
