
I am new to Python and am trying to extract element values from XML. Below is the code I tried.

import xml.etree.ElementTree as ET

a = '''<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
   <soap:Body>
      <checkVatResponse xmlns="urn:ec.europa.eu:taxud:vies:services:checkVat:types">
         <countryCode>RO</countryCode>
         <vatNumber>43097749</vatNumber>
         <requestDate>2022-07-12+02:00</requestDate>
         <valid>true</valid>
         <name>ROHLIG SUUS LOGISTICS ROMANIA S.R.L.</name>
         <address>MUNICIPIUL  BUCUREŞTI, SECTOR 1
BLD. ION MIHALACHE Nr. 15-17
Et. 1</address>
      </checkVatResponse>
   </soap:Body>
</soap:Envelope>'''
tree = ET.ElementTree(ET.fromstring(a))
root = tree.getroot()

for cust in root.findall('Body/checkVatResponse'):
    name = cust.find('name').text
    print(name)

I want to extract 'name' and 'address' from the XML, but when I run the above code, nothing is printed. What is my mistake?

Regards, Mayank Pande

1 Answer


Namespaces dawg, namespaces! You can be damn sure that when Jay-Z rapped about having 99 problems, having to deal with XML with namespaces was definitely one of them!

See the "Parsing XML with Namespaces" section of the xml.etree.ElementTree documentation.

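The tags in this document are namespaced. Body belongs to the http://schemas.xmlsoap.org/soap/envelope/ namespace (bound to the soap: prefix), checkVatResponse declares urn:ec.europa.eu:taxud:vies:services:checkVat:types as its default namespace, and name and address inherit that same namespace from their parent, checkVatResponse. ElementTree stores each tag as {namespace-URI}localname, so the plain path 'Body/checkVatResponse' matches nothing.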
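To see this for yourself, print the tags ElementTree actually stores. A minimal sketch, using a shortened copy of the response from the question:

```python
import xml.etree.ElementTree as ET

# Shortened copy of the SOAP response from the question
xml_text = '''<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
   <soap:Body>
      <checkVatResponse xmlns="urn:ec.europa.eu:taxud:vies:services:checkVat:types">
         <name>ROHLIG SUUS LOGISTICS ROMANIA S.R.L.</name>
      </checkVatResponse>
   </soap:Body>
</soap:Envelope>'''

root = ET.fromstring(xml_text)

# iter() walks the root and all descendants; every tag carries its namespace URI
for elem in root.iter():
    print(elem.tag)

# A path written without namespaces therefore finds nothing
print(root.findall('Body/checkVatResponse'))  # []
```

The loop prints {http://schemas.xmlsoap.org/soap/envelope/}Envelope, {http://schemas.xmlsoap.org/soap/envelope/}Body, {urn:ec.europa.eu:taxud:vies:services:checkVat:types}checkVatResponse and {urn:ec.europa.eu:taxud:vies:services:checkVat:types}name.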

So you can search for an element with its namespace spelled out explicitly, like so:

root.findall('{http://schemas.xmlsoap.org/soap/envelope/}Body/{urn:ec.europa.eu:taxud:vies:services:checkVat:types}checkVatResponse')
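Spelling out full URIs quickly gets verbose. find() and findall() also accept a namespaces argument, a dict mapping prefixes to URIs, so the path can use short prefixes instead. In this sketch the prefixes soap and vat are arbitrary names chosen here; only the URIs have to match the document:

```python
import xml.etree.ElementTree as ET

# Shortened copy of the SOAP response from the question
xml_text = '''<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
   <soap:Body>
      <checkVatResponse xmlns="urn:ec.europa.eu:taxud:vies:services:checkVat:types">
         <name>ROHLIG SUUS LOGISTICS ROMANIA S.R.L.</name>
         <address>MUNICIPIUL  BUCUREŞTI, SECTOR 1
BLD. ION MIHALACHE Nr. 15-17
Et. 1</address>
      </checkVatResponse>
   </soap:Body>
</soap:Envelope>'''

# Prefix -> URI mapping; the prefix names are our own choice
ns = {
    'soap': 'http://schemas.xmlsoap.org/soap/envelope/',
    'vat': 'urn:ec.europa.eu:taxud:vies:services:checkVat:types',
}

root = ET.fromstring(xml_text)
for resp in root.findall('soap:Body/vat:checkVatResponse', ns):
    print(resp.find('vat:name', ns).text)
    print(resp.find('vat:address', ns).text)
```

Note that checkVatResponse and its children use a default namespace (no prefix) in the document, but you still need a prefix for it in the mapping.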

Or, on Python 3.8 and later, you can ignore the namespace with the {*} wildcard:

root.findall('{*}Body/{*}checkVatResponse')
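If you only need a couple of single values, findtext() combined with a .// descendant path and the {*} wildcard (Python 3.8+) can skip the explicit loop. A minimal sketch, assuming the response contains exactly one name and one address:

```python
import xml.etree.ElementTree as ET

# Shortened copy of the SOAP response from the question
xml_text = '''<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
   <soap:Body>
      <checkVatResponse xmlns="urn:ec.europa.eu:taxud:vies:services:checkVat:types">
         <name>ROHLIG SUUS LOGISTICS ROMANIA S.R.L.</name>
         <address>MUNICIPIUL  BUCUREŞTI, SECTOR 1
BLD. ION MIHALACHE Nr. 15-17
Et. 1</address>
      </checkVatResponse>
   </soap:Body>
</soap:Envelope>'''

root = ET.fromstring(xml_text)

# './/' searches all descendants; '{*}' matches a tag in any namespace.
# findtext() returns None if no matching element exists.
name = root.findtext('.//{*}name')
address = root.findtext('.//{*}address')
print(name)
print(address)
```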

Try this:

import xml.etree.ElementTree as ET

a = '''<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
   <soap:Body>
      <checkVatResponse xmlns="urn:ec.europa.eu:taxud:vies:services:checkVat:types">
         <countryCode>RO</countryCode>
         <vatNumber>43097749</vatNumber>
         <requestDate>2022-07-12+02:00</requestDate>
         <valid>true</valid>
         <name>ROHLIG SUUS LOGISTICS ROMANIA S.R.L.</name>
         <address>MUNICIPIUL  BUCUREŞTI, SECTOR 1
BLD. ION MIHALACHE Nr. 15-17
Et. 1</address>
      </checkVatResponse>
   </soap:Body>
</soap:Envelope>'''
tree = ET.ElementTree(ET.fromstring(a))
root = tree.getroot()

for cust in root.findall('{*}Body/{*}checkVatResponse'):
    name = cust.find('{*}name').text
    print(name)
    address = cust.find('{*}address').text
    print(address)

Output:

ROHLIG SUUS LOGISTICS ROMANIA S.R.L.
MUNICIPIUL  BUCUREŞTI, SECTOR 1
BLD. ION MIHALACHE Nr. 15-17
Et. 1
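If you want every field of checkVatResponse rather than just name and address, one option is to strip the namespace from each child's tag and collect the values into a dict. A minimal sketch (Python 3.8+ for {*}); local_name is a hypothetical helper introduced here, not part of ElementTree:

```python
import xml.etree.ElementTree as ET

# Shortened copy of the SOAP response from the question
xml_text = '''<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
   <soap:Body>
      <checkVatResponse xmlns="urn:ec.europa.eu:taxud:vies:services:checkVat:types">
         <countryCode>RO</countryCode>
         <vatNumber>43097749</vatNumber>
         <requestDate>2022-07-12+02:00</requestDate>
         <valid>true</valid>
         <name>ROHLIG SUUS LOGISTICS ROMANIA S.R.L.</name>
      </checkVatResponse>
   </soap:Body>
</soap:Envelope>'''

def local_name(tag):
    """Hypothetical helper: drop a leading '{namespace}' from a tag, if present."""
    return tag.rsplit('}', 1)[-1]

root = ET.fromstring(xml_text)
resp = root.find('{*}Body/{*}checkVatResponse')

# Iterating over an Element yields its direct children
fields = {local_name(child.tag): child.text for child in resp}
print(fields)
```

All values come back as strings (e.g. 'true' for valid), so convert them yourself if you need other types.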

3 Comments

The XPath expression //*[local-name()="Body"]/*[local-name()="checkVatResponse"] also works without declaring namespaces, at the cost of a rather awkward syntax.
local-name() works with lxml, but not with the built-in ElementTree library (which is used in the question).
Thank you so much, guys. This worked. I will go through XML namespaces and try to learn more.
