12

Goal: Get the values inside <Name> tags and print them out. Simplified XML below.

<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
   <soap:Body>
      <GetStartEndPointResponse xmlns="http://www.etis.fskab.se/v1.0/ETISws">
         <GetStartEndPointResult>
            <Code>0</Code>
            <Message />
            <StartPoints>
               <Point>
                  <Id>545</Id>
                  <Name>Get Me</Name>
                  <Type>sometype</Type>
                  <X>333</X>
                  <Y>222</Y>
               </Point>
               <Point>
                  <Id>634</Id>
                  <Name>Get me too</Name>
                  <Type>sometype</Type>
                  <X>555</X>
                  <Y>777</Y>
               </Point>
            </StartPoints>
         </GetStartEndPointResult>
      </GetStartEndPointResponse>
   </soap:Body>
</soap:Envelope>

Attempt:

import requests
from xml.etree import ElementTree

response = requests.get('http://www.labs.skanetrafiken.se/v2.2/querystation.asp?inpPointfr=yst')

# XML parsing here
dom = ElementTree.fromstring(response.text)
names = dom.findall('*/Name')
for name in names:
    print(name.text)

I have read that other people recommend zeep for parsing SOAP XML, but I found it hard to get my head around.

4
  • The XML document you have posted above is invalid Commented Jul 22, 2017 at 4:25
  • @danielcorin does it work now? Commented Jul 22, 2017 at 4:28
  • It is still malformed. A quick Google search will help you find tools you can use to validate XML Commented Jul 22, 2017 at 4:32
  • @danielcorin I added the API link. The whole response can be seen by loading that link labs.skanetrafiken.se/v2.2/querystation.asp?inpPointfr=yst Commented Jul 22, 2017 at 4:35

5 Answers

28

The issue here is dealing with the XML namespaces:

import requests
from xml.etree import ElementTree

response = requests.get('http://www.labs.skanetrafiken.se/v2.2/querystation.asp?inpPointfr=yst')

# define namespace mappings to use as shorthand below
namespaces = {
    'soap': 'http://schemas.xmlsoap.org/soap/envelope/',
    'a': 'http://www.etis.fskab.se/v1.0/ETISws',
}
dom = ElementTree.fromstring(response.content)

# reference the namespace mappings here by `<name>:`
names = dom.findall(
    './soap:Body'
    '/a:GetStartEndPointResponse'
    '/a:GetStartEndPointResult'
    '/a:StartPoints'
    '/a:Point'
    '/a:Name',
    namespaces,
)
for name in names:
    print(name.text)

The namespaces come from the xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" and xmlns="http://www.etis.fskab.se/v1.0/ETISws" attributes on the Envelope and GetStartEndPointResponse nodes respectively.

Keep in mind that a default namespace (one declared with xmlns="..." and no prefix) applies to the declaring element and all of its descendants, even though their tags carry no prefix. That is why <Name> has to be looked up as a:Name here.

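Note: I pass response.content (the raw bytes) to the parser rather than response.body, which does not exist on a requests response. With bytes, the parser can honor the encoding declared in the XML prolog.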
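
To see why the unqualified search in the question finds nothing, it can help to look at the tags as ElementTree stores them. The sketch below is only an illustration: SAMPLE is a trimmed inline copy of the XML from the question (not the live response), and the {*} wildcard assumes Python 3.8 or newer.

```python
from xml.etree import ElementTree

# Trimmed inline copy of the XML from the question (illustrative sample only).
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <GetStartEndPointResponse xmlns="http://www.etis.fskab.se/v1.0/ETISws">
      <GetStartEndPointResult>
        <StartPoints>
          <Point><Id>545</Id><Name>Get Me</Name></Point>
          <Point><Id>634</Id><Name>Get me too</Name></Point>
        </StartPoints>
      </GetStartEndPointResult>
    </GetStartEndPointResponse>
  </soap:Body>
</soap:Envelope>"""

dom = ElementTree.fromstring(SAMPLE)

# ElementTree stores each tag with its namespace URI in braces.
first_name = dom.find('.//{http://www.etis.fskab.se/v1.0/ETISws}Name')
print(first_name.tag)

# An unqualified name does not match the namespaced elements.
unqualified = dom.findall('.//Name')
print(unqualified)

# A prefix mapping plus a descendant search avoids spelling out the full path.
ns = {'a': 'http://www.etis.fskab.se/v1.0/ETISws'}
by_prefix = [el.text for el in dom.findall('.//a:Name', ns)]

# Python 3.8+: {*} matches the tag name in any namespace.
by_wildcard = [el.text for el in dom.findall('.//{*}Name')]
print(by_prefix, by_wildcard)
```

The descendant search (.//) is shorter than the full path but matches a Name element at any depth, so the explicit path is the safer choice when other parts of the response could also contain Name.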


2 Comments

Thank you for your answer! Where would I get the two links in namespaces dictionary if I were to repeat this on a different XML document?
@Clone with lxml you can get a dict of namespaces via dom.nsmap; the standard-library ElementTree used in the answer has no nsmap attribute.
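
For the standard library, one possible way to discover the namespaces of an unfamiliar document is to listen for start-ns events with ElementTree.iterparse. This is a minimal sketch, not part of the original answer: collect_namespaces and SAMPLE are hypothetical names, and the fallback prefix 'a' is an arbitrary choice.

```python
import io
from xml.etree import ElementTree

def collect_namespaces(xml_bytes):
    """Return {prefix: uri} for every namespace declared in the document."""
    namespaces = {}
    events = ElementTree.iterparse(io.BytesIO(xml_bytes), events=('start-ns',))
    for _event, (prefix, uri) in events:
        # The default namespace (xmlns="...") is reported with an empty prefix.
        # setdefault keeps the first declaration seen for each prefix.
        namespaces.setdefault(prefix, uri)
    return namespaces

# Small illustrative document shaped like the one in the question.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <GetStartEndPointResponse xmlns="http://www.etis.fskab.se/v1.0/ETISws">
      <StartPoints><Point><Name>Get Me</Name></Point></StartPoints>
    </GetStartEndPointResponse>
  </soap:Body>
</soap:Envelope>"""

found = collect_namespaces(SAMPLE)
print(found)

# Give the default namespace a usable prefix ('a' is an arbitrary choice)
# before passing the mapping to findall.
ns = {(prefix or 'a'): uri for prefix, uri in found.items()}
names = [el.text for el in ElementTree.fromstring(SAMPLE).findall('.//a:Name', ns)]
print(names)
```

If a document redeclares the same prefix with different URIs at different depths, this sketch keeps only the first one, so it is best treated as a starting point for inspection.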
10

This is an old question, but another option for this task is worth mentioning.

I like to use xmltodict (GitHub), a lightweight converter from XML to a Python dictionary.

Put your SOAP response in a variable named stack.

Parse it with xmltodict.parse:

In [48]: stack_d = xmltodict.parse(stack)

Check the result:

In [49]: stack_d
Out[49]:
OrderedDict([('soap:Envelope',
            OrderedDict([('@xmlns:soap',
                            'http://schemas.xmlsoap.org/soap/envelope/'),
                        ('@xmlns:xsd', 'http://www.w3.org/2001/XMLSchema'),
                        ('@xmlns:xsi',
                            'http://www.w3.org/2001/XMLSchema-instance'),
                        ('soap:Body',
                            OrderedDict([('GetStartEndPointResponse',
                                        OrderedDict([('@xmlns',
                                                        'http://www.etis.fskab.se/v1.0/ETISws'),
                                                    ('GetStartEndPointResult',
                                                        OrderedDict([('Code',
                                                                    '0'),
                                                                    ('Message',
                                                                    None),
                                                                    ('StartPoints',
                                                                    OrderedDict([('Point',
                                                                                    [OrderedDict([('Id',
                                                                                                '545'),
                                                                                                ('Name',
                                                                                                'Get Me'),
                                                                                                ('Type',
                                                                                                'sometype'),
                                                                                                ('X',
                                                                                                '333'),
                                                                                                ('Y',
                                                                                                '222')]),
                                                                                    OrderedDict([('Id',
                                                                                                '634'),
                                                                                                ('Name',
                                                                                                'Get me too'),
                                                                                                ('Type',
                                                                                                'sometype'),
                                                                                                ('X',
                                                                                                '555'),
                                                                                                ('Y',
                                                                                                '777')])])]))]))]))]))]))])

At this point it becomes as easy as browsing a Python dictionary:

In [50]: stack_d['soap:Envelope']['soap:Body']['GetStartEndPointResponse']['GetStartEndPointResult']['StartPoints']['Point']
Out[50]:
[OrderedDict([('Id', '545'),
            ('Name', 'Get Me'),
            ('Type', 'sometype'),
            ('X', '333'),
            ('Y', '222')]),
OrderedDict([('Id', '634'),
            ('Name', 'Get me too'),
            ('Type', 'sometype'),
            ('X', '555'),
            ('Y', '777')])]
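
To get from that list to the names the question asks for, something like the following could work. It is a sketch under assumptions: stack_d here is a hand-written stand-in for the parsed result (so the example runs without xmltodict installed), and point_names is a hypothetical helper. The isinstance check is there because xmltodict returns a single mapping rather than a list when an element occurs only once.

```python
# Hand-written stand-in for the relevant part of the parsed result above.
stack_d = {
    'soap:Envelope': {
        'soap:Body': {
            'GetStartEndPointResponse': {
                'GetStartEndPointResult': {
                    'StartPoints': {
                        'Point': [
                            {'Id': '545', 'Name': 'Get Me'},
                            {'Id': '634', 'Name': 'Get me too'},
                        ]
                    }
                }
            }
        }
    }
}

def point_names(parsed):
    """Return the Name value of every Point in the parsed response."""
    points = (parsed['soap:Envelope']['soap:Body']['GetStartEndPointResponse']
              ['GetStartEndPointResult']['StartPoints']['Point'])
    # A response with only one <Point> yields a mapping, not a list.
    if not isinstance(points, list):
        points = [points]
    return [point['Name'] for point in points]

names = point_names(stack_d)
for name in names:
    print(name)
```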


2

Again, I am replying to an old question, but I think this solution is worth sharing. Using BeautifulSoup was a piece of cake for me. You can install it with pip (the package is named beautifulsoup4).

from bs4 import BeautifulSoup

# xml_string holds the SOAP response text; the 'xml' parser requires lxml
xml = BeautifulSoup(xml_string, 'xml')
xml.find('soap:Body')  # to get the soap:Body tag
xml.find('X')  # for the X tag


1

Try it like this:

import requests
from bs4 import BeautifulSoup

response = requests.get('http://www.labs.skanetrafiken.se/v2.2/querystation.asp?inpPointfr=yst')

xml = BeautifulSoup(response.text, 'xml')
xml.find('soap:Body')  # to get the soap:Body tag
xml.find('X')  # for the X tag

# print the text of every Name tag, as the question asks
for name in xml.find_all('Name'):
    print(name.text)


0

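Just strip the namespaces from the text before parsing: remove the 'soap:' prefix and the default-namespace declaration (xmlns="...") so that what remains is plain, non-namespaced XML. Removing prefixes alone is not enough, because the elements under GetStartEndPointResponse get their namespace from that default declaration, not from a prefix.

new_response = response.text.replace('soap:', '').replace(' xmlns="http://www.etis.fskab.se/v1.0/ETISws"', '')

Then you can proceed normally, for example with findall('.//Name'). Keep in mind that this is a blunt text substitution: it also changes any element text that happens to contain 'soap:'.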
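
A more structural variant of the same idea is to parse first and then drop the namespace part of each tag, which leaves element text untouched. This is a minimal sketch rather than the answer's method: strip_namespaces and SAMPLE are hypothetical names, and namespaced attributes are not handled.

```python
from xml.etree import ElementTree

def strip_namespaces(root):
    """Remove the '{uri}' part from every element tag in the tree, in place."""
    for element in root.iter():
        # Comments and processing instructions have non-string tags; skip them.
        if isinstance(element.tag, str) and element.tag.startswith('{'):
            element.tag = element.tag.split('}', 1)[1]
    return root

# Small illustrative document shaped like the one in the question.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <GetStartEndPointResponse xmlns="http://www.etis.fskab.se/v1.0/ETISws">
      <StartPoints>
        <Point><Name>Get Me</Name></Point>
        <Point><Name>Get me too</Name></Point>
      </StartPoints>
    </GetStartEndPointResponse>
  </soap:Body>
</soap:Envelope>"""

dom = strip_namespaces(ElementTree.fromstring(SAMPLE))

# With the namespaces gone, plain tag names work in searches.
names = [el.text for el in dom.findall('.//Name')]
print(names)
```

Discarding namespaces is only safe when no two elements share a local name across different namespaces; otherwise the namespace mapping approach is the better fit.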
