<?xml-stylesheet href="/Style Library/st/xslt/rss2.xsl" type="text/xsl" media="screen" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:ta="http://www.smartraveller.gov.au/schema/rss/travel_advisories/" xmlns:dc="http://purl.org/dc/elements/1.1/">

  <channel>
    <title>Travel Advisories</title>
    <link>http://smartraveller.gov.au/countries/</link>
    <description>the Australian Department of Foreign Affairs and Trade's Smartraveller advisory service</description>
    <language>en</language>
    <webMaster>[email protected]</webMaster>
    <copyright>Copyright Commonwealth of Australia 2011</copyright>
    <ttl>60</ttl>
    <atom:link href="http://smartraveller.gov.au/countries/Documents/index.rss" rel="self" type="application/rss+xml" />
    <generator>zcms</generator>
    <image>
      <title>Advice</title>
      <link>http://smartraveller.gov.au/countries/</link>
      <url>/Style Library/st/images/dfat_logo_small.gif</url>
    </image>
    <item>
      <title>Czech Republic</title>
      <description>This travel advice has been reviewed. The level of our advice has not changed. Exercise normal safety precautions in the Czech Republic.</description>
      <link>http://smartraveller.gov.au/Countries/europe/eastern/Pages/czech_republic.aspx</link>
      <pubDate>26 Oct 2018 05:25:14 GMT</pubDate>
      <guid isPermaLink="false">cdbcc3d4-3a89-4768-ac1d-0221f8c99227 GMT</guid>
      <ta:warnings>
        <dc:coverage>Czech Republic</dc:coverage>
        <ta:level>2/5</ta:level>
        <dc:description>Exercise normal safety precautions</dc:description>
      </ta:warnings>
    </item>

I want to extract the value of <ta:level> under <ta:warnings> for each item in the feed. I have already tried existing online solutions, but nothing works for me. The complication is that my XML contains multiple namespaces.

req = requests.request('GET', "https://smartraveller.gov.au/countries/documents/index.rss")
a = str(req.text).encode()
tree = etree.fromstring(a)

ns = {'TravelAd': 'https://smartraveller.gov.au/countries/documents/index.rss',
          'ta': 'http://www.smartraveller.gov.au/schema/rss/travel_advisories/'}

    e = tree.findall('{0}channel/{0}item/{0}warnings/{0}level'.format(ns))
    for i in e:
        print(i.text)
  • So what's the error? Commented Mar 5, 2019 at 12:22

1 Answer


The XML has multiple namespaces, but the only namespace you need to worry about is http://www.smartraveller.gov.au/schema/rss/travel_advisories/.

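This is because the only elements in the path to your target that are in a namespace are ta:warnings and ta:level. The rss, channel and item elements carry no prefix and the document declares no default namespace, so they are matched by their plain names.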

Example...

from lxml import etree
import requests

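# Fetch the feed and fail early on an HTTP error.
req = requests.get("https://smartraveller.gov.au/countries/documents/index.rss")
req.raise_for_status()

# req.content is the raw bytes, so lxml can handle the encoding itself.
tree = etree.fromstring(req.content)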

ns = {'ta': 'http://www.smartraveller.gov.au/schema/rss/travel_advisories/'}

e = tree.findall('channel/item/ta:warnings/ta:level', ns)
for i in e:
    print(i.text)

prints...

2/5
2/5
4/5
2/5
...and so on
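
To check the lookup without hitting the network, here is a minimal sketch that runs against a trimmed, hand-written copy of the feed (the SAMPLE string is an assumption, reduced from the XML in the question). It uses the standard library's xml.etree.ElementTree rather than lxml so that it runs anywhere; for findall() with a prefix map the two behave the same. It also shows the {uri}tag form, which is what the '{0}channel'.format(ns) attempt in the question was probably aiming for, except that it formatted the whole dict into the path instead of a single URI.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical copy of the feed: one item, only the relevant elements.
SAMPLE = b"""<rss version="2.0"
     xmlns:ta="http://www.smartraveller.gov.au/schema/rss/travel_advisories/"
     xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <item>
      <title>Czech Republic</title>
      <ta:warnings>
        <dc:coverage>Czech Republic</dc:coverage>
        <ta:level>2/5</ta:level>
      </ta:warnings>
    </item>
  </channel>
</rss>"""

TA = "http://www.smartraveller.gov.au/schema/rss/travel_advisories/"
root = ET.fromstring(SAMPLE)

# Prefix map: the key 'ta' is arbitrary, only the URI has to match the document.
by_prefix = [el.text for el in root.findall("channel/item/ta:warnings/ta:level", {"ta": TA})]

# Same query with the namespace URI written inline as {uri}tag.
by_uri = [el.text for el in root.findall(f"channel/item/{{{TA}}}warnings/{{{TA}}}level")]

# Without any namespace information the ta: elements are not matched at all.
unqualified = root.findall("channel/item/warnings/level")

print(by_prefix, by_uri, unqualified)
```

Only warnings and level need the namespace; channel and item are written as plain names in both forms.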

findall() returns a list of elements. If you want a list of the text values directly, consider switching from findall() to xpath() and selecting text()...

e = tree.xpath('channel/item/ta:warnings/ta:level/text()', namespaces=ns)
print(e)

prints...

['2/5', '2/5', '4/5', '2/5', and so on...]
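
A flat list of levels loses track of which country each value belongs to. One way to keep them together, sketched here under a few assumptions, is to loop over the item elements and read both the title and the level from each. The helper name levels_by_country and the two-item SAMPLE feed (with placeholder country names) are made up for illustration, and the sketch again uses the standard library's ElementTree so it runs without lxml.

```python
import xml.etree.ElementTree as ET

NS = {"ta": "http://www.smartraveller.gov.au/schema/rss/travel_advisories/"}


def levels_by_country(xml_bytes):
    """Map each item's <title> to its <ta:level> text (hypothetical helper)."""
    root = ET.fromstring(xml_bytes)
    result = {}
    for item in root.findall("channel/item"):
        title = item.findtext("title")
        # Paths are relative to the current item, so no 'channel/item' prefix here.
        level = item.findtext("ta:warnings/ta:level", namespaces=NS)
        # Skip items that lack a title or a level.
        if title is not None and level is not None:
            result[title] = level
    return result


# Hand-written sample with placeholder names, not real advisory data.
SAMPLE = b"""<rss version="2.0"
     xmlns:ta="http://www.smartraveller.gov.au/schema/rss/travel_advisories/">
  <channel>
    <item>
      <title>Country A</title>
      <ta:warnings><ta:level>2/5</ta:level></ta:warnings>
    </item>
    <item>
      <title>Country B</title>
      <ta:warnings><ta:level>4/5</ta:level></ta:warnings>
    </item>
    <item>
      <title>No warning block</title>
    </item>
  </channel>
</rss>"""

print(levels_by_country(SAMPLE))
```

With the live feed, the bytes from req.content could be passed in place of SAMPLE.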